mar10 / pyftpsync

Synchronize directories using FTP(S), SFTP, or file system access.
https://pyftpsync.readthedocs.io
MIT License

Utf-8 encoding problems #30

Closed TorokLev closed 5 years ago

TorokLev commented 6 years ago

Hi, it looks like there is a UTF-8 encoding problem when writing to or reading from the JSON file:

                  to ftp://192.168.43.230/tmp2
COPY NEW          >  01 - Születés.mp3          
SKIP DOWNLOAD     <  01 - Születés_.mp3          
Traceback (most recent call last):
  File "/usr/local/bin/pyftpsync", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/pyftpsync.py", line 174, in run
    s.run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 972, in run
    res = super(UploadSynchronizer, self).run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 723, in run
    res = super(BiDirSynchronizer, self).run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 189, in run
    res = self._sync_dir()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 519, in _sync_dir
    self.local.flush_meta()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/targets.py", line 321, in flush_meta
    self.cur_dir_meta.flush()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/metadata.py", line 180, in flush
    s = json.dumps(self.dir, sort_keys=True)
  File "/usr/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 209, in encode
    chunks = list(chunks)
  File "/usr/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 361, in _iterencode_dict
    items = sorted(dct.items(), key=lambda kv: kv[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

the corresponding json looks like:

> {
>     "_disclaimer": "Generated by https://github.com/mar10/pyftpsync",
>     "_file_version": 2,
>     "_time": 1529133757.0,
>     "_time_str": "2018-06-16 10:22:37",
>     "_version": "2.0.0",
>     "mtimes": {},
>     "peer_sync": {
>         "192.168.43.230/tmp2": {
>             "01 - Sz\u00fclet\u00e9s.mp3": {
>                 "m": 1529137337.656211,
>                 "s": 13675540,
>                 "u": 1529137357.673848
>             },
>             ":last_sync": 1529137357.673848
>         }
>     }
> }
> 

Thanks,
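
The error message is consistent with Python 2 implicitly decoding a UTF-8 byte string as ASCII while sorting dictionary keys that mix `str` and `unicode`. That reading is an assumption; the thread does not confirm it. The sketch below uses Python 3, where the decode has to be spelled out, to show that the first non-ASCII byte of this file name is `0xc3` at offset 7, matching the traceback.

```python
# Python 3 sketch: reproduce the byte offset reported in the traceback.
name = "01 - Születés.mp3"
raw = name.encode("utf-8")  # the bytes a UTF-8 file system would hand back

try:
    raw.decode("ascii")  # roughly what Python 2 attempts implicitly
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(hex(raw[7]))   # first byte of the UTF-8 encoding of "ü"
print(error.start)   # offset of the offending byte
```

The two-byte UTF-8 sequence for "ü" starts right after the seven ASCII characters `01 - Sz`, which is why the offset is 7.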

mar10 commented 6 years ago

Could you add more info, like Version, OS version, command line, and logging output (with -vv)?

TorokLev commented 6 years ago

OS: ubuntu 16.04

lev@yellow:~/00$ uname -a
Linux yellow 4.4.0-121-generic #145-Ubuntu SMP Fri Apr 13 13:47:23 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
lev@yellow:~/00$ pyftpsync -V
/usr/local/lib/python2.7/dist-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring
2.0.0
pip packages.. ``` lev@yellow:~/00$ pip freeze absl-py==0.2.2 adium-theme-ubuntu==0.3.4 alabaster==0.7.9 appdirs==1.4.0 apsw==3.8.11.1.post1 astor==0.6.2 attrs==15.2.0 Authomatic==0.1.0.post1 Babel==2.3.4 backports-abc==0.4 backports.ssl-match-hostname==3.5.0.1 backports.weakref==1.0.post1 BeautifulSoup==3.2.1 beautifulsoup4==4.4.1 bleach==1.5.0 bottle==0.12.13 brewer2mpl==1.4.1 certifi==2015.11.20.1 cffi==1.9.1 chardet==2.3.0 CherryPy==3.5.0 cntk==2.5.1 colorama==0.3.9 costcla==0.5 coverage==4.2 coveralls==1.1 cronen==1.1 cryptography==1.2.3 cssselect==1.0.1 cssutils==1.0 cycler==0.9.0 Cython==0.23.4 daft==0.0.4 decorator==4.0.6 defer==1.0.6 dnspython==1.12.0 docopt==0.6.2 docutils==0.13.1 enum34==1.1.6 feedparser==5.1.3 Flask==0.10.1 Flask-OpenID==1.2.5 fluentmock==0.3.2 FormEncode==1.3.1 ftpsync==1.2.2 funcsigs==1.0.2 functools32==3.2.3.post2 future==0.16.0 futures==3.2.0 gast==0.2.0 get==0.0.0 ggplot==0.11.5 gmusicapi==7.0.0 gpsoauth==0.0.4 grpcio==1.12.1 h2o==3.10.0.10 h5py==2.8.0 html5lib==0.9999999 httplib2==0.9.2 idna==2.2 imageio==1.3 imagesize==0.7.1 imbalanced-learn==0.3.3 imblearn==0.0 IMDbPY==5.0 iotop==0.6 ipaddress==1.0.18 ipykernel==4.2.2 ipython==4.0.3 ipython-genutils==0.1.0 itsdangerous==0.24 Jinja2==2.8 jsonschema==2.5.1 jupyter-client==4.1.1 jupyter-core==4.0.6 Keras==2.1.6 kerberos==1.1.1 keyring==5.3 libpgm==1.3 llvmlite==0.17.0.dev0+7.gede38f9 lxml==3.5.0 Markdown==2.6.11 MarkupSafe==0.23 matplotlib==1.5.1 MechanicalSoup==0.4.0 mechanize==0.2.5 meld==3.14.2 mistune==0.7.1 mock==2.0.0 mutagen==1.31 nbconvert==4.1.0 nbformat==4.0.1 ndg-httpsclient==0.4.0 netifaces==0.10.4 networkx==1.10 nose==1.3.6 notebook==4.1.0 numba==0.32.0.dev0+6.g0d73ad6 numpy==1.13.3 oauth2client==1.5.2 opencv-python==3.2.0.7 PAM==0.4.2 pandas==0.23.0 pandoc==1.0.2 parsel==1.1.0 path.py==8.1.2 patsy==0.5.0 pbr==1.10.0 pexpect==4.0.1 pgmpy==0.1.2 pickleshare==0.6 Pillow==3.1.0 Pivy==0.5.0 ply==3.9 post==0.0.0 proboscis==1.2.6.0 protobuf==3.5.2.post1 psutil==3.4.2 
psycopg2==2.6 ptyprocess==0.5 public==0.0.0 py==1.4.28 pyasn1==0.1.9 pyasn1-modules==0.0.8 pycparser==2.17 pycrypto==2.6.1 pycurl==7.43.0 PyDispatcher==2.0.5 pyDOE==0.3.7 pydot==1.0.2 pyea==0.2 pyFFTW==0.9.2 pyftpsync==2.0.0 pygame===1.9.1release Pygments==2.1 pygobject==3.20.0 pygraphviz==1.2 pyhs2==0.6.0 pylab==0.1.3 pymc==2.3.6 pyOpenSSL==0.15.1 pyparsing==2.0.7 PySDL2==0.9.3 pyserial==3.0.1 pystan==2.9.0.0 Pyste==0.9.10 pytest==2.7.1 python-apt==1.1.0b1+ubuntu0.16.4.1 python-dateutil==2.7.3 python-debian==0.1.27 python-libdiscid==0.4.1 python-libtorrent==1.0.7 python-memcached==1.54 python-openid==2.2.5 pytz==2015.7 pyxdg==0.25 PyYAML==3.11 pyzmq==15.2.0 query-string==0.0.0 queuelib==1.4.2 repoze.lru==0.6 request==0.0.0 requests==2.9.1 requests-kerberos==0.7.0 Routes==2.2 rsa==3.2.3 sasl==0.1.3 scikit-image==0.11.3 scikit-learn==0.19.0 scipy==0.19.1 seaborn==0.7.0 selenium==3.0.2 service-identity==16.0.0 setupfiles==0.0.0 simplegeneric==0.8.1 singledispatch==3.4.0.3 six==1.11.0 sklearn==0.0 snowballstemmer==1.2.1 Sphinx==1.5.1 SQLAlchemy==1.0.11 sqlalchemy-migrate==0.11.0 SQLObject==3.7.0 sqlparse==0.2.4 statsmodels==0.6.1 sympy==0.7.6.1 tabulate==0.7.7 Tempita==0.5.2 tensorboard==1.8.0 tensorflow==1.8.0 termcolor==1.1.0 terminado==0.6 thrift==0.9.2 tornado==4.3 traitlets==4.1.0 Twisted==16.0.0 unity-lens-photos==1.0 urlgrabber==3.9.1 uTidylib==0.2 validictory==1.0.1 w3lib==1.17.0 WebOb==1.5.1 Werkzeug==0.14.1 xlrd==0.9.4 xlwt==1.0.0 youtube-dl==2017.1.29 zope.interface==4.1.3 ```


Server side: Android WiFi FTP Server

And this is the sequence:

1)

lev@yellow:~/tmp2$ pyftpsync upload --delete /home/lev/tmp2 ftp://android:xxxxx@192.168.43.230:2222:/tmp2
/usr/local/lib/python2.7/dist-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring
Upload /home/lev/tmp2 to ftp://192.168.43.230/tmp2
Traceback (most recent call last):
  File "/usr/local/bin/pyftpsync", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/pyftpsync.py", line 174, in run
    s.run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 972, in run
    res = super(UploadSynchronizer, self).run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 723, in run
    res = super(BiDirSynchronizer, self).run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 187, in run
    self.remote.open()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/ftp_target.py", line 109, in open
    self.ftp.connect(self.host, self.port)
  File "/usr/lib/python2.7/ftplib.py", line 135, in connect
    self.sock = socket.create_connection((self.host, self.port), self.timeout)
  File "/usr/lib/python2.7/socket.py", line 575, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

2)

lev@yellow:~/tmp2$ pyftpsync upload --delete /home/lev/tmp2 ftp://android:xxxxx@192.168.43.230:2221:/tmp2

/usr/local/lib/python2.7/dist-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring
Upload /home/lev/tmp2 to ftp://192.168.43.230/tmp2
COPY EXISTING > 01 - Születés.mp3
SKIP DOWNLOAD < 01 - Születés_.mp3
Wrote 1/1 files in 0 directories, skipped: 0. Elap: 23.99 sec.
lev@yellow:~/tmp2$ pyftpsync upload --delete /home/lev/tmp2 ftp://android:xxxxx@192.168.43.230:2221:/tmp2
/usr/local/lib/python2.7/dist-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring
Upload /home/lev/tmp2 to ftp://192.168.43.230/tmp2
CONFLICT: '01 - Sz\xc3\xbclet\xc3\xa9s.mp3' was modified on both targets since last sync (n.a.).
(No meta data available.)
Local: 2018-06-16 08:42:11, 13,675,540 bytes
Remote: 2018-06-16 12:34:32, 13,675,540 bytes (newer)
Use Local, Skip, Binary compare, Help ? l
COPY CONFLICT > 01 - Születés.mp3
DELETE MISSING >X 01 - Születés_.mp3
Wrote 1/1 files in 0 directories, skipped: 0.

3)

lev@yellow:~/tmp2$ pyftpsync upload --delete /home/lev/tmp2 ftp://android:xxxxx@192.168.43.230:2221:/tmp2

/usr/local/lib/python2.7/dist-packages/keyring/backends/Gnome.py:6: PyGIWarning: GnomeKeyring was imported without specifying a version first. Use gi.require_version('GnomeKeyring', '1.0') before import to ensure that the right version gets loaded.
  from gi.repository import GnomeKeyring
Upload /home/lev/tmp2 to ftp://192.168.43.230/tmp2
CONFLICT: '01 - Sz\xc3\xbclet\xc3\xa9s.mp3' was modified on both targets since last sync (n.a.).
(No meta data available.)
Local: 2018-06-16 08:42:11, 13,675,540 bytes
Remote: 2018-06-16 12:35:24, 13,675,540 bytes (newer)
Use Local, Skip, Binary compare, Help ? l
COPY CONFLICT > 01 - Születés.mp3
Traceback (most recent call last):
  File "/usr/local/bin/pyftpsync", line 11, in <module>
    sys.exit(run())
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/pyftpsync.py", line 174, in run
    s.run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 972, in run
    res = super(UploadSynchronizer, self).run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 723, in run
    res = super(BiDirSynchronizer, self).run()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 189, in run
    res = self._sync_dir()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/synchronizers.py", line 519, in _sync_dir
    self.local.flush_meta()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/targets.py", line 321, in flush_meta
    self.cur_dir_meta.flush()
  File "/usr/local/lib/python2.7/dist-packages/ftpsync/metadata.py", line 180, in flush
    s = json.dumps(self.dir, sort_keys=True)
  File "/usr/lib/python2.7/json/__init__.py", line 251, in dumps
    sort_keys=sort_keys, **kw).encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 209, in encode
    chunks = list(chunks)
  File "/usr/lib/python2.7/json/encoder.py", line 434, in _iterencode
    for chunk in _iterencode_dict(o, _current_indent_level):
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 408, in _iterencode_dict
    for chunk in chunks:
  File "/usr/lib/python2.7/json/encoder.py", line 361, in _iterencode_dict
    items = sorted(dct.items(), key=lambda kv: kv[0])
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 7: ordinal not in range(128)

Lev

mar10 commented 6 years ago

Thanks for reporting, I was finally able to reproduce this (or something similar) here. I assume this is caused by the encoding used on the FTP server to store the file names. Which clients do you use to upload files to that server: only pyftpsync, or others as well (FileZilla, ...)?

TorokLev commented 6 years ago

Hi Mark, in another directory I simply copied files to the SD card directly. Then I ran pyftpsync to sync that same directory, and the result looked suspicious to me. So I ran a test on a fresh directory with a single file; the output I sent you is from that test. So I don't use FileZilla.

Is there anything I can do to solve this? Or can you suggest a specific FTP server for Android to use with pyftpsync instead?

Thanks a lot, Lev

mar10 commented 6 years ago

I think this is a general problem that may be tricky to solve in all cases: how should pyftpsync handle local and remote targets that use different encodings for file names? To support all use cases, I guess only UTF-8 would make sense, but re-encoding may lead to different file names on both sides.

I probably need to think about this a bit more, when I find time.

On another directory I simply copied files to the SD card directly.
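
To make the concern concrete: the same file name yields different byte sequences under different encodings, and decoding with the wrong one produces a different-looking name. This is a minimal Python 3 sketch. Latin-1 is only an assumed example of a non-UTF-8 encoding; the thread does not say which encoding the Android server used.

```python
name = "01 - Születés.mp3"

utf8_bytes = name.encode("utf-8")
latin1_bytes = name.encode("latin-1")  # assumed example encoding

# Same text, different bytes on the wire or on disk.
same_bytes = utf8_bytes == latin1_bytes

# Decoding UTF-8 bytes as Latin-1 yields a mangled but valid name.
mangled = utf8_bytes.decode("latin-1")

# Reversing the wrong decode recovers the original text.
recovered = mangled.encode("latin-1").decode("utf-8")

print(same_bytes)
print(ascii(mangled))
print(recovered == name)
```

Because the mangled name is itself a valid file name, a sync tool that compares names across the two sides can end up treating one file as two different files.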

TorokLev commented 6 years ago

On another directory I simply copied files to the SD card directly.

  • Did you use Windows File Explorer? Which filesystem was the SD card formatted with (FAT32, ...)?

It is FAT32, created on Linux. I probably used Midnight Commander or Nautilus. Midnight Commander was choking on similar issues. I can retest it.

  • Would it work if the SD card is empty and you use pyftpsync upload to initialize the files, instead of copying? And then use pyftpsync sync after that?

I had pyftpsync sync the file three times into an empty directory on the SD card. The output is the one I sent you.

  • Does it make a difference if you run pyftpsync on Python 3?

Will test it.

Lev

mar10 commented 5 years ago

Was able to reproduce it with Python 2, but not with Python 3 so far.
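
One plausible reason for the difference, offered as an assumption rather than a confirmed diagnosis: Python 3 keeps text (`str`) and `bytes` strictly apart, so `json.dumps` never decodes implicitly. Text keys serialize to the `\u00fc` escapes seen in the metadata file earlier in this thread, while byte-string keys are rejected with a `TypeError`. The dictionary here is a simplified stand-in, not pyftpsync's actual metadata code.

```python
import json

# Text keys: non-ASCII characters are escaped, no implicit decoding happens.
meta = {"01 - Születés.mp3": {"s": 13675540}}
encoded = json.dumps(meta, sort_keys=True)
print(encoded)

# Byte-string keys: Python 3 refuses them instead of guessing an encoding.
try:
    json.dumps({"01 - Születés.mp3".encode("utf-8"): {}}, sort_keys=True)
    rejected = False
except TypeError:
    rejected = True
print(rejected)
```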

TorokLev commented 5 years ago

Hi Mark, maybe I used Python 2; I cannot remember. I will test it again. But it points to the UTF-8 problems you mentioned before. Thanks, Lev

mar10 commented 5 years ago

I released v3.0, which should address the problem. Please open a new issue if it still does not work for you.