spotify / snakebite

A pure python HDFS client
Apache License 2.0

Can't ls() a directory with a Unicode name containing files with Unicode filenames #108

Open wsong opened 9 years ago

wsong commented 9 years ago

If you have a directory called "/ｔｅｓｔｄｉｒ" (written in fullwidth Unicode letters) that contains a file whose name also uses non-ASCII characters (for example "ｔｅｓｔｆｉｌｅ"), then this:

list(client.ls([u'/ｔｅｓｔｄｉｒ']))

gives this stack trace:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 139, in ls
    recurse=recurse):
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 1094, in _find_items
    full_path = self._get_full_path(path, node)
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 941, in _get_full_path
    return os.path.join(path, node.path)
  File "/home/wsong/memsql-loader/venv/lib/python2.7/posixpath.py", line 80, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 1: ordinal not in range(128)
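What seems to happen here: the path passed to ls() is a unicode object, but the child name (node.path) comes back from HDFS as UTF-8 bytes, so Python 2 tries to decode those bytes as ASCII when concatenating them and fails on the first byte after '/' (position 1). The sketch below reproduces the same mismatch in Python 3, assuming the hypothetical names above and UTF-8 encoded names. Python 3 raises TypeError instead of attempting an implicit ASCII decode.

```python
import posixpath

# Hypothetical values mirroring the report: the path passed to ls() is text,
# while the child name arrives from HDFS as raw UTF-8 bytes.
parent = "/ｔｅｓｔｄｉｒ"
child_bytes = "ｔｅｓｔｆｉｌｅ".encode("utf-8")

# Fullwidth letters encode to three bytes each, starting with 0xEF,
# the byte the Python 2 ASCII codec rejected.
first_byte = child_bytes[0]

try:
    posixpath.join(parent, child_bytes)
    mixed_join_failed = False
except TypeError:
    # Python 3 refuses to mix str and bytes outright; Python 2 instead
    # attempted an implicit ASCII decode and raised UnicodeDecodeError.
    mixed_join_failed = True

# Decoding the bytes explicitly as UTF-8 makes both components text.
joined = posixpath.join(parent, child_bytes.decode("utf-8"))
print(hex(first_byte), mixed_join_failed, joined)
```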

If you instead pass a byte string (a plain Python 2 str) to ls(), you get this:

list(client.ls(['/ｔｅｓｔｄｉｒ']))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 139, in ls
    recurse=recurse):
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 1078, in _find_items
    fileinfo = self._get_file_info(path)
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/snakebite/client.py", line 1206, in _get_file_info
    request.src = path
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/google/protobuf/internal/python_message.py", line 471, in field_setter
    self._fields[field] = type_checker.CheckValue(new_value)
  File "/home/wsong/memsql-loader/venv/local/lib/python2.7/site-packages/google/protobuf/internal/type_checkers.py", line 166, in CheckValue
    (proposed_value))
ValueError: '/\xef\xbd\x94\xef\xbd\x85\xef\xbd\x93\xef\xbd\x94\xef\xbd\x84\xef\xbd\x89\xef\xbd\x92' has type bytes, but isn't in 7-bit ASCII encoding. Non-ASCII strings must be converted to unicode objects before being added.
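The bytes in this ValueError are the UTF-8 encoding of the fullwidth directory name, and the message states the rule the protobuf string field enforces: bytes are accepted only if they are 7-bit ASCII. The sketch below decodes the reported bytes and uses a hypothetical check_string_field() that only emulates the rule as quoted in the message. It is not protobuf's actual implementation.

```python
# Bytes copied from the ValueError message above.
reported = (b"/\xef\xbd\x94\xef\xbd\x85\xef\xbd\x93"
            b"\xef\xbd\x94\xef\xbd\x84\xef\xbd\x89\xef\xbd\x92")

def check_string_field(value):
    """Hypothetical emulation of the quoted rule: bytes must be 7-bit
    ASCII; non-ASCII values must already be text (unicode)."""
    if isinstance(value, bytes) and any(b > 0x7F for b in value):
        raise ValueError("%r has type bytes, but isn't in 7-bit ASCII encoding." % value)
    return value

# Decoding as UTF-8 recovers the directory name from the report.
decoded = reported.decode("utf-8")

try:
    check_string_field(reported)
    raw_rejected = False
except ValueError:
    raw_rejected = True  # the raw bytes fail, as in the traceback

accepted = check_string_field(decoded)  # text passes the check
print(decoded, raw_rejected, accepted == decoded)
```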

The right fix is probably to check the types of paths (and decode byte strings to unicode) before passing them into os.path.join().
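One way to do that type checking, sketched in Python 3 under the assumption that HDFS names are UTF-8 (as the bytes in both tracebacks suggest): normalize every path component to text once, then join. to_text() and join_hdfs_path() are hypothetical helpers, not part of snakebite's API. In Python 2 the same idea applies with unicode in place of str.

```python
import posixpath

def to_text(component, encoding="utf-8"):
    """Hypothetical helper: return a path component as text,
    decoding bytes with the given encoding."""
    if isinstance(component, bytes):
        return component.decode(encoding)
    if isinstance(component, str):
        return component
    raise TypeError("path component must be str or bytes, got %s"
                    % type(component).__name__)

def join_hdfs_path(parent, child):
    """Hypothetical helper: join two components after normalizing both to text."""
    return posixpath.join(to_text(parent), to_text(child))

# Both mixed combinations from the report produce the same text path.
print(join_hdfs_path("/ｔｅｓｔｄｉｒ", "ｔｅｓｔｆｉｌｅ".encode("utf-8")))
print(join_hdfs_path("/ｔｅｓｔｄｉｒ".encode("utf-8"), "ｔｅｓｔｆｉｌｅ"))
```

Normalizing at the API boundary means the same text value could be reused both for joining and for the protobuf request field, so neither of the two failures above would be reachable from mixed input types.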

kyamagu commented 6 years ago

Relevant: https://github.com/fabric/fabric/issues/1292#issuecomment-82621653