Open GoogleCodeExporter opened 9 years ago
I can create paths with unicode names without problems with fuse.py. Do you
have any specific cases where it fails?
Original comment by verigak
on 6 Mar 2012 at 3:44
I think what anacrolix meant is that e.g. Operations.getattr gets a bytes
instance rather than a str instance for a path.
Original comment by antic...@gmail.com
on 6 Mar 2012 at 10:14
That's right.
Original comment by anacrolix@gmail.com
on 7 Mar 2012 at 7:02
My patches on GitHub (https://github.com/terencehonles/fusepy) probably fix
what you need. Ideally my patches will be pushed upstream.
Original comment by Terence....@gmail.com
on 24 Apr 2012 at 8:12
This is wrong! User code should be written to deal with bytes, not the other
way around. On POSIX operating systems, file paths are NOT specified as being
UTF-8 or any specific Unicode encoding. The only correct way to deal with
filenames on Unix is to treat them as byte strings. Most software I've seen
treats them as UTF-8, but at the file system level, they are binary strings and
any FUSE implementation would be broken if it didn't support non-UTF-8
filenames.
In other words, I need to be able to cd to a FUSE-mounted file system, open
Python 3, and type this:
>>> import os
>>> open(b'd\xe9j\xe0_vu.txt', 'w').close()
>>> os.listdir(b'.')
[b'd\xe9j\xe0_vu.txt']
In most shells, if you ls this file, it will display as d?j?_vu.txt. But it is
a perfectly valid Latin-1-encoded filename. If fusepy decoded the filename to a
Unicode string before passing it to the user code, it would either throw an
exception in this case, or corrupt the filename.
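Both failure modes can be reproduced with plain bytes and str operations, no fusepy involved. A minimal sketch (the variable names are only for illustration):

```python
# The Latin-1-encoded filename from the session above.
name = b'd\xe9j\xe0_vu.txt'

# Failure mode 1: strict UTF-8 decoding raises an exception.
try:
    name.decode('utf-8')
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

# Failure mode 2: lossy decoding succeeds, but the original bytes are lost.
lossy = name.decode('utf-8', errors='replace')
round_tripped = lossy.encode('utf-8')

# Decoding with the encoding the name was actually written in works fine.
intended = name.decode('latin-1')

print(strict_ok)              # False
print(round_tripped == name)  # False: the filename has been corrupted
```

Once the lossy version has been handed to user code, there is no way to get back to the bytes the kernel originally supplied.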
I have tested Terence's fork of fusepy and it breaks this assumption. He added
an 'encoding' argument to the FUSE constructor, and then decodes all the bytes
values to strs with this encoding before giving them to the user-supplied
operations, and encodes all strs supplied by user code before giving them back
to the operating system. Unfortunately, it isn't a correct solution to simply
say "pick an encoding before you start". File systems must be able to support
different files with different encodings on their names.
If you run Terence's version of memory.py and then perform my above example,
you get this:
Traceback (most recent call last):
  File "fuse.py", line 402, in _wrapper
    return func(*args, **kwargs) or 0
  File "fuse.py", line 410, in getattr
    return self.fgetattr(path, buf, None)
  File "fuse.py", line 640, in fgetattr
    attrs = self.operations('getattr', path.decode(self.encoding), fh)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2: invalid continuation byte
Alternatively, create the b'd\xe9j\xe0_vu.txt' file somewhere on a normal
drive, and then run Terence's version of the loopback.py example on the
directory containing that file. Attempting to 'ls' the directory results in
this exception:
Traceback (most recent call last):
  File "fuse.py", line 402, in _wrapper
    return func(*args, **kwargs) or 0
  File "fuse.py", line 586, in readdir
    if filler(buf, name.encode(self.encoding), st, offset) != 0:
UnicodeEncodeError: 'utf-8' codec can't encode character '\udce9' in position 1: surrogates not allowed
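The '\udce9' in that message is a lone surrogate of the kind Python 3's surrogateescape error handler (PEP 383) produces, which is what os.listdir uses on POSIX when it is given a str path and meets an undecodable byte. The sketch below reproduces the same error without any file system. The explicit decode call is a stand-in for what os.listdir does on a UTF-8 system, not the loopback.py code itself:

```python
raw = b'd\xe9j\xe0_vu.txt'

# Stand-in for the str that os.listdir('.') returns for this file:
# each undecodable byte 0xNN becomes the lone surrogate U+DCNN.
listed = raw.decode('utf-8', errors='surrogateescape')

# Encoding that str with the default (strict) handler fails,
# just like name.encode(self.encoding) in the traceback.
try:
    listed.encode('utf-8')
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(ascii(listed))  # 'd\udce9j\udce0_vu.txt'
print(error.reason)   # surrogates not allowed
```

With os.listdir(b'.') no codec is involved at all, so code written against bytes never reaches this path.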
The only solution is to make fuse.py deal with bytes throughout, then change
all of the examples to also deal with bytes. I will upload my patch for this on
the other bug (Issue 36).
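A toy sketch of what handling bytes throughout could look like. This is not the patch mentioned above and not fusepy's API: the BytesMemoryFS class and its method names are hypothetical, and it only models a flat root directory held in a dict keyed by bytes paths.

```python
import errno


class BytesMemoryFS:
    """Hypothetical toy file table; paths are bytes and are never decoded."""

    def __init__(self):
        self.files = {}  # full path (bytes) -> contents (bytes)

    def create(self, path):
        if not isinstance(path, bytes):
            raise TypeError('paths must be bytes')
        self.files[path] = b''

    def getattr(self, path):
        if path not in self.files:
            raise OSError(errno.ENOENT, 'No such file or directory')
        return {'st_size': len(self.files[path])}

    def readdir(self, path):
        # Only a flat root directory is modelled in this sketch.
        if path != b'/':
            raise OSError(errno.ENOTDIR, 'Not a directory')
        return sorted(name[1:] for name in self.files)


fs = BytesMemoryFS()
fs.create(b'/d\xe9j\xe0_vu.txt')          # Latin-1-encoded name
fs.create(b'/d\xc3\xa9j\xc3\xa0_vu.txt')  # the same text, UTF-8-encoded
print(fs.readdir(b'/'))
```

Because nothing is ever decoded, the two names coexist as distinct entries even though they use different encodings, and neither can trigger a UnicodeDecodeError or UnicodeEncodeError.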
Original comment by matt.gi...@gmail.com
on 14 Jul 2012 at 1:03
Original issue reported on code.google.com by
anacrolix@gmail.com
on 20 Feb 2011 at 3:14