By the way, copying the other way (from the OS X desktop to the sshfs-mounted
partition) works even for files/directories with accented characters.
Original comment by bogd...@gmail.com
on 2 Apr 2007 at 12:15
Unfortunately, this is a known problem that isn't fixed in MacFUSE.
First, a bit on Unicode normalization: some characters can be represented in
more than one way. For example, è can be represented in a precomposed manner
("\303\250", two bytes) or in a decomposed manner ("e\314\200", three bytes).
Decomposed Unicode prefers a base letter followed by combining versions of the
accents, whereas precomposed Unicode prefers single characters that represent
both the base letter and the accent. A "normalization" process converts all
characters in a Unicode sequence to the preferred form (either precomposed or
decomposed).
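To make the two forms concrete, here is a minimal Python sketch using the
standard unicodedata module. It is only an illustration of NFC/NFD, not
anything MacFUSE does; the helper names to_nfc and to_nfd are made up for this
example.

```python
import unicodedata

precomposed = "\u00e8"   # è as one code point (U+00E8)
decomposed = "e\u0300"   # 'e' followed by COMBINING GRAVE ACCENT (U+0300)

# Same glyph on screen, but different code point sequences and byte strings.
precomposed_bytes = precomposed.encode("utf-8")   # "\303\250" in octal
decomposed_bytes = decomposed.encode("utf-8")     # "e\314\200" in octal


def to_nfc(name: str) -> str:
    """Return the precomposed (Normalization Form C) version of a name."""
    return unicodedata.normalize("NFC", name)


def to_nfd(name: str) -> str:
    """Return the decomposed (Normalization Form D) version of a name."""
    return unicodedata.normalize("NFD", name)
```

Normalizing in either direction converts one form into the other, which is why
two filenames that look identical can still compare as different byte strings.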
The problem is that HFS+ requires filenames to be in decomposed Unicode (more
exactly: Unicode Normalization Form D), but other OSes (e.g. Windows) prefer
(but do not enforce) precomposed Unicode (more exactly: Unicode Normalization
Form C). The filenames on your Windows drive (e.g. "lumière") are precomposed.
Linux passes them to sshfs, which passes them to MacFUSE, which passes them to
Mac OS X, all precomposed. Mac OS X can display them, but you can't copy them
to your HFS+ drive because HFS+ requires decomposed filenames.
The "obvious" solution (at first) seems to be that MacFUSE should decompose
filenames that come in from fuse daemons (like sshfs). However, this has some
drawbacks. First, the fs daemon may contain a mixture of precomposed and
decomposed filenames, and we don't want to have to keep a list (in memory or
elsewhere) of which ones we've normalized. Second, this wouldn't handle the
pathological case of a directory containing both precomposed and decomposed
versions of the same name (e.g., "lumie`re" and "lumière", where ` represents
the combining `). Yes, Linux and Windows allow that. Furthermore, decomposing
filenames causes them to take up more Unicode characters, which may push the
length of a filename over the 255-character limit that is hard-coded into the
Mac OS X kernel.
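The last two drawbacks can be sketched in a few lines of Python. This is an
illustration only: find_collisions and utf8_growth are hypothetical helper
names, and the listing is made-up sample data, not output from any real
filesystem.

```python
import unicodedata
from collections import defaultdict


def nfd(name: str) -> str:
    """Decompose a name (Unicode Normalization Form D)."""
    return unicodedata.normalize("NFD", name)


def find_collisions(names):
    """Group names that become identical once they are decomposed."""
    groups = defaultdict(list)
    for name in names:
        groups[nfd(name)].append(name)
    return [group for group in groups.values() if len(group) > 1]


def utf8_growth(name: str) -> int:
    """Number of extra UTF-8 bytes a name needs after decomposition."""
    return len(nfd(name).encode("utf-8")) - len(name.encode("utf-8"))


# A directory holding both forms of "lumière" plus an unaffected name.
listing = ["lumi\u00e8re", "lumie\u0300re", "plain.txt"]
collisions = find_collisions(listing)
growth = utf8_growth("lumi\u00e8re")
```

Here the two "lumière" entries land in one collision group, and the
precomposed name grows when decomposed, so a name that is already close to the
length limit could exceed it after normalization.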
As a result, we decided for now to leave it up to the fs daemon to properly
handle Unicode normalization. This means that problems like the one you face
are known failure cases.
We'd like to fix this in MacFUSE somehow, but it's unclear which way is best.
Original comment by andrewde...@gmail.com
on 2 Apr 2007 at 2:02
You're not going to be able to avoid keeping a list, so why not just do that?
You can leave the multiple-similar-names case, which will probably never
happen, as a failure case instead of failing for the very large number of
non-English users who'd get screwed by the current situation.
Original comment by paracel...@gmail.com
on 9 Jun 2007 at 3:47
Actually, I think we can avoid keeping a list, tho it has drawbacks. See:
http://code.google.com/p/macfuse/wiki/DesignDocFilenameEncodingSupportForMacFUSE
(I'd welcome your input)
Original comment by andrewde...@gmail.com
on 9 Jun 2007 at 5:47
I guess that would work, but it also fails instead of doing the right thing
when it encounters problems.
To really do the right thing, you should consider taking Mozilla's
UniversalDetector, which can auto-detect a large number of encodings, and using
it to automatically find which encoding is used. I've used this for similar
purposes in my program The Unarchiver, which (obviously) unarchives files from
a large variety of operating system locales.
Original comment by paracel...@gmail.com
on 9 Jun 2007 at 6:05
I agree with paracelsus's first comment. I'm having trouble with ntfs-3g on Mac
because of this. Windows and Linux both create composed characters and actually
accept decomposed characters, but Mac creates decomposed characters and cannot
really open filenames with composed characters because it translates them to
decomposed somewhere along the way.
So, since Windows, the native host OS for NTFS, uses composed characters, you
could just "translate" so that the Mac sees decomposed names but the real
filesystem uses composed ones. I don't know about other filesystems, but this
seems very straightforward. In fact, the built-in read-only Mac ntfs does the
translation.
The problem is how to handle real decomposed filenames, especially if an
equivalent one exists with composed characters, but that should be a problem
only for previous MacFUSE users, not for newcomers. You could handle it the
same way ntfs-3g handled unknown encoding names, i.e. log an error and ignore
it. And/or make a tool that renames decomposed to composed, perhaps appending
an underscore if an equivalent composed filename already exists.
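A rough Python sketch of how such a rename tool might plan its work, assuming
the underscore rule described above. plan_renames is a hypothetical helper; it
only computes a mapping from a list of names and never touches the disk.

```python
import unicodedata


def plan_renames(names):
    """Map each non-composed name to a composed (NFC) target name.

    If the composed name is already taken, underscores are appended until
    the target is free, as suggested above.
    """
    taken = set(names)
    plan = {}
    for name in sorted(names):
        target = unicodedata.normalize("NFC", name)
        if target == name:
            continue  # already composed, nothing to do
        while target in taken:
            target += "_"
        taken.add(target)
        plan[name] = target
    return plan


# Sample data: a decomposed "lumière" next to its composed twin, and a
# decomposed "café" with no twin.
plan = plan_renames(["lumie\u0300re", "lumi\u00e8re", "cafe\u0301"])
```

A real tool would then walk the plan and rename each entry, ideally with a
dry-run mode first so the mapping can be reviewed.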
The built-in Mac ntfs doesn't handle decomposed and equivalent filenames very
gracefully. The decomposed-form filenames get listed with a normal "ls" or in
Finder (icon or column views), but not if you "ls -l", "cat" or click on the
file in Finder (it "disappears") or list it in list view. If an equivalent
filename exists, the attributes and contents ("ls -l", "cat", Finder's details
and preview) are those of the file with the equivalent composed name.
Anyway, it seems MacFUSE should have made this translation from the start to
avoid this type of confusion. The problem would still be there, but it would be
mitigated and would have to be exploited in some other, much more complicated
way. I guess this holds true for sshfs and others.
So I vote for always translating to composed form for actual filesystem
operations, or, if you see it as more fitting, always translating to decomposed
form for the Mac (or both, I don't really know how many implications there
are).
Original comment by asstolav...@hotmail.com
on 18 Jul 2007 at 4:25
I wonder: could this be related to an inability to use sshfs to access a Windows
machine running OpenSSH when the username contains a space?
Original comment by matta...@gmail.com
on 25 Aug 2007 at 6:43
Where may I find a "tool that renames decomposed to composed characters" ?
Please help me, I have hundreds of files that I'd like to rename with composed
characters.
Original comment by pierre.g...@gmail.com
on 16 Nov 2007 at 11:24
If you are looking for a solution that runs on Linux, I think you want convmv.
http://www.j3e.de/linux/convmv/
I don't know of a solution that runs on Windows. There is no solution for Mac
because all filenames are stored as decomposed characters.
Original comment by a...@gmail.com
on 16 Nov 2007 at 5:43
Thanks adlr !
I downloaded convmv and ran it on Ubuntu :
./convmv -r -f utf-8 -t utf-8 --nfc --notest /media/LaCie/
Great!
Original comment by pierre.g...@gmail.com
on 24 Nov 2007 at 1:33
I have a proposal to solve this problem.
If we use sshfs with the "modules" option, we may be able to solve it.
Command example:
$ sshfs user@sftpserver:/dir /mount/point/ -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC
(from_code is the sftpserver-side charset, to_code is the sshfs-side charset)
I checked that this function is effective on a Linux box.
(CentOS4.2 kernel 2.6.9-55.0.12.EL/FUSE 2.7.1/SSHFS 1.8)
So far, I know the following:
1. GNU libiconv, as customized by Apple, has a "decomposed Unicode encoding
(UTF-8-MAC)".
2. "FUSE 2.7.x" added the new feature "Add filename charset conversion module".
3. It seems that "MacFuseCore 1.1.0" or "MacFusion 1.2 Beta 3" is not linked
against "libiconv" (on Mac OS X 10.4.x).
I hope this problem gets solved.
Thank you.
Original comment by Zeta...@gmail.com
on 19 Dec 2007 at 11:00
A bit more weirdness:
I have noticed that it is possible to copy the files using the shell. I can
either SSH into the Mac or open a Terminal, then just do a simple "cp" from the
sshfs-mounted directory to the local disc (e.g., the Desktop), and it works
without complaint. The names are decomposed on the fly.
I suppose it is "cp" that does it. What I don't understand is why the same
thing doesn't happen with the graphical shell. Could one of the devs try to
trace the two cases and see if sshfs is called differently?
Original comment by bogd...@gmail.com
on 8 Jan 2008 at 12:04
I've tried:
ntfs-3g /dev/disk1s1 x -omodules=iconv,from_code=UTF-8,to_code=UTF-8-MAC
But got nothing significant.
NFC characters still appear in filenames, so Finder refuses to copy them.
MacFUSE 1.3.1 + ntfs-3g 1.1120, Leopard.
What did I do wrong?
Original comment by kei...@gmail.com
on 9 Jan 2008 at 8:26
Character encodings should almost certainly be handled by the filesystem
itself, not generally by MacFUSE. No matter how good the heuristic is, guessing
character encodings is almost always a bad idea.
SSHFS has access to the local and remote environment, and should be able to
determine the character encodings to translate between. I do hope this issue is
regarded as a bug in SSHFS, not MacFUSE (unless I'm missing something about
MacFUSE that makes it also suspect).
Original comment by forest...@gmail.com
on 30 Apr 2008 at 1:18
I should mention that it is also reasonable for end users to enable the iconv
module if the filesystem doesn't handle character encodings at all. I just
really want to oppose the idea of MacFUSE automatically guessing character
encodings by default.
Original comment by forest...@gmail.com
on 30 Apr 2008 at 1:19
When will someone come up with a real solution? This thing is kind of annoying.
I know I can just delete special characters from my files to make it easier, or
make a new FAT32 partition for my documents, perhaps, but it takes some time...
I hope the solution comes quickly... I would really love to help... but I don't
know much about all the programming languages used... XP
Original comment by esteban...@gmail.com
on 17 May 2008 at 4:12
The 'coder way' solution:
1/ Assume the Mac OS X local char encoding for filenames is UTF-8-MAC (known as
the UTF-8 decomposed form).
2/ Assume the FUSE embedded file system char encoding for filenames is the
UTF-8 composed form (other char encodings will work too).
3/ See the iconv() man page with 'man 3 iconv' (libiconv is located under
/usr/lib on Mac OS X).
4/ In your custom file system source code, for every incoming fuse callback
that provides a full path to look up (this is the case for all callbacks but
readdir()), consider using:
transcoder = iconv_open("UTF-8","UTF-8-MAC");
followed by
iconv(transcoder, &path, &srcBytesCount, &outpath, &dstBytesCount)
then work with the re-encoded 'outpath' string to look up information...
5/ In the readdir() callback, consider using:
transcoder = iconv_open("UTF-8-MAC","UTF-8");
followed by
iconv(transcoder, &UTF8_filename, &srcBytesCount, &UTF8_MAC_filename,
&dstBytesCount)
before calling the filler() function with your freshly re-encoded
'UTF8_MAC_filename' string.
Enjoy!
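The shape of that two-way translation can be sketched in Python. This is not
FUSE code: NormalizingLayer and its methods are hypothetical, and
unicodedata's NFC/NFD stand in for the iconv "UTF-8"/"UTF-8-MAC" conversions
(an assumption, since Apple's UTF-8-MAC may differ from strict NFD for some
code point ranges).

```python
import unicodedata


class NormalizingLayer:
    """Toy stand-in for a file system that stores composed (NFC) names
    but shows decomposed (NFD) names to the OS."""

    def __init__(self, backend_names):
        # Names as stored by the backend, assumed to be composed already.
        self._names = set(backend_names)

    def lookup(self, path_from_os):
        """Path callbacks: compose the incoming name before the lookup."""
        stored = unicodedata.normalize("NFC", path_from_os)
        return stored if stored in self._names else None

    def readdir(self):
        """readdir: decompose each stored name before returning it."""
        return sorted(unicodedata.normalize("NFD", n) for n in self._names)


layer = NormalizingLayer(["lumi\u00e8re"])
listed = layer.readdir()          # what the OS would see
found = layer.lookup(listed[0])   # what the backend is asked for
```

The name the OS receives from readdir() is decomposed, and when the OS later
passes that name back in, lookup() maps it to the composed name the backend
actually stores.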
Original comment by franck.b...@gmail.com
on 19 Jun 2008 at 1:14
> The 'coder way' solution :
> ...
The coder would be well advised to look at the source code of the open source
MacFUSE Core. Since MacFUSE 1.0 (released October 2007), the user-space library
supports stacking of file system modules. One of the built-in modules is
"iconv". See lib/modules/iconv.c in the user-space library source. The module
takes two arguments: a "from" encoding name and a "to" encoding name. Then, for
each incoming operation, the library automatically does what you are
suggesting.
Original comment by si...@gmail.com
on 19 Jun 2008 at 4:32
Unfortunately, since '-omodules=iconv,from_code=UTF-8,to_code=UTF-8-MAC' is not
understood by the macfuse (v1.5.1) fuse_main() function, the 'coder way' seems
to be the only solution, for now.
Original comment by franck.b...@gmail.com
on 20 Jun 2008 at 8:13
Can you clarify what exactly doesn't work?
I tried the following in the 1.5.1 tree and the arguments are received by the
iconv module as expected:
$ ./hello /tmp/hello -omodules=iconv,from_code=UTF-8,to_code=UTF-8-MAC
Besides, the iconv module has defaults for both the from_code and to_code
arguments. If you don't specify from_code, it should use UTF-8. If you don't
specify to_code, it should use the value of $LC_CTYPE. Does that not work?
Original comment by si...@gmail.com
on 20 Jun 2008 at 3:52
Performing more tests, I found some strange behaviour...
I usually call fuse_main using the following argv:
main /Volumes/Point_A -f -onoappledouble -ovolname=Point
-ovolicon=/Users/bonin134/Documents/Projets/Point/Build/MacOs/XCode/Drive/Debug/Point.app/Contents/MacOS/../Resources/Point.icns
To use the iconv module, I tried this argv without success:
main /Volumes/Point_A -f -omodules=iconv,from_code=UTF-8,to_code=UTF-8-MAC
-onoappledouble -ovolname=Point
-ovolicon=/Users/bonin134/Documents/Projets/Point/Build/MacOs/XCode/Drive/Debug/Point.app/Contents/MacOS/../Resources/Point.icns
The console error output is 'fuse: unknown option `from_code=UTF-8''
!!!! BUT !!!, following command line seems to work :
main /Volumes/Point_A -f -omodules=iconv,from_code=UTF-8,to_code=UTF-8-MAC
-onoappledouble -ovolname=Point
don't see why, but there is a problem between -ovolicon and -omodules...
Original comment by franck.b...@gmail.com
on 23 Jun 2008 at 1:34
> !!!! BUT !!!, following command line seems to work :
> don't see why, but there is a problem between -ovolicon and -omodules...
"-ovolicon=/path/to/icon" is a special option: it's a convenience shorthand for
"-omodules=volicon,iconpath=/path/to/icon". This works fine if you are using no
other modules, but if you are, it won't work because the library wants modules
specified as "-omodules=M1:M2:...:Mn". So, in that case, you will have to use
the longhand form. For example:
"-omodules=iconv:volicon,iconpath=/path/to/icon,from_code=UTF-8,to_code=UTF-8-MAC"
The order of arguments doesn't matter.
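For anyone assembling the longhand form programmatically, a small Python
sketch of the syntax described above. build_modules_option is a hypothetical
helper written for this thread, not part of MacFUSE or libfuse; it only
formats a string.

```python
def build_modules_option(modules):
    """Build one "-omodules=M1:M2:...,key=value,..." option string.

    `modules` maps each module name to a dict of that module's arguments.
    Module names are joined with ':' and all arguments follow, separated
    by commas.
    """
    names = ":".join(modules)
    args = [
        f"{key}={value}"
        for options in modules.values()
        for key, value in options.items()
    ]
    return "-omodules=" + ",".join([names] + args)


option = build_modules_option({
    "iconv": {"from_code": "UTF-8", "to_code": "UTF-8-MAC"},
    "volicon": {"iconpath": "/path/to/icon"},
})
```

The arguments come out in a different order than in the example above, which
should be fine given that the order of arguments doesn't matter.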
I acknowledge that this should at least be documented. I don't expect end users
to figure this out. *But*, you sound like a developer, so why do black-box
debugging *and* reinvent/reimplement the functionality of the iconv module
within your file system? It's easy enough to look at the MacFUSE source.
Original comment by si...@gmail.com
on 24 Jun 2008 at 12:51
Thanks, now it works perfectly.
>so why do black-box debugging *and* reinvent/reimplement
Because when I found that my UTF-8-D char problem might be solved by the
'omodules=iconv...' option, I couldn't imagine it could interfere with the
-ovolicon option. I thought it was a syntax problem on my side or in the Google
help I found. Then I saw some people having the same problem, so I decided to
use libiconv myself.
Anyway, where should I post 'developer side' questions about libfuse usage that
are not issues about libfuse ?
Original comment by franck.b...@gmail.com
on 24 Jun 2008 at 1:08
> Anyway, where should I post 'developer side' questions about libfuse usage
> that are not issues about libfuse ?
The official MacFUSE forum is for both users and developers:
http://groups.google.com/group/macfuse-devel
Original comment by si...@gmail.com
on 24 Jun 2008 at 9:37
This has been open forever so I'm finally marking it as "WontFix".
Either use the iconv module that's built into the user-space library, or handle
it within the user-space file system.
Original comment by si...@gmail.com
on 12 Nov 2008 at 5:02
[deleted comment]
This should be treated as Open, as there are viable solutions. On the main
wiki, there is a proposal that is related to this bug report:
http://code.google.com/p/macfuse/wiki/FILENAME_ENCODING_PROPOSAL
Original comment by brod...@gmail.com
on 3 May 2009 at 8:48
Thanks. I had some songs with è in the title that Mac OSX would not copy from
a Linux drive mounted over sshfs. Running this on Linux fixed the problem:
sudo apt-get install convmv
convmv -r -f utf-8 -t utf-8 --nfd --notest /path/to/music
Original comment by mat.sola...@gmail.com
on 29 Feb 2012 at 9:11
Hi, just for the sake of discussion, I was able to solve this issue based on
the previous comments. My use case is simple: after 13+ years of using Linux
I'm moving to OSX and needed to replicate my previous config from Ubuntu. I
use sshfs to access the company's server and I was having the character
issue: OSX told me that it was unable to find the application whenever I tried
to open a file with a strange character.
I was able to properly mount the sshfs resource with this line:
sshfs -p XXX myuser@myserver:/share /Volumes/share
-orw,nodev,allow_other,reconnect,uid=XXXXX,gid=XXX,max_read=65536,compression=yes,auto_cache,no_check_root,kernel_cache,umask=0002,workaround=rename,auto_cache,reconnect,defer_permissions,noappledouble,negative_vncache,intr,modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC,volname=share
The modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC part was the one that did
the trick.
best regards
Original comment by amuji...@gmail.com
on 4 Aug 2014 at 12:59
Original issue reported on code.google.com by
bogd...@gmail.com
on 2 Apr 2007 at 12:13