macos-fuse-t / fuse-t


Cannot handle certain normalized forms of unicode. Example: é (e + U+0301) vs é (U+00E9) #16

Open Jwink3101 opened 1 year ago

Jwink3101 commented 1 year ago

I am using FUSE-T with rclone. I am, however, 99% sure this is a FUSE-T issue and not rclone.

I have a file with the following character: é. That is e + U+0301, or UTF-8 encoded: e\xcc\x81. This consistently breaks listing the directory. If I change it to the normalized form é, which is U+00E9 or \xc3\xa9 in UTF-8, it works fine.
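The two byte sequences above can be checked with a short Python sketch. It uses only the standard-library `unicodedata` module; the helper name `describe` is made up for illustration and is not part of rclone or FUSE-T.

```python
import unicodedata

decomposed = "e\u0301"   # e + U+0301 COMBINING ACUTE ACCENT (NFD form)
precomposed = "\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE (NFC form)


def describe(name: str) -> str:
    """Return the UTF-8 bytes of a name as hex, plus the normal forms it is already in."""
    forms = [f for f in ("NFC", "NFD") if unicodedata.is_normalized(f, name)]
    return f"{name.encode('utf-8').hex(' ')} ({', '.join(forms) or 'neither'})"


print(describe(decomposed))    # 65 cc 81 (NFD)
print(describe(precomposed))   # c3 a9 (NFC)

# The two strings render identically but compare unequal until normalized.
print(decomposed == precomposed)                                  # False
print(unicodedata.normalize("NFC", decomposed) == precomposed)    # True
```

Both strings are valid UTF-8, so a check for UTF-8 validity alone cannot tell them apart; only a normalization step does.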

Rclone handles the name just fine in listing and I even see it in the rclone logs. I have the following files to demonstrate (I added files around it to see if anything listed first):

Source Dir:

$ ls -1

before
test1 (e + U+0301) é.txt
test2 (U+00E9) é.txt
z-after

Mount Dir:

$ ls
ls: fts_read: Permission denied

The rclone -vv log is: rclog.log. This seems to indicate rclone isn't having the issue but rather FUSE-T.

Are there other tests I can do to assist? Additional logging? It is 100% reproducible on my machine, so just let me know.

macos-fuse-t commented 1 year ago

I cannot reproduce your issue. This is what I did:

  1. Created a file with \xcc\x81 name encoding (é) on Linux and Mac machines.
  2. Mounted a Linux machine with sshfs and listed the folder. It worked as expected.
  3. Mounted a Mac with loop fs (fusexmp_fh in libfuse examples). ls worked as expected.

I wonder which target platform are you using with rclone? I assume that fuse-t is the latest version.

Jwink3101 commented 1 year ago

I assume that fuse-t is the latest version.

According to brew I am when I run brew upgrade fuse-t

I wonder which target platform are you using with rclone?

Local, to test. A crypt-wrapped local is where I discovered the issue.


I do not know how to do the loopback fs example but am willing to try if you show me. I did install sshfs from brew install fuse-t-sshfs and didn't have the issue.

Maybe my assertion that this is FUSE-T and not rclone is incorrect.

I created a forum post with some additional detail. Included in that is the full log where I noticed the lines:

2023/02/27 07:13:23 DEBUG : Adding "-o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC" for macOS
2023/02/27 07:13:23 DEBUG : Local file system at /Users/jwinokur/Desktop/mount_test/source: Mounting with options: ["-o" "attr_timeout=1" "-o" "fsname=/Users/jwinokur/Desktop/mount_test/source" "-o" "subtype=rclone" "-o" "max_readahead=131072" "-o" "atomic_o_trunc" "-o" "daemon_timeout=600" "-o" "volname=Users jwinokur Desktop mount_test source" "-o" "noappledouble" "-o" "modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC"]

I wonder if those are causing issues. I will wait for the rclone developers to confirm if/how to disable that config but can you also test to see if enabling them causes the problem? And help deduce if it is (a) the appropriate flag and (b) the appropriate response?

ncw commented 1 year ago

Does FUSE-T use macOS UTF-8 NFD form for UTF-8 internally like OSXFUSE does? If not then the "-o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC" will be doing the wrong thing.

If not then this makes it subtly incompatible with OSXFUSE.

This issue shows the NFD problem https://github.com/osxfuse/osxfuse/issues/585 and the reason why rclone applies that iconv rule.

If FUSE-T only shows UTF-8 NFC in its external interfaces then that is probably a good design decision, but it is different to what OSXFUSE does.

macos-fuse-t commented 1 year ago

It looks like an encoding issue, although I think FUSE-T handles it correctly as tested with sshfs and loopback fs. I will look into the issue though. I didn't know there are different UTF-8 encodings.

macos-fuse-t commented 1 year ago

Unicode conversion is handled by this go module: https://pkg.go.dev/unicode/utf8

Jwink3101 commented 1 year ago

For what it's worth, adding the -o modules=iconv,from_code=UTF-8,to_code=UTF-8 rclone flags to disable the conversion, as suggested by @ncw on the forum post, fixes it.

I am not sure where this bug belongs at this point. Does FUSE-T treat incompatibility with OSXFUSE as a bug? Should a more modern (and awesome by the way!!!) project inherit the technical debt of its predecessor?

macos-fuse-t commented 1 year ago

Just checked and it's possible to normalize strings as UTF-8 NFD as opposed to NFC, but I'm not sure it should be a default option. macOS Finder is happy with whatever encoding I throw at it. I can add a mount flag "-o utf8-nfd", would that be helpful?

ncw commented 1 year ago

Just checked and it's possible to normalize strings as UTF-8 NFD as opposed to NFC, but I'm not sure it should be a default option. macOS Finder is happy with whatever encoding I throw at it.

I think OSXFUSE needs to have the NFD form as that is what the kernel interfaces of macOS are expecting.

I'm guessing since FUSE-T plugs into the NFS layer that the NFS layer is dealing with the NFC to NFD translation for you?

I can add a mount flag "-o utf8-nfd", would that be helpful?

This should be the default if you want 100% compatibility with OSXFUSE I think.

I'm not sure that is a good idea though as NFD encoding is a pain to deal with.

We can write that you'll need -o modules=iconv,from_code=UTF-8,to_code=UTF-8 in the docs for rclone and FUSE-T or perhaps get rclone to auto detect FUSE-T somehow (any ideas?).

macos-fuse-t commented 1 year ago

I don't think NFD is needed for macOS anymore. Perhaps that was true for older macOS, but now you can create a file name in whatever encoding and it is shown fine.

ncw commented 1 year ago

I don't know a great deal about this, but I found a nice explainer here: https://eclecticlight.co/2021/05/08/explainer-unicode-normalization-and-apfs/

So I think you are right for APFS but HFS+ volumes will still require NFD. I've no idea on the relative popularity of these things (not a mac user) so maybe it is irrelevant now.

kapitainsky commented 1 year ago

This is still an issue on macOS.

The -o modules=iconv,from_code=UTF-8,to_code=UTF-8 option only solves part of the problem: NFC-encoded folder and file names no longer disappear, but they are still not accessible from Finder.

For some reason iconv and FUSE-T do not work together as expected.

Here is my test, served by rclone via mount (FUSE-T).

Original data, with an NFC- and an NFD-encoded folder and file:

drwxr-xr-x  1 kptsky  staff  0 Jun 19 12:07 NFCééééDIR
drwxr-xr-x  1 kptsky  staff  0 Jun 19 12:08 NFDééééDIR
-rw-r--r--  1 kptsky  staff  6 Jun 15 19:20 NFCéééFILE.txt
-rw-r--r--  1 kptsky  staff  4 Jun 15 07:10 NFDéééFILE.txt

This is also what I see in mount with -o modules=iconv,from_code=UTF-8,to_code=UTF-8 but:

image

The NFC file cannot be opened.

The NFD one works.

You can also see that the NFC and NFD files have different icons: the NFC one is the generic txt icon, as a preview cannot be generated.

When I try -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC, only NFC-encoded things are visible and accessible from Finder.

image

This is a bit of a surprise, as I would expect this option to produce consistent NFD, as in the example below:

$ echo -e "éé\c" | hexdump -C
00000000  c3 a9 65 cc 81                                    |..e..|

$ echo -e "éé\c" | iconv -f UTF-8 -t UTF-8 | hexdump -C
00000000  c3 a9 65 cc 81                                    |..e..|

$ echo -e "éé\c" | iconv -f UTF-8 -t UTF-8-MAC | hexdump -C
00000000  65 cc 81 65 cc 81                                 |e..e..|
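The same byte sequences can be reproduced without iconv. This is a minimal Python sketch that treats UTF-8-MAC as plain NFD, which is an assumption: the two agree for é but may differ for some other code points. The helper `hexbytes` is made up for illustration.

```python
import unicodedata

# Precomposed é (NFC) followed by decomposed é (NFD), as in the echo above.
mixed = "\u00e9" + "e\u0301"


def hexbytes(s: str) -> str:
    """UTF-8 bytes of s as space-separated hex, like the hexdump column."""
    return s.encode("utf-8").hex(" ")


print(hexbytes(mixed))                                 # c3 a9 65 cc 81
print(hexbytes(unicodedata.normalize("NFD", mixed)))   # 65 cc 81 65 cc 81
print(hexbytes(unicodedata.normalize("NFC", mixed)))   # c3 a9 c3 a9
```

A UTF-8 to UTF-8 conversion leaves the bytes untouched, which is why that iconv setting acts as "no conversion".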

Looking for a possible solution, I was thinking that:

  1. As @macos-fuse-t mentioned, a mount flag "-o utf8-nfd" could be added. I think it would be extremely useful. Maybe also "-o utf8-nfc"? It could potentially allow us to develop a fully working solution.

  2. As described here, Apple is aware of NFC/NFD problems with NFS and suggests:

This is a known NFS issue with precomposed and decomposed. As mentioned, Linux systems prefer precomposed file names (NFC), while macOS/iOS userspace frameworks all default to decomposed (NFD). So no matter what is provided to them (NFC or NFD), any pathname that comes in from an Apple framework will always be in NFD and the FS has to deal with it. You should mount your NFS share using the “nfc” parameter to instruct the client to use precomposed instead of the default decomposed. We were able to open both precomposed/decomposed files and folders while mounting with “nfc” enabled. Please let us know if it helps to resolve the issue.

mount_nfs manual page:
nfc Convert name strings to Unicode Normalization Form C (NFC) when sending them to the NFS server. This option may be used to improve interoperability with NFS clients and servers that typically use names in the NFC form.

FUSE-T uses NFS, so maybe the issue comes from this problem.

Would it be possible to add this as an optional mount flag "-o nfc"?

With these new flags I could try to find a working rclone/fuse-t solution.

alexfs commented 1 year ago

I can definitely add an "-o nfc" mount flag. Would that be enough to make rclone work, or is nfc-nfd/nfd-nfc conversion also needed?

kapitainsky commented 1 year ago

I can definitely add an "-o nfc" mount flag. Would that be enough to make rclone work, or is nfc-nfd/nfd-nfc conversion also needed?

It would be great if possible. I think the ones that are really needed are:

"-o nfc" "-o utf8-nfd"

And this would be nice to have:

"-o utf8-nfc"

alexfs commented 1 year ago

I'm a bit confused. There are two peers: macos and user. On the macos side, the "-o nfc" flag controls whether nfc or nfd encoding is used. If not given, nfd is the default. I think what you want is an encoding flag for the user peer, "user-utf8-nfc". If not given, nfd is the default.

So there would be four cases:

  1. No flags meaning no conversion
  2. "-o nfc" and "user-utf8-nfc". mount executed with nfc. No conversion performed by fuse-t
  3. "-o nfc". mount executed with nfc and nfc-nfd conversion performed between macos/user
  4. "user-utf8-nfc". nfd-nfc conversion performed between macos/user

Is this a correct analysis?
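The four cases reduce to comparing the form used on each side. The sketch below encodes that reading in Python; the function name `fuse_t_conversion` is hypothetical, and "user-utf8-nfc" is only the flag name proposed in this thread, not a confirmed FUSE-T option.

```python
def fuse_t_conversion(mount_nfc: bool, user_nfc: bool) -> str:
    """Describe the conversion fuse-t would perform between the macos and user peers.

    mount_nfc: "-o nfc" was given, so the macos side uses NFC (default NFD).
    user_nfc:  the proposed "user-utf8-nfc" was given, so the user side
               uses NFC (default NFD). Hypothetical flag from this thread.
    """
    macos_side = "NFC" if mount_nfc else "NFD"
    user_side = "NFC" if user_nfc else "NFD"
    if macos_side == user_side:
        return "none"                       # cases 1 and 2: both sides agree
    return f"{macos_side}<->{user_side}"    # cases 3 and 4: convert at the boundary


for mount_nfc in (False, True):
    for user_nfc in (False, True):
        print(mount_nfc, user_nfc, fuse_t_conversion(mount_nfc, user_nfc))
```

Under this reading a conversion is needed only when the two sides disagree, which matches cases 3 and 4 in the list.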

kapitainsky commented 1 year ago

Yes, your logic is correct and I wish it worked like that. However, reality seems to be different and I do not have all the answers yet.

In theory -o modules=iconv,from_code=UTF-8,to_code=UTF-8-MAC should do the trick and send everything to rclone in NFD, but it does not. As in my example above, something strange happens: NFC names are converted to NFD (and work on macOS), but any NFD ones are just gone.

The new flags would be a different approach to the old problem, which I hope we can at last tackle completely.

We might have to make some changes on the rclone side as well. I am not sure about it yet.

kapitainsky commented 1 year ago

With no conversion, all NFD files work, but only when the full path is NFD as well. So if I put an NFD file into an NFC folder, it can't be opened in Finder:
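A quick way to find paths affected by this is to check every component, not just the file name. The following is a minimal Python sketch; `non_nfd_components` is a made-up helper, and the example path is illustrative rather than taken from the test data.

```python
import unicodedata
from pathlib import PurePosixPath


def non_nfd_components(path: str) -> list[str]:
    """Return the path components that are not already in NFD form."""
    return [
        part
        for part in PurePosixPath(path).parts
        if not unicodedata.is_normalized("NFD", part)
    ]


# An NFD-named file inside an NFC-named folder: the folder is flagged.
path = "NFC\u00e9DIR/NFDe\u0301FILE.txt"
print(non_nfd_components(path))
```

An empty result means every component is NFD, which is the case reported above as working in Finder.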

image

(white icons are files you can open)

kapitainsky commented 1 year ago

And where it gets really confusing is that with no conversion all files are accessible in bash/zsh. It is similar to the NFS issue I referred to earlier.

https://openradar.appspot.com/FB8957502

alexfs commented 1 year ago

Ok, let's start with the "-o nfc" option for now. Later I will add more options if needed. "-o nfs" will just pass characters through between both ends.

kapitainsky commented 1 year ago

Another question is how you can programmatically distinguish between OSXFUSE and FUSE-T. OSXFUSE will slowly become history, but for now a lot of people still use it. Ideally rclone could detect which one is in use and apply the correct options.

kapitainsky commented 1 year ago

Thank you for adding -o nfc. Preliminary testing shows that I can now safely save NFD-encoded files from macOS to the mount and they end up in the filesystem as NFC.

Now the only problem is with NFD files already in the filesystem. I can't assume that they will always be NFC, as users can upload them by means other than the mount. For this I hope -o utf8-nfd will do the trick.

alexfs commented 1 year ago

What would -o utf8-nfd do?

kapitainsky commented 1 year ago

cloud ---(NFC,NFD )---> rclone ---(NFC,NFD )---> fuse-t ---(NFD) ---> macOS

alexfs commented 1 year ago

Afraid it won't work. You can't mix both NFC and NFD. With the -o nfc flag macos converts everything to NFC, meaning fuse-t won't know if the original file is NFC or NFD. What can be done when -o nfc is not specified is to convert the rclone side to NFD.
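One consequence of normalizing to a single form is that two names differing only in normalization become the same name. The Python sketch below lists such collisions in a directory listing. `normalization_collisions` and the sample names are hypothetical, and how rclone or fuse-t would handle such a pair is not stated in this thread.

```python
import unicodedata
from collections import defaultdict


def normalization_collisions(names, form="NFC"):
    """Group names that become identical once normalized to `form`."""
    groups = defaultdict(list)
    for name in names:
        groups[unicodedata.normalize(form, name)].append(name)
    # Keep only the normalized names that more than one original maps to.
    return {key: originals for key, originals in groups.items() if len(originals) > 1}


listing = ["before", "cafe\u0301.txt", "caf\u00e9.txt", "z-after"]
for normalized, originals in normalization_collisions(listing).items():
    print(normalized, "<-", [n.encode("utf-8") for n in originals])
```

Running a check like this on the backend before choosing a single form would show whether any existing names would be merged.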

kapitainsky commented 1 year ago

Thanks for your help. Indeed, there are now two options: with -o nfc the rclone side should normalize everything to NFC, or without extra options to NFD.

But still, if possible and easy to add, I think -o utf8-nfd would give us more options :)