rclone / rclone

"rsync for cloud storage" - Google Drive, S3, Dropbox, Backblaze B2, One Drive, Swift, Hubic, Wasabi, Google Cloud Storage, Azure Blob, Azure Files, Yandex Files
https://rclone.org
MIT License

mailru backend "invalid characters in object name" for paths containing U+0439 (Cyrillic й) on macOS #8042

Open doubleaxe opened 1 week ago

doubleaxe commented 1 week ago

What is the problem you are having with rclone?

As the title states, I get the error "invalid characters in object name" for paths that contain Unicode U+0439 (the Cyrillic letter й). I believe this issue is macOS-specific, but I cannot test it on Windows yet. I came across this post, which describes something similar: https://www.alfredforum.com/topic/2015-encoding-issue/. This issue probably affects other cloud providers too.

I am a JS developer, not a Go developer, but in my opinion this is a weird macOS UTF-8 normalization issue. For example, encodeURIComponent('й') produces %D0%B9, but encodeURIComponent('ый') produces %D1%8B%D0%B8%CC%86: the same character й is encoded differently, as %D0%B8%CC%86 (и, U+0438, followed by the combining breve U+0306, i.e. the decomposed form of й).

What is your rclone version (output from rclone version)

rclone v1.67.0
- os/version: darwin 13.6.7 (64 bit)
- os/kernel: 22.6.0 (x86_64)
- os/type: darwin
- os/arch: amd64
- go/version: go1.22.4
- go/linking: dynamic
- go/tags: cmount

Which OS you are using and how many bits (e.g. Windows 7, 64 bit)

macOS Ventura 13.6.7, Intel, 64 bit

Which cloud storage system are you using? (e.g. Google Drive)

Cloud mail.ru

The command you were trying to run (e.g. rclone copy /tmp remote:tmp)

rclone sync "/opt/data/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег" "mailru:/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег" --progress --combined $dir/combined-output.log --error $dir/error-output.log --inplace --modify-window 2s --fix-case --track-renames --bwlimit 5M --log-file $dir/debug.log --log-level DEBUG --dump bodies

A log from the command with the -vv flag (e.g. output from rclone -vv copy /tmp remote:tmp)

2024/09/02 12:54:36 INFO  : Starting bandwidth limiter at 5Mi Byte/s
2024/09/02 12:54:36 DEBUG : rclone: Version "v1.67.0" starting with parameters ["rclone" "sync" "/opt/data/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег" "mailru:/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег" "--progress" "--combined" "/opt/work/ssh/local/combined-output.log" "--error" "/opt/work/ssh/local/error-output.log" "--inplace" "--modify-window" "2s" "--fix-case" "--track-renames" "--bwlimit" "5M" "--log-file" "/opt/work/ssh/local/debug.log" "--log-level" "DEBUG" "--dump" "bodies"]
2024/09/02 12:54:36 DEBUG : Creating backend with remote "/opt/data/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег"
2024/09/02 12:54:36 DEBUG : Using config file from "/Users/aousov/.config/rclone/rclone.conf"
2024/09/02 12:54:36 DEBUG : Creating backend with remote "mailru:/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег"
2024/09/02 12:54:36 DEBUG : You have specified to dump information. Please be noted that the Accept-Encoding as shown may not be correct in the request and the response may not show Content-Encoding if the go standard libraries auto gzip encoding was in effect. In this case the body of the request will be gunzipped before showing it.
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : HTTP REQUEST (req 0xc0004b58c0)
2024/09/02 12:54:36 DEBUG : GET /api/m1/file?access_token=xxx&home=backups%2Fmedia%2FPhoto%2FCamera_2000-2012%2F2006%2F2006_10_06-%D0%9F%D0%B5%D1%80%D0%B2%D1%8B%D0%B8%CC%86+%D1%81%D0%BD%D0%B5%D0%B3&limit=2147483647&offset=0 HTTP/1.1
Host: cloud.mail.ru
User-Agent: rclone/v1.67.0
Accept: */*
Accept-Encoding: gzip

2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : HTTP RESPONSE (req 0xc0004b58c0)
2024/09/02 12:54:36 DEBUG : HTTP/2.0 404 Not Found
Content-Length: 185
Cache-Control: no-store, no-cache, must-revalidate
Content-Type: application/json; charset=utf-8
Date: Mon, 02 Sep 2024 05:54:36 GMT
Pragma: no-cache
Server: nginx
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
X-Content-Type-Options: nosniff
X-Email: xxx@mail.ru
X-Frame-Options: SAMEORIGIN
X-From: .lightning_k8s
X-Host: cld-front-ext3.q
X-Page-Id: 
X-Req-Id: xxx
X-Server: lightning
X-Timestamp: 1725256476
X-Timing: 0.0797359943389893
X-Ua-Compatible: IE=Edge
X-Upstream-Time: -

{"email":"xxx@mail.ru","body":{"home":{"value":"/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег","error":"not_exists"}},"time":1725256476486,"status":404}
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : fs cache: renaming cache item "mailru:/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег" to be canonical "mailru:backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег"
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : HTTP REQUEST (req 0xc00083ea20)
2024/09/02 12:54:36 DEBUG : POST /api/m1/folder?access_token=xxx&limit=2147483647&offset=0 HTTP/1.1
Host: cloud.mail.ru
User-Agent: rclone/v1.67.0
Content-Length: 138
Accept: */*
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip

home=%2Fbackups%2Fmedia%2FPhoto%2FCamera_2000-2012%2F2006%2F2006_10_06-%D0%9F%D0%B5%D1%80%D0%B2%D1%8B%D0%B8%CC%86+%D1%81%D0%BD%D0%B5%D0%B3
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : HTTP RESPONSE (req 0xc00083ea20)
2024/09/02 12:54:36 DEBUG : HTTP/2.0 404 Not Found
Content-Length: 185
Cache-Control: no-store, no-cache, must-revalidate
Content-Type: application/json; charset=utf-8
Date: Mon, 02 Sep 2024 05:54:36 GMT
Pragma: no-cache
Server: nginx
Strict-Transport-Security: max-age=15768000; includeSubDomains; preload
X-Content-Type-Options: nosniff
X-Email: xxx@mail.ru
X-Frame-Options: SAMEORIGIN
X-From: .lightning_k8s
X-Host: cld-front-ext3.q
X-Page-Id: 
X-Req-Id: xxx
X-Server: lightning
X-Timestamp: 1725256476
X-Timing: 0.0525760650634766
X-Ua-Compatible: IE=Edge
X-Upstream-Time: -

{"email":"xxx@mail.ru","body":{"home":{"value":"/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег","error":"not_exists"}},"time":1725256476599,"status":404}
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : Sync Logger: MissingOnDst: + photo1.jpg
2024/09/02 12:54:36 INFO  : [backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег]: Making map for --track-renames
2024/09/02 12:54:36 INFO  : [backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег]: Finished making map for --track-renames
2024/09/02 12:54:36 DEBUG : [backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег]: Waiting for checks to finish
2024/09/02 12:54:36 DEBUG : [backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег]: Waiting for renames to finish
2024/09/02 12:54:36 DEBUG : photo1.jpg: Need to transfer - No matching file found at Destination
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : HTTP REQUEST (req 0xc00092ab40)
2024/09/02 12:54:36 DEBUG : GET /m HTTP/1.1
Host: dispatcher.cloud.mail.ru
User-Agent: rclone/v1.67.0
Accept: */*
Accept-Encoding: gzip

2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : [backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег]: Waiting for transfers to finish
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : HTTP RESPONSE (req 0xc00092ab40)
2024/09/02 12:54:36 DEBUG : HTTP/1.1 200 OK
Content-Length: 63
Cache-Control: no-cache
Connection: keep-alive
Content-Type: text/plain
Date: Mon, 02 Sep 2024 05:54:36 GMT
Expires: Mon, 02 Sep 2024 05:54:35 GMT
Server: nginx/1.20.2
X-Host: cld-dispatcher2
X-Robots-Tag: noindex

https://cld-extapi.datacloudmail.ru/meta/ 176.112.173.62 1000
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : [backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег]: new meta server: https://cld-extapi.datacloudmail.ru/meta/
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : HTTP REQUEST (req 0xc0006e45a0)
2024/09/02 12:54:36 DEBUG : POST /meta/?client_id=cloud-win&token=xxx HTTP/1.1
Host: cld-extapi.datacloudmail.ru
User-Agent: rclone/v1.67.0
Content-Length: 82
Accept: */*
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip

jN/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : HTTP REQUEST (req 0xc000946240)
2024/09/02 12:54:36 DEBUG : POST /meta/?client_id=cloud-win&token=xxx HTTP/1.1
Host: cld-extapi.datacloudmail.ru
User-Agent: rclone/v1.67.0
Content-Length: 82
Accept: */*
Content-Type: application/x-www-form-urlencoded
Accept-Encoding: gzip

jN/backups/media/Photo/Camera_2000-2012/2006/2006_10_06-Первый снег
2024/09/02 12:54:36 DEBUG : >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 DEBUG : HTTP RESPONSE (req 0xc0006e45a0)
2024/09/02 12:54:36 DEBUG : HTTP/1.1 200 OK
Content-Length: 1
Connection: keep-alive
Content-Type: text/plain; charset=utf-8
Date: Mon, 02 Sep 2024 05:54:36 GMT
Server: nginx/1.20.2
X-Host: cld-extapi7.q

2024/09/02 12:54:36 DEBUG : <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
2024/09/02 12:54:36 ERROR : photo1.jpg: Failed to copy: invalid characters in object name
2024/09/02 12:54:36 DEBUG : Sync Logger: Error: ! photo1.jpg


kapitainsky commented 1 week ago

It is probably part of the NFC/NFD saga on macOS.

rclone and most backends do not care whether a character is normalised as NFC or NFD, but possibly mailru does. As a workaround, try normalising all names beforehand to either NFC or NFD and see which works.

convmv -r -f utf8 -t utf8 --nfc --notest /path/to/files
convmv -r -f utf8 -t utf8 --nfd --notest /path/to/files
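Before renaming anything, it may help to see which names are actually affected. The following is a minimal sketch of a hypothetical Go helper (call it `findnfd`; it is not part of rclone) that walks a directory tree and prints names containing nonspacing combining marks (Unicode category Mn). That is only a heuristic for decomposed names; the helper reports and does not rename, leaving the fix to convmv.

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
	"unicode"
)

// hasCombiningMarks reports whether name contains a nonspacing combining
// mark (Unicode category Mn), such as U+0306 COMBINING BREVE. This is a
// heuristic for decomposed (NFD-style) names: some marks have no
// precomposed equivalent and remain separate even after NFC.
func hasCombiningMarks(name string) bool {
	for _, r := range name {
		if unicode.Is(unicode.Mn, r) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical usage: findnfd /path/to/files (defaults to ".").
	root := "."
	if len(os.Args) > 1 {
		root = os.Args[1]
	}
	err := filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
		if err != nil {
			// Report unreadable entries and keep walking.
			fmt.Fprintln(os.Stderr, err)
			return nil
		}
		// Check only the final path element, so each affected name is
		// reported once rather than for every file beneath it.
		if hasCombiningMarks(d.Name()) {
			fmt.Println(path)
		}
		return nil
	})
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}
```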

Also try these rclone flags:

--no-unicode-normalization            Don't normalize unicode characters in filenames
--local-unicode-normalization         Apply unicode NFC normalization to paths and filenames

doubleaxe commented 1 week ago

--no-unicode-normalization and --local-unicode-normalization don't help.

convmv -r -f utf8 -t utf8 --nfc --preserve-mtimes --notest /data/media/Photo

This works: after the encoding was fixed, everything works fine. I guess this is because the files were originally on an HFS+ filesystem, which was later converted to APFS during a macOS upgrade.

A workaround has been found, so this bug report can be closed now. Thank you for the suggestions.

kapitainsky commented 1 week ago

This is something rclone does not take into account today: some remotes only support a specific normalisation. It is very rare, but it happens. Here it is mailru, and I remember exactly the same issue with one S3 provider from the forum.

I wonder if we should at least have a flag forcing a specific normalisation on the remote? Or is this too small an issue to bother with, and more trouble than it is worth?

@ncw @nielash what do you think?

ncw commented 1 week ago

If we wanted to do this, we would need to add a feature flag for the backend, then an integration test to make sure it was set correctly. The test would tell us which backends need the flag set, and we could then use the feature flag in the core of rclone to force normalisation on.
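A minimal sketch of the shape this could take, under loud assumptions: the `Normalization` type, the `Features` struct and `remoteName` are hypothetical and do not exist in rclone, and `composeCyrillic` composes only й/Й and ё/Ё as a stand-in for a full NFC normaliser. Marking mailru as requiring NFC is illustrative, based only on the workaround reported in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// Normalization is a hypothetical description of the form a backend
// requires for object names. It is not part of rclone's real features.
type Normalization int

const (
	NormAny Normalization = iota // backend accepts names as given
	NormNFC                      // backend requires composed (NFC) names
)

// Features is a hypothetical, cut-down stand-in for a backend feature set.
type Features struct {
	Name                  string
	RequiredNormalization Normalization
}

// composeCyrillic is a deliberately tiny stand-in for a real NFC
// implementation: it only composes й/Й and ё/Ё.
var composeCyrillic = strings.NewReplacer(
	"\u0438\u0306", "\u0439", // и + combining breve -> й
	"\u0418\u0306", "\u0419", // И + combining breve -> Й
	"\u0435\u0308", "\u0451", // е + combining diaeresis -> ё
	"\u0415\u0308", "\u0401", // Е + combining diaeresis -> Ё
)

// remoteName shows where the core could consult the flag before handing
// a name to the backend.
func remoteName(f Features, name string) string {
	if f.RequiredNormalization == NormNFC {
		return composeCyrillic.Replace(name)
	}
	return name
}

func main() {
	// "2006_10_06-Первый снег" with й in decomposed form.
	nfd := "2006_10_06-\u041f\u0435\u0440\u0432\u044b\u0438\u0306 \u0441\u043d\u0435\u0433"
	strict := Features{Name: "mailru", RequiredNormalization: NormNFC}
	lax := Features{Name: "local", RequiredNormalization: NormAny}
	fmt.Println(remoteName(strict, nfd) == nfd) // false: name was composed
	fmt.Println(remoteName(lax, nfd) == nfd)    // true: name passed through
}
```

Keeping the decision in one core function, rather than in each backend, matches the idea of the core forcing normalisation based on the flag.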

My first concern would be: what is the feature flag testing? That UTF-8 normalisation is required by the backend? Or maybe that the backend forces UTF-8 normalisation itself, because I know some backends do that too.

I suppose some backends might do something more complicated like only require normalisation for Cyrillic.

It is probably quite a big project for perhaps little gain, since we don't get many issues about it. @albertony I know you've worked on the normalisation code in the past - any thoughts?