nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

Allow combined UTF-8 characters for filenames on Nextcloud External local storage files #40546

Open cue108 opened 1 year ago

cue108 commented 1 year ago

On Nextcloud 27.1.0 I have an external storage of type "local". Files whose UTF-8 names contain a character built with a combining diaeresis are not shown.

The scan reports the following error:

php occ files:scan -p /myPath/myExternalStorageFolder
Starting scan for user 1 out of 1 (me)
        Entry "klaus/Hans hätte besser aufgepasst.mp4" will not be accessible due to incompatible encoding
.
.

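For example, the letter 'ä' can be represented in two UTF-8 compliant forms:

  1. Precomposed: the single code point U+00E4 (UTF-8 bytes C3 A4)
  2. Combined: the letter 'a' (U+0061) followed by the combining diaeresis U+0308 (UTF-8 bytes 61 CC 88)

Using the latter form for filenames results in an error due to "wrong" character encoding, even though it is valid UTF-8.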
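To make the difference visible, here is a minimal shell sketch that builds both forms of 'ä' from their raw bytes and dumps them as hex. The variable names are only illustrative; the byte values are those of U+00E4 (precomposed) and of U+0061 followed by U+0308 (combined).

```shell
# Precomposed form (NFC): U+00E4, UTF-8 bytes c3 a4
nfc=$(printf '\303\244')
# Combined form (NFD): 'a' (U+0061) + combining diaeresis (U+0308), bytes 61 cc 88
nfd=$(printf 'a\314\210')

# Dump the raw bytes of each form as a hex string
nfc_hex=$(printf '%s' "$nfc" | od -An -tx1 | tr -d ' \n')
nfd_hex=$(printf '%s' "$nfd" | od -An -tx1 | tr -d ' \n')

echo "NFC: $nfc_hex"   # prints "NFC: c3a4"
echo "NFD: $nfd_hex"   # prints "NFD: 61cc88"
```

Both strings render as 'ä' in most terminals, but they are different byte sequences, so a byte-wise comparison of the two filenames fails.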

Consider a scenario where you have a Spanish keyboard as a German tourist in Mallorca. You may produce a combined UTF-8 character for the letter 'ä' using such a keyboard to name your holiday videos and store them on your memory stick right there.

This report highlights the improvement request.

A workaround:

A tool called 'convmv' is available for Linux and can be installed with the following command: sudo apt install convmv

To correct the filenames on your holiday memory stick, first run the following command as a dry run:

convmv -r -f utf-8 -t utf-8 --nfc ./myPhantisticHolydayMemories/*

This command will show what files would be renamed without actually renaming them. After inspecting the actions it would take, you can perform the renaming with the following command:

convmv -r -f utf-8 -t utf-8 --nfc --notest ./myPhantisticHolydayMemories/*

Once you have completed this process, trigger a scan again using:

php occ files:scan -p /myPath/myExternalStorageFolder

You should no longer encounter any "incompatible" encoding errors.
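Before running convmv, it may help to see which names are affected. The following is a rough sketch, not a complete check: it only looks for the combining diaeresis (U+0308, UTF-8 bytes cc 88) and ignores all other combining marks. The function names are hypothetical helpers, not part of any tool.

```shell
#!/bin/sh
# U+0308 (combining diaeresis) encoded as UTF-8 bytes cc 88
mark=$(printf '\314\210')

# Succeeds (exit status 0) if the given name contains U+0308.
has_combining_diaeresis() {
    case "$1" in
        *"$mark"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print every path below a directory whose name contains U+0308.
list_decomposed_names() {
    find "$1" -print | while IFS= read -r path; do
        if has_combining_diaeresis "$path"; then
            printf '%s\n' "$path"
        fi
    done
}
```

For example, `list_decomposed_names ./myPhantisticHolydayMemories` would list the candidates that the convmv dry run should also report.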

mellow2012 commented 1 year ago

we have the same problem with our SMB storage

Is there a workaround for each new file added to SMB, with this issue?

cue108 commented 1 year ago

> we have the same problem with our SMB storage
>
> Is there a workaround for each new file added to SMB, with this issue?

Maybe you could experiment with Administration -> Flow, using "Run script" when a file is created or renamed from within Nextcloud?

Or you could use Linux tools to watch a folder and trigger a script whenever a file is created or renamed:

sudo apt-get install inotify-tools

#!/bin/bash

inotifywait -m --format '%e %f' ~/test/ |
while read -r event file; do
        echo "The event was $event"
        echo "The file was $file"
done

In the while loop you can handle both the event and the file according to your needs!
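As one possible way to fill in that loop, here is a sketch that combines inotifywait with the convmv flags from the workaround earlier in this thread. It makes several assumptions: the function names are hypothetical, the directory is watched non-recursively, and it listens for close_write and moved_to rather than create so that a file is not renamed while it is still being written.

```shell
#!/bin/bash
# Normalize one file name to NFC with the convmv flags used earlier in this thread.
# convmv should leave a name untouched if it is already in NFC.
normalize_name() {
    local dir=$1
    local file=$2
    convmv -f utf-8 -t utf-8 --nfc --notest "$dir/$file"
}

# Watch a directory and normalize each file once it has been fully
# written (close_write) or moved into the directory (moved_to).
watch_and_normalize() {
    local dir=$1
    inotifywait -m -e close_write -e moved_to --format '%f' "$dir" |
    while IFS= read -r file; do
        normalize_name "$dir" "$file"
    done
}

# Example (hypothetical path): watch_and_normalize "$HOME/test"
```

The rename done by convmv raises a further moved_to event for the new name; the second convmv call should then find nothing to change. Nextcloud would presumably still need a `php occ files:scan` afterwards to pick up the renamed files.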

https://github.com/inotify-tools/inotify-tools/wiki#info

jdannenberg commented 2 months ago

Thank you @cue108 for this issue, helped me a lot.

However, is this really limited to external local storage?

What I did was:

  1. Adding type local external storage with such filenames
  2. Moving files from this external storage to the Nextcloud data directory / a normal folder via the WebUI (as part of a migration process; I thought that adding an external storage and then moving files via the WebUI was a more compatible way of migrating data to Nextcloud than copying / moving them directly into the Nextcloud data directory).
  3. Removing said external storage
  4. Noticing some files are missing --> realizing that the missing files are the ones with combined UTF-8 characters (äöü) in their names

jdannenberg commented 2 months ago

I just stumbled upon this: isn't this the fix for this issue?

image

cheneraie commented 1 month ago

@jdannenberg, absolutely: it works, and it does not seem to be as slow as mentioned.