nextcloud / server

☁️ Nextcloud server, a safe home for all your data
https://nextcloud.com
GNU Affero General Public License v3.0

Special characters in filenames #7762

Open loeffelpan opened 6 years ago

loeffelpan commented 6 years ago

Steps to reproduce

  1. Put some files (not all with those characters) in any folder
  2. Include this folder by external storage (local)

Expected behaviour

Every file in this folder should be scanned and shown in the Files app.

Actual behaviour

These files were downloaded to the hard disk of my home server. The folder containing the downloaded files is configured as "local" external storage in my Nextcloud. Files and folders with German umlauts that were created by Nextcloud in the Files app appear in the file listings. Other files and folders (from downloads) are ignored by the occ file scan.

During a file scan in debug mode, the following messages appear in nextcloud.log. For example, the names should read Lügen instead of L\u00fcgen and Hölle instead of H\u00f6lle.

{"reqId":"X7LIb2Ci8jdOOkqp3leZ","level":0,"time":"2017-12-26T17:25:29+01:00","remoteAddr":"","user":"--","app":"OC\Files\Cache\Scanner","method":"--","url":"--","message":"!!! Path 'Serien/Zoo/S02E06.Sex, L\u00fcgen und Quallen.mp4' is not accessible or present !!!","userAgent":"--","version":"12.0.4.3"}
{"reqId":"X7LIb2Ci8jdOOkqp3leZ","level":0,"time":"2017-12-26T17:25:29+01:00","remoteAddr":"","user":"--","app":"OC\Files\Cache\Scanner","method":"--","url":"--","message":"!!! Path 'Serien/Zoo/S02E10.H\u00f6lle in Helsinki.mp4' is not accessible or present !!!","userAgent":"--","version":"12.0.4.3"}

Server configuration

Operating system: Ubuntu Server 17.10

Web server: Apache 2.4.27

Database: MySQL

PHP version: PHP 7.1.11-0ubuntu0.17.10.1

Nextcloud version: 12.0.4

Updated from an older Nextcloud/ownCloud or fresh install: fresh install

Where did you install Nextcloud from: nextcloud.com

List of activated apps:

App list

```
Enabled:
  - dav: 1.3.0
  - federatedfilesharing: 1.2.0
  - files: 1.7.2
  - files_external: 1.3.0
  - files_sharing: 1.4.0
  - files_videoplayer: 1.1.0
  - lookup_server_connector: 1.0.0
  - notifications: 2.0.0
  - oauth2: 1.0.5
  - provisioning_api: 1.2.0
  - theming: 1.3.0
  - twofactor_backupcodes: 1.1.1
  - updatenotification: 1.2.0
  - workflowengine: 1.2.0
Disabled:
  - activity
  - admin_audit
  - comments
  - encryption
  - federation
  - files_pdfviewer
  - files_texteditor
  - files_trashbin
  - files_versions
  - firstrunwizard
  - gallery
  - logreader
  - nextcloud_announcements
  - password_policy
  - serverinfo
  - sharebymail
  - survey_client
  - systemtags
  - user_external
  - user_ldap
```

Nextcloud configuration:

Config report

```
{
    "system": {
        "instanceid": "oc65jgv8zf6o",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "toothless.goip.de",
            "toothless.fritz.box"
        ],
        "datadirectory": "\/var\/www\/nextcloud\/data",
        "overwrite.cli.url": "https:\/\/toothless.goip.de",
        "dbtype": "mysql",
        "version": "12.0.4.3",
        "dbname": "nextcloud",
        "dbhost": "localhost",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "skeletondirectory": "",
        "logtimezone": "Europe\/Berlin",
        "memcache.local": "\\OC\\Memcache\\APCu",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "localhost",
            "port": "6379"
        },
        "htaccess.RewriteBase": "\/",
        "mail_smtpmode": "smtp",
        "mail_smtpauthtype": "LOGIN",
        "mail_smtpauth": 1,
        "mail_from_address": "jan.noormann",
        "mail_domain": "gmail.com",
        "mail_smtphost": "smtp.gmail.com",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpsecure": "tls"
    }
}
```

Are you using external storage, if yes which one: local

Are you using encryption: no

Are you using an external user-backend, if yes which one: no

Client configuration

Browser: Opera, Chrome, Firefox

Operating system: Windows 10

rullzer commented 6 years ago

@icewind1991 fs fun :)

icewind1991 commented 6 years ago

Are you perhaps using a non-standard filesystem such as FAT or NTFS?

Can you try creating a php file ls.php:

<?php
echo "Listing {$argv[1]}\n";
var_dump(scandir($argv[1]));

And run it using php ls.php /path/to/folder and see if you get the correct result

loeffelpan commented 6 years ago

The filesystem is ext4 on that HDD. I just figured out that there are other files with special characters in the same filesystem which are listed by Nextcloud's Files app. It seems to have something to do with exactly the mentioned files.

The result of your PHP looks fine:

Listing /mnt/Test
array(6) {
  [0]=> string(1) "."
  [1]=> string(2) ".."
  [2]=> string(35) "S02E06.Sex, Lügen und Quallen.mp4"
  [3]=> string(30) "S02E09.Das Knochenrätsel.mp4"
  [4]=> string(30) "S02E10.Hölle in Helsinki.mp4"
  [5]=> string(31) "S02E12.Die Säbelzahnkatze.mp4"
}
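Editor's note: the byte counts in that dump carry a hint. PHP's string(N) is a byte length, and a precomposed umlaut (NFC) takes 2 bytes in UTF-8 while a base letter plus combining diaeresis (NFD) takes 3. A minimal Python sketch to check which form matches; the helper name `utf8_lengths` is made up for this illustration and is not part of Nextcloud:

```python
import unicodedata

def utf8_lengths(name: str) -> dict:
    """Return the UTF-8 byte length of `name` in NFC and in NFD form."""
    return {
        form: len(unicodedata.normalize(form, name).encode("utf-8"))
        for form in ("NFC", "NFD")
    }

# File name taken from the scandir() output above, written with an escape
# so the example does not depend on how this page itself was encoded.
name = "S02E06.Sex, L\u00fcgen und Quallen.mp4"
print(utf8_lengths(name))  # {'NFC': 34, 'NFD': 35}
```

The reported string(35) matches the NFD length, which is consistent with the normalization explanation given later in this thread; it does not prove it on its own.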

teadur commented 5 years ago

I'm having a similar issue on an ext4 filesystem. For most files everything is okay, but a certain number of files with umlauts in their names cannot be accessed by the file scanner.

All affected files produce this error: "OC\Files\Cache\Scanner","method":"--","url":"--","message":"!!! Path 'ROOT\/K\u00c4SKI\/DIR\/T\u00f6\u00f6teeb.pdf' is not accessible or present !!!","userAgent":"--","version":"13.0.2.1"}

It seems these files have non-UTF-8 filenames, for example ISO-8859-*.

It seems that the scanner expects all filenames to be in ascii or utf-8.

If I take one of the non-working files from the filesystem and upload it through the web UI, it is accessible (it seems something converts the filename encoding in that case).

teadur commented 5 years ago

If someone hits this problem and needs a solution faster than the code gets fixed, one option is to use rclone / rsync to modify the filename charset.

fhoner commented 5 years ago

Facing exactly the same problem. Any updates on this?

OS: Ubuntu Server 18.04 Webserver: Apache 2.4.37 Database: PostgreSQL PHP version: 7.2.13-1+ubuntu18.04.1+deb.sury.org+1 Nextcloud version: 15.0.0 Filesystem of local storage added to NC: ext4

Mr-Bart-Simpson commented 5 years ago

Just stumbled across a very similar issue: files whose names contain a plus sign (+) cannot be uploaded, neither via the web frontend nor via the (Windows) client application.

fhoner commented 5 years ago

Still present in v15.0.2

kesselb commented 5 years ago

I don't know how to reproduce :disappointed:

peek 2019-01-16 15-06

Mr-Bart-Simpson commented 5 years ago

Is it possible that the problem depends on the underlying OSes? I had the problem with the Plus-Sign when uploading a file from a Windows 10 client to a Nextcloud server hosted on Linux Mint

fhoner commented 5 years ago

For me it has something to do with filename encodings I guess. Following scenario:

I have a separate hard drive installed on the server that Nextcloud runs on. This drive is mounted as external storage of type local (ext4). Some people have access to this drive via ssh/sftp. Folders copied to this drive over sftp whose names contain characters like ä, ö, ü are not shown in the Nextcloud web client. Renaming these folders manually in an ssh terminal makes them visible, though. As there are terabytes of data, manual renaming is not an option. I will investigate further and report any news.

timor commented 5 years ago

cc @herrwiese

loeffelpan commented 5 years ago

I faced this again and again. I will try renaming to solve this. For now uploading the files via web and deleting the invisible ones is my workaround.

carowsolutions commented 5 years ago

I put a cronjob in place to rename files containing Umlaute: /30 find /etc/data/ -name "[äöüÄÖÜß]*" -exec rename 's/ä/ae/g;s/ü/ue/g;s/ß/ss/g;s/Ä/Ae/g;s/Ü/Ue/g;s/Ö/Oe/g;s/ö/oe/g' {} \;

daftmab commented 5 years ago

Solution:

I take no responsibility! create a database backup!!

  1. Open phpMyAdmin, set the charset to ASCII and convert all tables.
  2. Set the charset back to utf-8 and convert all tables again.
  3. Empty all file tables: oc_activity, oc_filecache, oc_files_trash (DELETE FROM oc_filecache).
  4. Rescan all files with php -d memory_limit=1024M /var/www/cloud.nextloud.de/occ files:scan --all

I worked only on the database, not the filesystem. Worked for me. Umlaute in oc_accounts and other tables like groups must be changed manually.

/edit just deleting the file tables and running the occ command doesn't work. The Umlaute are still raw utf-8 ä ö ü or \u00c4 \u00d6 \u00dc

schwma commented 5 years ago

I am experiencing a similar issue where some file paths containing special characters (specifically German umlauts) are not showing up. The folders in question are mounted as external storage via SFTP. I am running Nextcloud 16.0.3 as a docker container on Ubuntu Server 18.04.

What confused me was that some file paths containing umlauts were showing up while others were not. After poking around a bit I discovered that the paths that were not showing up contained "A", "O", or "U" followed by the unicode character "COMBINING DIAERESIS" (0x0308) whereas file paths that showed up normally seemed to contain "Ä", "Ö", or "Ü" directly. When renaming the combining diaeresis to the respective umlaut, the file path shows up as expected.

OpenCoreCH commented 4 years ago

@schwma (and potentially others): I had the same issue (files with "COMBINING DIAERESIS" not showing up) and could resolve it by enabling the "NFD compatibility" option on the share. The problem is that Nextcloud normalizes unicode by default (see https://github.com/nextcloud/server/blob/21119633041d5ccae19975a58b0ae50ef5a8e33a/lib/private/Files/Filesystem.php#L821-L823) and turns names like "Lo\xcc\x88sungen.pdf" into "L\xc3\xb6sungen.pdf" which then are not found on the external share (because they don't exist). Enabling the option checks both encodings for such files. See https://github.com/owncloud/core/issues/21365 and https://github.com/owncloud/core/pull/24349 for an extensive discussion of the issue.
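Editor's note: the two byte sequences quoted above can be reproduced with a few lines of Python. This is only an illustration of the NFC/NFD difference, not Nextcloud's PHP code; the variable names are made up:

```python
import unicodedata

# "o" followed by U+0308 COMBINING DIAERESIS: the decomposed (NFD) spelling
nfd_name = "Lo\u0308sungen.pdf"
# Normalizing to NFC fuses the pair into the single code point U+00F6
nfc_name = unicodedata.normalize("NFC", nfd_name)

print(nfd_name.encode("utf-8"))  # b'Lo\xcc\x88sungen.pdf'
print(nfc_name.encode("utf-8"))  # b'L\xc3\xb6sungen.pdf'
print(nfd_name == nfc_name)      # False
```

Both names render identically, but a storage backend that compares names byte by byte treats them as two different files, so a lookup for the NFC name does not find a file stored under the NFD name.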

endrift commented 4 years ago

I have this problem and arrived at the conclusion that the issue involved Unicode normalization too; however, I'm running on ZFS and none of the Unicode normalization options on my filesystem seemed to resolve the issue, so I've resorted to...not storing files with non-ASCII filenames in Nextcloud :(

n3storm commented 4 years ago

All of my MacosX users from different unrelated organizations fail to see files and folders containing "combining tildes" symbols.

Looks like PHP is able to handle this since PHP 7: https://wiki.php.net/rfc/unicode_escape

As per this page https://www.php.net/normalizer normalizing to NFC (macOS file and directory names being NFD-normalized) should fix this.

What worked for us was frequently running cron tasks with the following commands:

The star here is the convmv command, and the following SO question gave us the final touch:

https://stackoverflow.com/questions/26516700/file-name-look-the-same-but-is-different-after-copying

We are now looking at using something like triggers to do the conversion, but we think this issue should be addressed by Nextcloud.

n3storm commented 4 years ago

We are now testing the Nextcloud Workflow module: all Created and Copied files whose MIME type is not application/fuu (so that all files and folders pass through) are sent to this script: /usr/bin/convmv --notest --nfc -f utf8 -t utf8 -r %f

Here we are using Spanish characters from macOS keyboards. If somebody else can test this, that would be awesome.
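Editor's note: for readers without convmv, the idea behind `--nfc` can be sketched in Python. This is a rough approximation under stated assumptions, not a replacement for convmv: the function name `normalize_tree_to_nfc` is hypothetical, it assumes a filesystem that stores names byte for byte (for example ext4), and it defaults to a dry run.

```python
import os
import unicodedata

def normalize_tree_to_nfc(root: str, dry_run: bool = True) -> list:
    """Rename entries under `root` whose names are not NFC.

    Returns the (old_path, new_path) pairs that were, or in a dry run
    would be, renamed.
    """
    renamed = []
    # topdown=False yields children before their parent directory, so a
    # directory is only renamed after everything inside it was handled.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in dirnames + filenames:
            nfc = unicodedata.normalize("NFC", name)
            if nfc == name:
                continue  # already NFC, nothing to do
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, nfc)
            if os.path.lexists(new):
                continue  # an NFC twin already exists; leave both for manual review
            if not dry_run:
                os.rename(old, new)
            renamed.append((old, new))
    return renamed
```

Calling `normalize_tree_to_nfc("/path/to/storage")` only reports what would change; pass `dry_run=False` to rename. After renaming on disk, Nextcloud's file cache still needs a rescan (occ files:scan) to pick up the new names.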

masterleo commented 4 years ago

Hi, amazing that this issue is still open considering its importance. I just added these two special characters on a Mac, thinking it would "look nice" :sunglasses: : small smalll diam

And then all my files were deleted on all my machines (by which app / OS? I don't know.) Screenshot from 2020-09-05 23-45-26

And now it is impossible for me to restore them. Maybe because I configured the server not to store the files I delete. I need to check this tomorrow... I am so sad I lost everything because of a simple ASCII bug. :-1: Screenshot from 2020-09-05 23-46-09

Screenshot from 2020-09-05 23-45-40 Screenshot from 2020-09-05 23-45-50

n3storm commented 4 years ago

They are not deleted; Nextcloud just cannot see them. Access your file server directly (ssh).

I am not a Nextcloud expert, but I think emoji support is a different issue; check that you enabled utf8 support in your database. https://docs.nextcloud.com/server/18/admin_manual/configuration_database/mysql_4byte_support.html

skjnldsv commented 4 years ago

@masterleo can you confirm this solves your issue?

I am not Netxcloud expert but I think emoji support is a different issue, check you enabled utf8 support in you bbdd. docs.nextcloud.com/server/18/admin_manual/configuration_database/mysql_4byte_support.html

n3storm commented 4 years ago

@skjnldsv why has the needs info label been added?

I hope masterleo did not hijack this issue with the emoji issue.

You can ask me any information about unicode NFC / NFD and will do my best to provide you with such information in order to fix this issue.

skjnldsv commented 4 years ago

@skjnldsv why needs info label has been added?

I hope masterleo did not hijack this issue with the emoji issue.

Because I read too fast ;)

What is currently missing here? The issue still states "needs triage", meaning it's not confirmed. Is it an issue with Nextcloud? With the file system?

n3storm commented 4 years ago

@skjnldsv why needs info label has been added? I hope masterleo did not hijack this issue with the emoji issue.

Because I read too fast ;)

What is currently missing here? The issue still states "needs triage", meaning it's not confirmed. Is it an issue with Nextcloud? With the file system?

It is an issue with the Unicode set Nextcloud supports in the method OC_Util::normalizeUnicode(), as @OpenCoreCH points out in the comment of 22 Nov 2019.

endrift commented 4 years ago

Yeah, the issue is that the normalized version of the Unicode that nextcloud expects for a given path does not match the version on the filesystem, so when it attempts to find it, it doesn't exist. In theory it should be looking for close matches and then normalizing them the same way, since you can't guarantee how any given filesystem does normalization (if it even does any in the first place). That's somewhat problematic though because you have to do directory searches instead of direct name lookups.
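Editor's note: the "look for close matches and normalize them the same way" idea can be sketched as follows. This is a Python illustration under assumptions, not how Nextcloud implements lookups; `find_entry` is a hypothetical helper:

```python
import os
import unicodedata

def find_entry(directory: str, wanted: str):
    """Return the on-disk name in `directory` that equals `wanted` once
    both are normalized to NFC, or None if no entry matches."""
    target = unicodedata.normalize("NFC", wanted)
    # This is the cost mentioned above: a full directory listing
    # replaces what would otherwise be a single direct name lookup.
    for entry in os.listdir(directory):
        if unicodedata.normalize("NFC", entry) == target:
            return entry
    return None
```

The returned value is the name exactly as stored, so it can be used for the actual file access regardless of which normalization form the filesystem holds.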

benjelloun69 commented 3 years ago

Solution that worked for me :

open “/lib/private/legacy/OC_Util.php” and change line 1367 :

public static function normalizeUnicode($value) {

if (Normalizer::isNormalized($value)) {
....
}

BY :

public static function normalizeUnicode($value) {

return mb_convert_encoding($value,"UTF-8");

if (Normalizer::isNormalized($value)) {
....
}

wiswedel commented 3 years ago

@benjelloun69 Would you mind opening a Pull Request with that solution approach so it can be properly tested and if applicable get merged right away?

hugleo commented 3 years ago

Same problem if you have "\n" (a newline) in filenames, which Linux accepts.

Test aaaa .txt

dodancs commented 3 years ago

Hi there. I face similar issues on my setup: Ubuntu 18.04 LTS Filesystem: ext4 Nextcloud 19.0.1

The filename in question contains \u000f unicode character which messes things up. I get ERROR 400: "File name contains at least one invalid character".

logs:

{
  "reqId": "XXXX",
  "level": 4,
  "time": "2021-01-26T12:03:51+00:00",
  "remoteAddr": "XXXX",
  "user": "XXXX",
  "app": "webdav",
  "method": "PUT",
  "url": "/remote.php/dav/files/XXXX/test/%20%0Ftest.txt",
  "message": {
    "Exception": "OCA\\DAV\\Connector\\Sabre\\Exception\\InvalidPath",
    "Message": "File name contains at least one invalid character",
    "Code": 0,
    "Trace": [
      {
        "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php",
        "line": 80,
        "function": "getChild",
        "class": "OCA\\DAV\\Connector\\Sabre\\Directory",
        "type": "->",
        "args": [" \u000ftest.txt"]
      },
      {
        "file": "/var/www/html/apps/dav/lib/Connector/Sabre/LockPlugin.php",
        "line": 68,
        "function": "getNodeForPath",
        "class": "Sabre\\DAV\\Tree",
        "type": "->",
        "args": [
          "files/XXXX/test/ \u000ftest.txt"
        ]
      },
      {
        "file": "/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php",
        "line": 89,
        "function": "getLock",
        "class": "OCA\\DAV\\Connector\\Sabre\\LockPlugin",
        "type": "->",
        "args": [
          { "__class__": "Sabre\\HTTP\\Request" },
          { "__class__": "Sabre\\HTTP\\Response" }
        ]
      },
      {
        "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php",
        "line": 458,
        "function": "emit",
        "class": "Sabre\\DAV\\Server",
        "type": "->",
        "args": [
          "beforeMethod:PUT",
          [
            { "__class__": "Sabre\\HTTP\\Request" },
            { "__class__": "Sabre\\HTTP\\Response" }
          ]
        ]
      },
      {
        "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php",
        "line": 251,
        "function": "invokeMethod",
        "class": "Sabre\\DAV\\Server",
        "type": "->",
        "args": [
          { "__class__": "Sabre\\HTTP\\Request" },
          { "__class__": "Sabre\\HTTP\\Response" }
        ]
      },
      {
        "file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php",
        "line": 319,
        "function": "start",
        "class": "Sabre\\DAV\\Server",
        "type": "->",
        "args": []
      },
      {
        "file": "/var/www/html/apps/dav/lib/Server.php",
        "line": 320,
        "function": "exec",
        "class": "Sabre\\DAV\\Server",
        "type": "->",
        "args": []
      },
      {
        "file": "/var/www/html/apps/dav/appinfo/v2/remote.php",
        "line": 35,
        "function": "exec",
        "class": "OCA\\DAV\\Server",
        "type": "->",
        "args": []
      },
      {
        "file": "/var/www/html/remote.php",
        "line": 167,
        "args": ["/var/www/html/apps/dav/appinfo/v2/remote.php"],
        "function": "require_once"
      }
    ],
    "File": "/var/www/html/apps/dav/lib/Connector/Sabre/Directory.php",
    "Line": 225,
    "CustomMessage": "--"
  },
  "userAgent": "Mozilla/5.0 (Macintosh) mirall/3.1.1git (build 4316) (Nextcloud)",
  "version": "19.0.1.1"
}

blackerking commented 3 years ago

On my setup it destroys the file data with LibreOffice: grafik

Ubuntu 18.04 LTS, btrfs, PHP version 7.4.14, Nextcloud 20.0.6

hschletz commented 3 years ago

One possible cause may be an Apache setup with mod_php and mod_perl simultaneously enabled. This is known to break PHP's setlocale() function – it reports success, but has no effect. Nextcloud actually checks for a working setlocale(), but for some reason the message did not show up for me until I performed an upgrade today.

dodancs commented 3 years ago

@hschletz I use NginX with php-fpm and still get this error.

hschletz commented 3 years ago

@hschletz I use NginX with php-fpm and still get this error.

That's why I wrote "One possible cause". So far it appears that there are different causes in the server environment (OS, filesystem, locales, webserver) and that Nextcloud's normalization code itself is correct. It's worth listing these possible causes (and their solutions), but not everybody will have the same cause.

Nextcloud has some checks to ensure that those outside factors don't break stuff, but those are not run on every request.

BTW, the workaround in https://github.com/nextcloud/server/issues/7762#issuecomment-732059466 did not work for me.

Here's the code that checks for working setlocale(): https://github.com/nextcloud/server/blob/77083da332b032dc727e1da2f170892932802b64/lib/private/legacy/util.php#L1279-L1285 https://github.com/tchwork/utf8/blob/e1fa4d4a57896d074c9a8d01742b688d5db4e9d5/src/Patchwork/Utf8/Bootup.php#L124-L134

You could put that code in a test script in your nextcloud base directory and invoke it via HTTP to see whether setlocale() is actually working for your setup. If not, running the script via CLI may or may not give a different result, which tells you whether the problem is your webserver setup or something else on the system. If setlocale() works on your webserver, your problem is something else.

gdesor commented 3 years ago

Solution that worked for me :

open “/lib/private/legacy/OC_Util.php” and change line 1367 :

public static function normalizeUnicode($value) {

if (Normalizer::isNormalized($value)) {
....
}

BY :

public static function normalizeUnicode($value) {

return mb_convert_encoding($value,"UTF-8");

if (Normalizer::isNormalized($value)) {
....
}

It works for me too . Thanks !

PVince81 commented 3 years ago

if the file is using NFD encoding instead of NFC, you should enable the "NFD compatibility mode" in the mount options: https://docs.nextcloud.com/server/stable/admin_manual/configuration_files/external_storage_configuration_gui.html#mount-options

however, I noticed in the code that specifically for the "Local" external storage this is not working because we check isLocal when actually we want to check if it's the home storage: https://github.com/nextcloud/server/blob/master/lib/private/legacy/OC_Util.php#L244

johndoe7000 commented 2 years ago

The patch from @benjelloun69 still works like a charm with NC 22.x.x and 23.x.x when using external storage. It does not only fix german umlauts but also french accents. "occ files:scan --all" is 100% happy with it but not without it. COOL is also happy and opens every Microsoft/Libreoffice file with an umlaut in its name which failed before.

Why is that patch still not merged? What's wrong with it?

endrift commented 2 years ago

It discards the entirety of the function and replaces it with one specific case.

TristisOris commented 2 years ago

Version 24, the bug still exists. Problems with Cyrillic "Й", but not with all files; probably with some old ones from WinXP-7 with some weird encoding.

sudo -u www-data php -f occ files:scan --path "/***
        Entry "***.doc" will not be accessible due to incompatible encoding

This solution works for me. Are there any problems with implementing it? https://help.nextcloud.com/t/invalid-encoding-on-file-names-in-nc19/83835/2

k-popov commented 2 years ago

NextCloud 24.0.1.1 External storage (S3 in Yandex.Cloud) admin-configured (not user-configured), "NFD compatibility" checkbox set. Filename (actually object name in S3) contains cyrillic "й". Files scan (occ files:scan --path=) completes fine, no warnings, the problematic file is listed too.

It is possible to "Move or Copy" the file to primary storage but if "Download" is clicked, Nextcloud responds with 503 and in the log file I see:

{"Exception":"Error","Message":"fopen(https://storage.yandexcloud.net/<URL_ENCODED_PATH_TO_OBJECT_HERE>): Failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found\r\n at /var/www/html/lib/private/Files/ObjectStore/S3ObjectTrait.php#79"

Checking the above-mentioned URL-encoded URL shows that Nextcloud attempts to get the object under its NFC name (%D0%B9 for й), while listing objects with python + boto3 shows that the object has an NFD name (%D0%B8%CC%86 for й).

So it looks like some encoding compatibility mechanism works when copying to primary storage but does not work when downloading (or opening the PDF file in the browser).
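Editor's note: the two percent-encoded forms mentioned above are easy to reproduce, which helps when comparing a URL from the log with an object key. A small Python sketch using only the standard library; the variable names are illustrative:

```python
import unicodedata
from urllib.parse import quote

nfc = unicodedata.normalize("NFC", "\u0439")  # й as the single code point U+0439
nfd = unicodedata.normalize("NFD", "\u0439")  # и (U+0438) + U+0306 COMBINING BREVE

print(quote(nfc))  # %D0%B9
print(quote(nfd))  # %D0%B8%CC%86
```

If the key in the failing request shows the first form while the bucket listing shows the second, the request is addressing a name that does not exist in the bucket, which would explain the 404.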

szaimen commented 1 year ago

Hi, please update to 24.0.8 or better 25.0.2 and report back if it fixes the issue. Thank you!

johndoe7000 commented 1 year ago

I'm currently using 24.0.8.

  1. I can confirm that files with German umlauts can be scanned with "occ files:scan USER" and opened in Collabora Online Office.

  2. But they cannot be scanned with "occ files:scan USER" or opened with Collabora Online Office when they are password protected.

Adding "return mb_convert_encoding($value,"UTF-8");" to OC_Util.php makes 1. and 2. work flawlessly.

szaimen commented 1 year ago

Oh, interesting find! Do you mind creating a PR with your patch here? https://github.com/nextcloud/server/edit/master/lib/private/legacy/OC_Util.php Thanks a lot! :)

johndoe7000 commented 1 year ago

@szaimen It was not me finding the solution... it was from another guy @benjelloun69 . Look here...

https://help.nextcloud.com/t/invalid-encoding-on-file-names-in-nc19/83835

He posted a solution on 1 Nov. 2020, but somehow it was not accepted; read the thread from my comment of 26 Apr. 2022 up to here. I would be very happy if his patch were accepted, or a different one that satisfies the Nextcloud devs.

Adding this small line on every Nextcloud release for more than 2 years is really annoying :(

szaimen commented 1 year ago

I guess we have not seen this forum post. Can you try to create the PR? I'll then help you move this forward :)

PVince81 commented 1 year ago

see also troubleshooting NFD encoding issues with external storage: https://docs.nextcloud.com/server/latest/admin_manual/issues/general_troubleshooting.html#troubleshooting-file-encoding-on-external-storages

I'm not sure if the proposed patch will make everything work correctly. Maybe the scanner will find the file but when you'll try to overwrite it through the web UI or Webdav, it will create another instance of the file with the NFC normalized name. So you'll see two files on disk with seemingly the same name, but one is with NFC normalized and one with NFD (the original one).

For external storages, a special compatibility mode has been developed (see link above) which will always try both encodings to avoid such issues. However this approach makes everything slower as more FS accesses are required.
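Editor's note: the "try both encodings" idea behind the compatibility mode can be sketched like this. It is a Python illustration of the general approach, not Nextcloud's actual implementation; `resolve_path` is a hypothetical helper:

```python
import os
import unicodedata

def resolve_path(directory: str, name: str):
    """Return the first existing path for `name`, trying the NFC spelling
    and then the NFD spelling; None if neither exists."""
    for form in ("NFC", "NFD"):
        candidate = os.path.join(directory, unicodedata.normalize(form, name))
        # Worst case this is two filesystem lookups per name, which is
        # the slowdown mentioned above.
        if os.path.lexists(candidate):
            return candidate
    return None
```

Because the existing spelling is reused for writes, an overwrite goes to the original file instead of creating a second, NFC-named copy next to it.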

PVince81 commented 1 year ago

For those who already use compatibility mode, can confirm that they have NFD-encoded file names, and still see it not working: that can be handled as a bug. Back then this mode was mostly tested with SMB storages, and some other storages like S3 may need further workarounds to work correctly.

johndoe7000 commented 1 year ago

@PVince81 Now this gets interesting.... I work for an employer where we use Windows, Linux and MacOSX.

I have an account on our Nextcloud 24.0.8 where I use a Samba4 (2:4.9.5+dfsg-5+deb10u3, Debian Buster) DFS-enabled share. There I have an Excel sheet with German umlauts and a space in its name, and it is password protected. When I don't use the patch from @benjelloun69, I cannot successfully scan this file with occ or open it with Collabora Online Office (COOL). When I try to open it with COOL, I only see a spinning wheel from Nextcloud. When I enable NFD for this share, COOL opens but fails to give me the dialog to enter the password.

Next, I created a new Excel sheet with MS Office 2013 on my Windows system, with a German umlaut and a space in its name, and protected it with a password. Then I logged into Nextcloud, and this file can be scanned and opened with COOL without problems; the password dialog appears.

So my guess that it is a problem with password-protected files which have an umlaut in their name IS WRONG, sorry for the inconvenience.

Summary... fact is...

  1. I can open the "mysterious" excel sheet without problems from Windows and Linux (I have no access to a MacOSX machine).
  2. I can open and scan this file in Nextcloud with the patch from @benjelloun69 without having NFD enabled on the share.
  3. Enabling NFD on the share and not patching OC_Util.php works half way for this special excel sheet.
  4. I cannot tell you, if this file was originally generated on a Mac or not.

And before you ask, this excel sheet has "very" sensitive data in it, so I cannot share.

When I have more time... I will try to remove the password from that excel sheet and test again. If not possible maybe changing the password from my Windows or Linux system helps. This is of course not a solution to the problem, but may give more insight.

PVince81 commented 1 year ago

in case it's useful, you can copy-paste a file name and pass it to this script and it will tell you what normalization it has and also show you both conversions:

<?php
$s = $argv[1];

if (\Normalizer::isNormalized($s, \Normalizer::FORM_D)) {
    print("Original string is using NFD normalization\n");
    $nfc = \Normalizer::normalize($s, \Normalizer::FORM_C);
    print("NFC: $nfc\n");
    print("NFD: $s\n");
} elseif (\Normalizer::isNormalized($s, \Normalizer::FORM_C)) {
    print("Original string is using NFC normalization\n");
    $nfd = \Normalizer::normalize($s, \Normalizer::FORM_D);
    print("NFC: $s\n");
    print("NFD: $nfd\n");
} else {
    print("Unknown normalization\n");
}