Open loeffelpan opened 6 years ago
@icewind1991 fs fun :)
Are you perhaps using a non-standard filesystem such as FAT or NTFS?
Can you try creating a PHP file ls.php:
<?php
echo "Listing {$argv[1]}\n";
var_dump(scandir($argv[1]));
And run it using php ls.php /path/to/folder and see if you get the correct result.
The filesystem is ext4 on that HDD. I just figured out that there are other files with special characters on the same filesystem which are listed by Nextcloud's Files app. It seems to have something to do with exactly the mentioned files.
The result of your PHP looks fine:
Listing /mnt/Test
array(6) {
  [0]=>
  string(1) "."
  [1]=>
  string(2) ".."
  [2]=>
  string(35) "S02E06.Sex, Lügen und Quallen.mp4"
  [3]=>
  string(30) "S02E09.Das Knochenrätsel.mp4"
  [4]=>
  string(30) "S02E10.Hölle in Helsinki.mp4"
  [5]=>
  string(31) "S02E12.Die Säbelzahnkatze.mp4"
}
I'm having a similar issue on an ext4 filesystem. For most files everything is okay, but some files with umlauts in their names cannot be accessed by the file scanner.
All affected files have error: "OC\Files\Cache\Scanner","method":"--","url":"--","message": !!! Path 'ROOT\/K\u00c4SKI\/DIR\/T\u00f6\u00f6teeb.pdf' is not accessible or present !!!","userAgent":"--","version":"13.0.2.1"} It seems these files have non utf-8 filenames, for example iso-8859-*
It seems that the scanner expects all filenames to be in ascii or utf-8.
If I take one of the non-working files from the filesystem and upload it through the web UI, it is accessible (it seems something converts the filename encoding in that case).
If someone hits this problem and needs a solution sooner than the code gets fixed, one option is to use rclone / rsync to convert the filename charset.
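To check whether a folder really contains names that are not valid UTF-8 (for example ISO-8859-* leftovers), a minimal Python sketch along these lines may help. The function names are made up for illustration and nothing here is part of Nextcloud; it only lists suspicious names and does not rename anything.

```python
import os


def find_non_utf8_names(raw_names):
    """Return the raw byte names that cannot be decoded as UTF-8."""
    bad = []
    for raw in raw_names:
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append(raw)
    return bad


def scan_directory(path):
    """List one directory as raw bytes and report the names that are not UTF-8."""
    # Passing a bytes path makes os.listdir return undecoded byte names.
    return find_non_utf8_names(os.listdir(os.fsencode(path)))
```

Calling scan_directory("/path/to/folder") returns the offending names as bytes, so they can be inspected before deciding how to convert them.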
Facing exactly the same problem. Any updates on this?
OS: Ubuntu Server 18.04
Webserver: Apache 2.4.37
Database: PostgreSQL
PHP version: 7.2.13-1+ubuntu18.04.1+deb.sury.org+1
Nextcloud version: 15.0.0
Filesystem of local storage added to NC: ext4
Just stumbled across a very similar issue: filenames containing a plus sign (+) cannot be uploaded, neither via the web frontend nor via the (Windows) client application.
Still present in v15.0.2
I don't know how to reproduce :disappointed:
Is it possible that the problem depends on the underlying OSes? I had the problem with the Plus-Sign when uploading a file from a Windows 10 client to a Nextcloud server hosted on Linux Mint
For me it has something to do with filename encodings I guess. Following scenario:
I have a separate hard drive installed in the server that Nextcloud runs on. This drive is mounted as external storage of type local (ext4). Some people have access to this drive via ssh/sftp. Folders copied onto this drive over sftp whose names contain symbols like ä, ö, ü are not shown in the Nextcloud web client. Renaming these folders manually in an ssh terminal makes them visible, though. As there are terabytes of data, manual renaming is not an option. I will investigate further and let you know of any news.
cc @herrwiese
I faced this again and again. I will try renaming to solve this. For now uploading the files via web and deleting the invisible ones is my workaround.
I put a cron job in place to rename files containing umlauts: /30 find /etc/data/ -name "[äöüÄÖÜß]*" -exec rename 's/ä/ae/g;s/ü/ue/g;s/ß/ss/g;s/Ä/Ae/g;s/Ü/Ue/g;s/Ö/Oe/g;s/ö/oe/g' {} \;
Solution:
Open phpMyAdmin, set the charset to ASCII and convert all tables.
Set the charset back to utf-8 and convert all tables again.
Empty all file tables: oc_activity, oc_filecache, oc_files_trash.
DELETE FROM oc_filecache
rescan all files with
php -d memory_limit=1024M /var/www/cloud.nextloud.de/occ files:scan --all
I worked only on the database. Not the filesystem. Worked for me.
Umlauts in oc_accounts and other tables, such as groups, must be changed manually.
/edit
Just emptying the file tables and running the occ command doesn't work.
The umlauts are still raw utf-8 ä ö ü
or \u00c4 \u00d6 \u00dc
I am experiencing a similar issue where some file paths containing special characters (specifically German umlauts) are not showing up. The folders in question are mounted as external storage via SFTP. I am running Nextcloud 16.0.3 as a docker container on Ubuntu Server 18.04.
What confused me was that some file paths containing umlauts were showing up while others were not. After poking around a bit I discovered that the paths that were not showing up contained "A", "O", or "U" followed by the Unicode character "COMBINING DIAERESIS" (U+0308), whereas file paths that showed up normally seemed to contain "Ä", "Ö", or "Ü" directly. When I replace the letter plus combining diaeresis with the respective umlaut, the file path shows up as expected.
@schwma (and potentially others): I had the same issue (files with "COMBINING DIAERESIS" not showing up) and could resolve it by enabling the "NFD compatibility" option on the share. The problem is that Nextcloud normalizes unicode by default (see https://github.com/nextcloud/server/blob/21119633041d5ccae19975a58b0ae50ef5a8e33a/lib/private/Files/Filesystem.php#L821-L823) and turns names like "Lo\xcc\x88sungen.pdf" into "L\xc3\xb6sungen.pdf" which then are not found on the external share (because they don't exist). Enabling the option checks both encodings for such files. See https://github.com/owncloud/core/issues/21365 and https://github.com/owncloud/core/pull/24349 for an extensive discussion of the issue.
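The two byte sequences quoted above can be compared directly. This is a small Python sketch (not Nextcloud code; to_nfc_bytes is a name chosen for illustration) showing that the NFD name normalizes to the NFC name even though the raw bytes differ, which is why a lookup by the normalized name misses the file on disk.

```python
import unicodedata

# Byte sequences quoted in the comment above.
nfd_bytes = b"Lo\xcc\x88sungen.pdf"  # "o" followed by U+0308 COMBINING DIAERESIS
nfc_bytes = b"L\xc3\xb6sungen.pdf"   # precomposed U+00F6


def to_nfc_bytes(raw):
    """Decode a UTF-8 name, normalize it to NFC and encode it again."""
    return unicodedata.normalize("NFC", raw.decode("utf-8")).encode("utf-8")


print(nfd_bytes == nfc_bytes)                # the on-disk name and the normalized name differ
print(to_nfc_bytes(nfd_bytes) == nfc_bytes)  # but they are the same name after normalization
```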
I have this problem and arrived at the conclusion that the issue involved Unicode normalization too; however, I'm running on ZFS and none of the Unicode normalization options on my filesystem seemed to resolve the issue, so I've resorted to...not storing files with non-ASCII filenames in Nextcloud :(
All of my Mac OS X users, from different unrelated organizations, fail to see files and folders whose names contain "combining tilde" symbols.
Looks like PHP has been able to handle this since PHP 7: https://wiki.php.net/rfc/unicode_escape
As per https://www.php.net/normalizer, normalizing to NFC should fix this, since Mac OS X file and directory names are NFD-normalized.
What worked for us was to frequently run cron tasks with the following commands:
sudo -u www-data /usr/bin/convmv --notest --nfc -f utf8 -t utf8 -r data/
(better use absolute paths)
sudo -u www-data /usr/bin/php occ files:scan --all
sudo -u www-data /usr/bin/php occ groupfolders:scan 1
The last command is optional (you may have more than one group folder, which is a hassle).
The star here is the convmv command, and the following SO question gave us the final touch:
https://stackoverflow.com/questions/26516700/file-name-look-the-same-but-is-different-after-copying
We are now looking into something like triggers to do the conversion, but we think this issue should be addressed by Nextcloud.
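Before letting convmv rename anything, it can be useful to see which names would change. The following is a minimal Python sketch of such a dry run; it is not a replacement for convmv, the function names are invented for illustration, and it only reports names without touching the filesystem.

```python
import os
import unicodedata


def nfc_renames(names):
    """Map each name that is not NFC-normalized to its NFC spelling."""
    renames = {}
    for name in names:
        nfc = unicodedata.normalize("NFC", name)
        if nfc != name:
            renames[name] = nfc
    return renames


def plan_tree(root):
    """Collect (old_path, new_path) pairs below root without renaming anything."""
    plan = []
    # Walk bottom-up so children are listed before the directory that contains them.
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for old, new in nfc_renames(dirnames + filenames).items():
            plan.append((os.path.join(dirpath, old), os.path.join(dirpath, new)))
    return plan
```

Printing the result of plan_tree("/absolute/path/to/data") gives a list to review; an empty list means every name under that path is already NFC.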
We are now testing the Nextcloud Workflow module, passing all Created and Copied files with a MIME type other than application/fuu (so that all files and folders pass through) to this script:
/usr/bin/convmv --notest --nfc -f utf8 -t utf8 -r %f
We are using Spanish characters from Mac OS X keyboards here. If somebody else could test this, that would be awesome.
Hi, amazing that this issue is still open considering its importance. I just added these two special characters on a Mac, thinking it would "look nice" :sunglasses: :
And then all my files were deleted on all my machines (by which app / OS? I don't know.)
And now it is impossible for me to restore them. Maybe because I configured the server not to keep the files I delete. I need to check this tomorrow... I am so sad I lost everything because of a simple ascii bug. .. :-1:
They are not deleted; Nextcloud just cannot see them. Access your file server directly (ssh).
I am not a Nextcloud expert, but I think emoji support is a different issue. Check that you enabled utf8 support in your database. https://docs.nextcloud.com/server/18/admin_manual/configuration_database/mysql_4byte_support.html
@masterleo can you confirm this solves your issue?
@skjnldsv why has the needs info label been added?
I hope masterleo did not hijack this issue with the emoji issue.
You can ask me for any information about Unicode NFC / NFD and I will do my best to provide it in order to fix this issue.
Because I read too fast ;)
What is currently missing here? The issue still states "needs triage", meaning it's not confirmed. Is it an issue with Nextcloud? With the file system?
It is an issue with the Unicode set Nextcloud supports in the method OC_Util::normalizeUnicode(), as @OpenCoreCH pointed out in the comment of 22 Nov 2019.
Yeah, the issue is that the normalized version of the Unicode that nextcloud expects for a given path does not match the version on the filesystem, so when it attempts to find it, it doesn't exist. In theory it should be looking for close matches and then normalizing them the same way, since you can't guarantee how any given filesystem does normalization (if it even does any in the first place). That's somewhat problematic though because you have to do directory searches instead of direct name lookups.
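The "look for close matches" idea can be sketched in Python as follows. This is only an illustration of the approach described above, not how Nextcloud resolves paths; match_normalized and resolve_name are hypothetical names. It tries the cheap direct lookup first and falls back to a directory listing only when that fails, which is exactly the extra cost mentioned in the comment.

```python
import os
import unicodedata


def match_normalized(wanted, entries):
    """Return the entry that equals `wanted` after NFC normalization, else None."""
    if wanted in entries:
        return wanted
    target = unicodedata.normalize("NFC", wanted)
    for entry in entries:
        if unicodedata.normalize("NFC", entry) == target:
            return entry
    return None


def resolve_name(directory, wanted):
    """Find the on-disk spelling of `wanted` inside `directory`."""
    # Direct name lookup: one filesystem access, no listing needed.
    if os.path.lexists(os.path.join(directory, wanted)):
        return wanted
    # Fallback: list the directory and compare normalized names.
    return match_normalized(wanted, os.listdir(directory))
```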
Solution that worked for me:
Open "/lib/private/legacy/OC_Util.php" and change line 1367:
public static function normalizeUnicode($value) {
    if (Normalizer::isNormalized($value)) {
    ....
}
to:
public static function normalizeUnicode($value) {
    return mb_convert_encoding($value,"UTF-8");
    if (Normalizer::isNormalized($value)) {
    ....
}
@benjelloun69 Would you mind opening a Pull Request with that solution approach so it can be properly tested and if applicable get merged right away?
Same problem if you have "\n" (newline) in filenames, which Linux accepts.
Test aaaa .txt
Hi there. I face similar issues on my setup:
Ubuntu 18.04 LTS
Filesystem: ext4
Nextcloud 19.0.1
The filename in question contains the \u000f Unicode character, which messes things up. I get ERROR 400: "File name contains at least one invalid character".
logs:
{
"reqId": "XXXX",
"level": 4,
"time": "2021-01-26T12:03:51+00:00",
"remoteAddr": "XXXX",
"user": "XXXX",
"app": "webdav",
"method": "PUT",
"url": "/remote.php/dav/files/XXXX/test/%20%0Ftest.txt",
"message": {
"Exception": "OCA\\DAV\\Connector\\Sabre\\Exception\\InvalidPath",
"Message": "File name contains at least one invalid character",
"Code": 0,
"Trace": [
{
"file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Tree.php",
"line": 80,
"function": "getChild",
"class": "OCA\\DAV\\Connector\\Sabre\\Directory",
"type": "->",
"args": [" \u000ftest.txt"]
},
{
"file": "/var/www/html/apps/dav/lib/Connector/Sabre/LockPlugin.php",
"line": 68,
"function": "getNodeForPath",
"class": "Sabre\\DAV\\Tree",
"type": "->",
"args": [
"files/XXXX/test/ \u000ftest.txt"
]
},
{
"file": "/var/www/html/3rdparty/sabre/event/lib/WildcardEmitterTrait.php",
"line": 89,
"function": "getLock",
"class": "OCA\\DAV\\Connector\\Sabre\\LockPlugin",
"type": "->",
"args": [
{ "__class__": "Sabre\\HTTP\\Request" },
{ "__class__": "Sabre\\HTTP\\Response" }
]
},
{
"file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php",
"line": 458,
"function": "emit",
"class": "Sabre\\DAV\\Server",
"type": "->",
"args": [
"beforeMethod:PUT",
[
{ "__class__": "Sabre\\HTTP\\Request" },
{ "__class__": "Sabre\\HTTP\\Response" }
]
]
},
{
"file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php",
"line": 251,
"function": "invokeMethod",
"class": "Sabre\\DAV\\Server",
"type": "->",
"args": [
{ "__class__": "Sabre\\HTTP\\Request" },
{ "__class__": "Sabre\\HTTP\\Response" }
]
},
{
"file": "/var/www/html/3rdparty/sabre/dav/lib/DAV/Server.php",
"line": 319,
"function": "start",
"class": "Sabre\\DAV\\Server",
"type": "->",
"args": []
},
{
"file": "/var/www/html/apps/dav/lib/Server.php",
"line": 320,
"function": "exec",
"class": "Sabre\\DAV\\Server",
"type": "->",
"args": []
},
{
"file": "/var/www/html/apps/dav/appinfo/v2/remote.php",
"line": 35,
"function": "exec",
"class": "OCA\\DAV\\Server",
"type": "->",
"args": []
},
{
"file": "/var/www/html/remote.php",
"line": 167,
"args": ["/var/www/html/apps/dav/appinfo/v2/remote.php"],
"function": "require_once"
}
],
"File": "/var/www/html/apps/dav/lib/Connector/Sabre/Directory.php",
"Line": 225,
"CustomMessage": "--"
},
"userAgent": "Mozilla/5.0 (Macintosh) mirall/3.1.1git (build 4316) (Nextcloud)",
"version": "19.0.1.1"
}
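The \u000f in the rejected name above, like the newline mentioned earlier, is a control character. Which characters Nextcloud actually rejects is not stated in this thread, so the following Python sketch only flags characters in Unicode category Cc (control characters) as an assumption; control_characters is a name made up for illustration.

```python
import unicodedata


def control_characters(name):
    """Return (index, code point) for every control character (category Cc) in name."""
    return [
        (index, f"U+{ord(char):04X}")
        for index, char in enumerate(name)
        if unicodedata.category(char) == "Cc"
    ]


print(control_characters(" \u000ftest.txt"))
```

Running this over a directory listing before syncing can point out names that are likely to be refused.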
On my setup it destroys the file data with LibreOffice:
Ubuntu 18.04 LTS, btrfs, PHP version: 7.4.14, Nextcloud: 20.0.6
One possible cause may be an Apache setup with mod_php and mod_perl simultaneously enabled. This is known to break PHP's setlocale() function – it reports success, but has no effect. Nextcloud actually checks for a working setlocale(), but for some reason the message did not show up for me until I performed an upgrade today.
@hschletz I use NginX with php-fpm and still get this error.
That's why I wrote "One possible cause". So far it appears that there are different causes in the server environment (OS, filesystem, locales, webserver) and that Nextcloud's normalization code itself is correct. It's worth listing these possible causes (and their solutions), but not everybody will have the same cause.
Nextcloud has some checks to ensure that those outside factors don't break stuff, but those are not run on every request.
BTW, the workaround in https://github.com/nextcloud/server/issues/7762#issuecomment-732059466 did not work for me.
Here's the code that checks for working setlocale(): https://github.com/nextcloud/server/blob/77083da332b032dc727e1da2f170892932802b64/lib/private/legacy/util.php#L1279-L1285 https://github.com/tchwork/utf8/blob/e1fa4d4a57896d074c9a8d01742b688d5db4e9d5/src/Patchwork/Utf8/Bootup.php#L124-L134
You could put that code in a test script in your nextcloud base directory and invoke it via HTTP to see whether setlocale() is actually working for your setup. If not, running the script via CLI may or may not give a different result, which tells you whether the problem is your webserver setup or something else on the system. If setlocale() works on your webserver, your problem is something else.
The solution from @benjelloun69 above (adding return mb_convert_encoding($value,"UTF-8"); at the top of normalizeUnicode() in /lib/private/legacy/OC_Util.php) works for me too. Thanks!
If the file is using NFD encoding instead of NFC, you should enable the "NFD compatibility mode" in the mount options: https://docs.nextcloud.com/server/stable/admin_manual/configuration_files/external_storage_configuration_gui.html#mount-options
However, I noticed in the code that this does not work specifically for the "Local" external storage, because we check isLocal when we actually want to check whether it is the home storage: https://github.com/nextcloud/server/blob/master/lib/private/legacy/OC_Util.php#L244
The patch from @benjelloun69 still works like a charm with NC 22.x.x and 23.x.x when using external storage. It fixes not only German umlauts but also French accents. "occ files:scan --all" is 100% happy with it, but not without it. COOL is also happy and opens every Microsoft/LibreOffice file with an umlaut in its name, which failed before.
Why is that patch still not merged? What's wrong with it?
It discards the entirety of the function and replaces it with one specific case.
Version 24, the bug still exists. Problems with Cyrillic "Й", but not with all files; probably only with some old ones from Windows XP-7 with some weird encoding.
sudo -u www-data php -f occ files:scan --path "/***
Entry "***.doc" will not be accessible due to incompatible encoding
This solution works for me. Are there any problems with implementing it? https://help.nextcloud.com/t/invalid-encoding-on-file-names-in-nc19/83835/2
Nextcloud 24.0.1.1
External storage (S3 in Yandex.Cloud), admin-configured (not user-configured), "NFD compatibility" checkbox set.
The filename (actually the object name in S3) contains Cyrillic "й".
The file scan (occ files:scan --path=) completes fine with no warnings, and the problematic file is listed too.
It is possible to "Move or Copy" the file to primary storage but if "Download" is clicked, Nextcloud responds with 503 and in the log file I see:
{"Exception":"Error","Message":"fopen(https://storage.yandexcloud.net/<URL_ENCODED_PATH_TO_OBJECT_HERE>): Failed to open stream: HTTP request failed! HTTP/1.1 404 Not Found\r\n at /var/www/html/lib/private/Files/ObjectStore/S3ObjectTrait.php#79"
Checking the URL-encoded URL mentioned above shows that Nextcloud attempts to get the object with an NFC name (%D0%B9 for й), while listing objects with python + boto3 shows that the object name is NFD (%D0%B8%CC%86 for й).
So it looks like some encoding compatibility mechanism works when copying to primary storage but doesn't work when downloading (or opening the PDF file in the browser).
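The two URL-encoded spellings can be reproduced with Python's standard library. This sketch does not talk to S3 and is not Nextcloud code; quoted_forms is a name chosen for illustration. It only shows how the same visible character percent-encodes differently in NFC and NFD.

```python
import unicodedata
from urllib.parse import quote


def quoted_forms(text):
    """Percent-encode the NFC and NFD spellings of text (UTF-8)."""
    return {form: quote(unicodedata.normalize(form, text)) for form in ("NFC", "NFD")}


# U+0439 CYRILLIC SMALL LETTER SHORT I
print(quoted_forms("\u0439"))
```

Comparing the output against the object key returned by the storage listing shows quickly which form is stored.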
Hi, please update to 24.0.8 or better 25.0.2 and report back if it fixes the issue. Thank you!
I'm currently using 24.0.8.
1. I can confirm that files with German umlauts can be scanned with "occ files:scan USER" and opened in Collabora Online Office.
2. But they cannot be scanned with "occ files:scan USER" or opened with Collabora Online Office when they are password protected.
Adding "return mb_convert_encoding($value,"UTF-8");" to OC_Util.php makes 1. and 2. work flawlessly.
Oh, interesting find! Do you mind creating a PR with your patch here? https://github.com/nextcloud/server/edit/master/lib/private/legacy/OC_Util.php Thanks a lot! :)
@szaimen It was not me who found the solution... it was another guy, @benjelloun69. Look here...
https://help.nextcloud.com/t/invalid-encoding-on-file-names-in-nc19/83835
He posted a solution on 1 Nov 2020... but somehow it was not accepted... read from my comment of 26 Apr 2022 up to here. I would be very happy if his patch were accepted, or a different one that satisfies the Nextcloud devs.
Adding this small line on every Nextcloud release for more than 2 years is really annoying :(
I guess we have not seen this forum post. Can you try to create the PR? I'll then help you move this forward :)
see also troubleshooting NFD encoding issues with external storage: https://docs.nextcloud.com/server/latest/admin_manual/issues/general_troubleshooting.html#troubleshooting-file-encoding-on-external-storages
I'm not sure if the proposed patch will make everything work correctly. Maybe the scanner will find the file but when you'll try to overwrite it through the web UI or Webdav, it will create another instance of the file with the NFC normalized name. So you'll see two files on disk with seemingly the same name, but one is with NFC normalized and one with NFD (the original one).
For external storages, a special compatibility mode has been developed (see link above) which will always try both encodings to avoid such issues. However this approach makes everything slower as more FS accesses are required.
For those who already use compatibility mode, can confirm that their file names are NFD-encoded, and still find that it doesn't work: that can be handled as a bug. Back then this mode was mostly tested with SMB storages, and other storages like S3 may need further workarounds to work correctly.
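The "try both encodings" idea behind the compatibility mode can be sketched in Python like this. It is an illustration of the approach only, not Nextcloud's implementation; candidate_names and first_existing are hypothetical names, and the exists parameter is there so the check can be swapped for any storage backend.

```python
import os
import unicodedata


def candidate_names(name):
    """Return the NFC and NFD spellings of name, without duplicates."""
    forms = [unicodedata.normalize("NFC", name), unicodedata.normalize("NFD", name)]
    return list(dict.fromkeys(forms))


def first_existing(directory, name, exists=os.path.lexists):
    """Return the first spelling of name that exists in directory, else None."""
    for candidate in candidate_names(name):
        # Each extra candidate costs one more storage access.
        if exists(os.path.join(directory, candidate)):
            return candidate
    return None
```

For plain ASCII names both forms are identical, so only names with decomposable characters pay for the second access.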
@PVince81 Now this gets interesting.... I work for an employer where we use Windows, Linux and MacOSX.
I have an account on our Nextcloud 24.0.8 where I use a Samba4 (2:4.9.5+dfsg-5+deb10u3, Debian Buster) DFS-enabled share. There I have an Excel sheet with German umlauts and a space in its name, and it is password protected. When I don't use the patch from @benjelloun69, I cannot successfully scan this file with occ or open it with Collabora Online Office (COOL). When I try to open it with COOL, I see only a spinning wheel from Nextcloud. When I enable NFD for this share, COOL opens but fails to give me the dialog to enter the password.
Next, I created a new Excel sheet with MS Office 2013 on my Windows system, with a German umlaut and a space in its name, and protected it with a password. Then I logged into Nextcloud, and this file can be scanned and opened with COOL without problems... the password dialog appears.
So my guess that it is a problem with password-protected files which have an umlaut in their name IS WRONG, sorry for the inconvenience.
Summary... fact is...
And before you ask, this excel sheet has "very" sensitive data in it, so I cannot share.
When I have more time... I will try to remove the password from that excel sheet and test again. If not possible maybe changing the password from my Windows or Linux system helps. This is of course not a solution to the problem, but may give more insight.
in case it's useful, you can copy-paste a file name and pass it to this script and it will tell you what normalization it has and also show you both conversions:
<?php
// The file name to inspect is passed as the first command line argument.
$s = $argv[1];
// Note: a pure-ASCII name is normalized in both forms and is reported as NFD here.
if (\Normalizer::isNormalized($s, \Normalizer::FORM_D)) {
    print("Original string is using NFD normalization\n");
    $nfc = \Normalizer::normalize($s, \Normalizer::FORM_C);
    print("NFC: $nfc\n");
    print("NFD: $s\n");
} elseif (\Normalizer::isNormalized($s, \Normalizer::FORM_C)) {
    print("Original string is using NFC normalization\n");
    $nfd = \Normalizer::normalize($s, \Normalizer::FORM_D);
    print("NFC: $s\n");
    print("NFD: $nfd\n");
} else {
    print("Unknown normalization\n");
}
Steps to reproduce
Expected behaviour
Every file in this folder should be scanned and shown in the Files app.
Actual behaviour
These files were downloaded onto the hard disk of my home server. The folder containing the downloaded files is configured as "local" external storage in my Nextcloud. Files and folders with German umlauts that were created by Nextcloud in the Files app appear in the file listings. Other files and folders (from the download) are ignored by the occ file scan.
During a file scan in debug mode the following messages appear in nextcloud.log. It should read Lügen instead of L\u00fcgen and Hölle instead of H\u00f6lle, for example.
Server configuration
Operating system: Ubuntu Server 17.10
Web server: Apache 2.4.27
Database: MySQL
PHP version: PHP 7.1.11-0ubuntu0.17.10.1
Nextcloud version: 12.0.4
Updated from an older Nextcloud/ownCloud or fresh install: fresh install
Where did you install Nextcloud from: nextcloud.com
List of activated apps:
App list
```
Enabled:
 - dav: 1.3.0
 - federatedfilesharing: 1.2.0
 - files: 1.7.2
 - files_external: 1.3.0
 - files_sharing: 1.4.0
 - files_videoplayer: 1.1.0
 - lookup_server_connector: 1.0.0
 - notifications: 2.0.0
 - oauth2: 1.0.5
 - provisioning_api: 1.2.0
 - theming: 1.3.0
 - twofactor_backupcodes: 1.1.1
 - updatenotification: 1.2.0
 - workflowengine: 1.2.0
Disabled:
 - activity
 - admin_audit
 - comments
 - encryption
 - federation
 - files_pdfviewer
 - files_texteditor
 - files_trashbin
 - files_versions
 - firstrunwizard
 - gallery
 - logreader
 - nextcloud_announcements
 - password_policy
 - serverinfo
 - sharebymail
 - survey_client
 - systemtags
 - user_external
 - user_ldap
```
Nextcloud configuration:
Config report
```
{
    "system": {
        "instanceid": "oc65jgv8zf6o",
        "passwordsalt": "***REMOVED SENSITIVE VALUE***",
        "secret": "***REMOVED SENSITIVE VALUE***",
        "trusted_domains": [
            "toothless.goip.de",
            "toothless.fritz.box"
        ],
        "datadirectory": "\/var\/www\/nextcloud\/data",
        "overwrite.cli.url": "https:\/\/toothless.goip.de",
        "dbtype": "mysql",
        "version": "12.0.4.3",
        "dbname": "nextcloud",
        "dbhost": "localhost",
        "dbport": "",
        "dbtableprefix": "oc_",
        "mysql.utf8mb4": true,
        "dbuser": "***REMOVED SENSITIVE VALUE***",
        "dbpassword": "***REMOVED SENSITIVE VALUE***",
        "installed": true,
        "skeletondirectory": "",
        "logtimezone": "Europe\/Berlin",
        "memcache.local": "\\OC\\Memcache\\APCu",
        "memcache.locking": "\\OC\\Memcache\\Redis",
        "redis": {
            "host": "localhost",
            "port": "6379"
        },
        "htaccess.RewriteBase": "\/",
        "mail_smtpmode": "smtp",
        "mail_smtpauthtype": "LOGIN",
        "mail_smtpauth": 1,
        "mail_from_address": "jan.noormann",
        "mail_domain": "gmail.com",
        "mail_smtphost": "smtp.gmail.com",
        "mail_smtpport": "587",
        "mail_smtpname": "***REMOVED SENSITIVE VALUE***",
        "mail_smtppassword": "***REMOVED SENSITIVE VALUE***",
        "mail_smtpsecure": "tls"
    }
}
```
Are you using external storage, if yes which one: local
Are you using encryption: no
Are you using an external user-backend, if yes which one: no
Client configuration
Browser: Opera, Chrome, Firefox
Operating system: Windows 10