rotdrop / nextcloud-app-files-archive

Archive inspection and extraction as Nextcloud app.

File not found if path contains umlauts #45

Closed hampoelz closed 4 months ago

hampoelz commented 4 months ago

After mounting a .zip file, I can open the documents and images contained in it as long as the file name or one of its parent folders does not contain an umlaut. In case of umlauts, the file does not open and the following error is displayed: NonExistentArchiveFileException File "Path or File with umlaut.pdf" does not exist in archive.

rotdrop commented 4 months ago

I think this is a duplicate of (parts of) #42 and fixed with commit 2f0cc75. Can you please check with the latest pre-release or clone one of the release branches stable28 / stable29?

hampoelz commented 4 months ago

Oh, I must have overlooked that. Great that the problem should already be fixed :)

But unfortunately, the installation of the rc4 pre-release leads to the following error:

Updating <files_archive> ...
An unhandled exception has been thrown:
TypeError: OCA\FilesArchive\Migration\RegisterMimeTypes::logError(): Argument #2 ($context) must be of type array, string given, called in /var/www/html/custom_apps/files_archive/lib/Migration/RegisterMimeTypes.php on line 78 and defined in /var/www/html/custom_apps/files_archive/lib/Toolkit/Traits/LoggerTrait.php:164
Stack trace:
#0 /var/www/html/custom_apps/files_archive/lib/Migration/RegisterMimeTypes.php(78): OCA\FilesArchive\Migration\RegisterMimeTypes->logError('Unable to updat...', '/var/www/html/c...')
#1 /var/www/html/lib/private/Repair.php(124): OCA\FilesArchive\Migration\RegisterMimeTypes->run(Object(OC\Repair))
#2 /var/www/html/lib/private/legacy/OC_App.php(838): OC\Repair->run()
#3 /var/www/html/lib/private/legacy/OC_App.php(779): OC_App::executeRepairSteps('files_archive', Array)
#4 /var/www/html/lib/private/Updater.php(351): OC_App::updateApp('files_archive')
#5 /var/www/html/lib/private/Updater.php(262): OC\Updater->doAppUpgrade()
#6 /var/www/html/lib/private/Updater.php(129): OC\Updater->doUpgrade('29.0.0.19', '29.0.0.19')
#7 /var/www/html/core/Command/Upgrade.php(216): OC\Updater->upgrade()
#8 /var/www/html/custom_apps/files_archive/vendor/symfony/console/Command/Command.php(298): OC\Core\Command\Upgrade->execute(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#9 /var/www/html/custom_apps/files_archive/vendor/symfony/console/Application.php(1040): Symfony\Component\Console\Command\Command->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#10 /var/www/html/custom_apps/files_archive/vendor/symfony/console/Application.php(301): Symfony\Component\Console\Application->doRunCommand(Object(OC\Core\Command\Upgrade), Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#11 /var/www/html/custom_apps/files_archive/vendor/symfony/console/Application.php(171): Symfony\Component\Console\Application->doRun(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#12 /var/www/html/lib/private/Console/Application.php(213): Symfony\Component\Console\Application->run(Object(Symfony\Component\Console\Input\ArgvInput), Object(Symfony\Component\Console\Output\ConsoleOutput))
#13 /var/www/html/console.php(113): OC\Console\Application->run()
#14 /var/www/html/occ(11): require_once('/var/www/html/c...')
#15 {main}

rotdrop commented 4 months ago

OK, so there is a bug in the code that logs the error message. That message is meant to tell you that your config directory, or at least the file config/mimetype{mapping,aliases}.json, is not writable. You can fix that yourself, if you want, by replacing

https://github.com/rotdrop/nextcloud-app-files-archive/blob/7cdb772dfb4f10eda67097b019ebd51caf4b440e/lib/Migration/RegisterMimeTypes.php#L78

by

$this->logError('Unable to update "' . $coreFile . '", file is not writable.');

The rc5 pre-release does not yet fix it.

hampoelz commented 4 months ago

Unfortunately I still have the issue with the umlauts in rc6.

rotdrop commented 4 months ago
hampoelz commented 4 months ago

Test Archive.zip

rotdrop commented 4 months ago

You are right, this archive does not work. It lists correctly, but the files cannot be opened.

Can we please first verify whether you really use a backend named "Zip" or rather the "SevenZip" Backend? I think so ...

If I try to use 7z with your archive on the command line, then setting the locale to be UTF-8 aware actually leads to strange handling of the contained multibyte characters in the console, but this might be a console bug. When I paste the output here or into any decent editor, then the 'ö' (in particular) is correctly rendered.

I have to investigate further where things break.

hampoelz commented 4 months ago

The Nextcloud detail view explicitly displays "Zip" as backend driver. Also, as far as I know, 7zip should not be installed in my Nextcloud docker container. However, I used 7zip to create the archive on my local system: 7zz a -mx9 -r "Test Archive.zip" "Test Archive"

rotdrop commented 4 months ago

Thanks, then the bad news is: despite the latest UTF-8 locale wrapper, it works with neither the Zip nor the SevenZip backend.

rotdrop commented 4 months ago

The .DS_Store file in the archive implies that the archive was created on an Apple computer? Just asking because Apple has different conventions for some UTF-8 multibyte characters than the rest of the world. Nevertheless it should just work, of course.

rotdrop commented 4 months ago

So yes, this is one of those things that make Apple special. See for example here for an explanation:

https://stackoverflow.com/questions/5581857/git-and-the-umlaut-problem-on-mac-os-x

So what happens: Apple has this other convention of storing "special" characters. I suppose that at some point Nextcloud does some normalization of the filename, or maybe even PHP somehow changes things. Now Nextcloud tries to open a file and requests an archive member using the NC convention of storing "Umlaute" (and other "complicated" UTF-8 characters). Your archive, created on the Apple machine, uses the other convention. Hence the two names do not match byte for byte, even though they look identical.
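To illustrate the mismatch, here is a minimal sketch in Python. It uses Python's unicodedata module as a stand-in for the PHP Normalizer class, and the file name is made up for illustration. The two strings render identically but compare as different until both are normalized to the same form.

```python
import unicodedata

# Hypothetical file name, used only for illustration.
name = "\u00dcbersicht.pdf"  # "Übersicht.pdf"

# Precomposed form: "Ü" is a single code point.
nfc = unicodedata.normalize("NFC", name)
# Decomposed form: "U" followed by a combining diaeresis.
nfd = unicodedata.normalize("NFD", name)

# The names look the same on screen but are different strings.
print(nfc == nfd)
# The decomposed form is one code point longer.
print(len(nfc), len(nfd))
# After normalizing to a common form they match again.
print(unicodedata.normalize("NFC", nfd) == nfc)
```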

I am actually quite fed up with Apple always doing things in a way that hinders using Apple products together with the rest of the world. Anyhow, one solution would be to detect which convention for representing file names is used inside the archive and, if necessary, translate the Nextcloud file name to the Apple convention first. This would solve the problem.

rotdrop commented 4 months ago

Here is the piece of code in Nextcloud which does the path normalization

https://github.com/nextcloud/server/blob/ddb840c36babd02abedce7da41b0c04849009edb/lib/private/Files/Filesystem.php#L619

which ultimately uses the PHP class Normalizer, which can translate between NFC (PHP and probably most non-Apple software) and NFD (which is what Apple uses).

So it remains to determine which convention is used inside an archive. This could be done during the initial scanning of the archive by using the Normalizer::isNormalized() method.

Although in principle an archive could contain file names using both conventions, even within the same file name, this is unlikely. So the NFD workaround would just check whether any file name in the archive does not use the NFC convention and, if so, convert names to NFD before passing them back to the archive backend for extraction.
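A rough sketch of that workaround, again in Python rather than PHP (unicodedata.is_normalized needs Python 3.8 or later and plays the role of Normalizer::isNormalized()). The function names archive_uses_nfd and to_archive_name are hypothetical and not the app's actual API. The sketch also assumes, as described above, that an archive with any non-NFC name uses NFD throughout.

```python
import unicodedata

def archive_uses_nfd(member_names):
    """Return True if any member name is not in NFC form.

    Assumption: a non-NFC name means the archive uses NFD throughout.
    """
    return any(not unicodedata.is_normalized("NFC", n) for n in member_names)

def to_archive_name(requested, member_names):
    """Translate an NFC path (as requested by Nextcloud) to the archive's convention."""
    if archive_uses_nfd(member_names):
        return unicodedata.normalize("NFD", requested)
    return requested

# Hypothetical member list of an archive created on macOS (NFD names).
members = [unicodedata.normalize("NFD", "Ordner/Gr\u00f6\u00dfe.pdf")]
# The same path as Nextcloud would request it (NFC).
requested = unicodedata.normalize("NFC", "Ordner/Gr\u00f6\u00dfe.pdf")

# Without translation the lookup fails; with it, the member is found.
print(requested in members)
print(to_archive_name(requested, members) in members)
```

In a real backend the check would run once during the initial scan of the archive, and the result would be cached rather than recomputed for every lookup.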

rotdrop commented 4 months ago

OK, after uninstalling 7z it seems that the code in #46 also works with the ZIP driver (look at the preview icons in the screenshots).

[Screenshots: Screenshot_20240529_103420, Screenshot_20240529_103344]

rotdrop commented 4 months ago

So my impression is that this is now fixed by 9699b44; the commit is included in the latest pre-release. Can you please check whether this solves your problem? IMHO it was caused by different conventions for Unicode normalization on different systems.

hampoelz commented 4 months ago

Wow, that was quick! Yeah, exactly, I created the archive under macOS. If I had known that beforehand, I would have just created the archive on a different device :/. To be honest, I also hate that Apple has to do everything differently. Unfortunately, there's no way around it as soon as you want to do something cross-platform. Definitely another point in my love-hate relationship with my MacBook.

Anyway, I tested the rc7 pre-release with my archives and it all works fine now. But, I haven't tested any archives other than .zip or archives with umlauts that were not created on macOS.

Thank you for the quick fix and your detailed explanations, I really appreciate that.