rotdrop / nextcloud-app-files-archive

Archive inspection and extraction as Nextcloud app.

Installation instructions for more supported file formats #42

Closed tuxArg closed 1 month ago

tuxArg commented 2 months ago

I'm trying to understand how to add more supported file formats. Of all the extensions I tried, only zip could be mounted.

The documentation refers to: but as far as I can see UnifiedArchive is shipped with the app.

So, any hint on how to add support for other file formats like tgz, tar.bz2 and txz?

Thanks

rotdrop commented 1 month ago

I think that UnifiedArchive supports further archive formats out of the box. However, certain formats require PHP extensions or external helper applications:

https://github.com/wapmorgan/UnifiedArchive/blob/master/docs/Drivers.md

The only special thing which files_archive does is to define a ranking for .zip

https://github.com/rotdrop/nextcloud-app-files-archive/blob/6f34d1968d7fef5861f04de4a8ea636d00bca06a/php-toolkit/Backend/ArchiveBackend.php#L43-L50
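To see which of these prerequisites are present on a given server, a small shell sketch like the following may help. The helper names have_cmd and have_php_ext, as well as the particular programs and extensions checked, are illustrative assumptions; the Drivers.md document linked above is the authoritative list. Note also that the php command-line binary may load a different configuration than the PHP used by the web server.

```shell
#!/bin/sh
# Hypothetical helpers for checking archive prerequisites; adjust the names
# below to whatever the UnifiedArchive driver list says you need.

# Report whether a helper program is on PATH.
have_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found"
    else
        echo "$1: missing"
    fi
}

# Report whether a PHP extension is loaded in the php CLI (if there is one).
have_php_ext() {
    if ! command -v php >/dev/null 2>&1; then
        echo "$1: php CLI not found"
    elif php -m 2>/dev/null | grep -qix "$1"; then
        echo "$1: loaded"
    else
        echo "$1: not loaded"
    fi
}

for cmd in 7z tar; do have_cmd "$cmd"; done
for ext in zip bz2 phar; do have_php_ext "$ext"; done
```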

Best Claus

tuxArg commented 1 month ago

Hi, thanks for the info.

Maybe I should know this already. I know how to install a PHP extension so that php-fpm sees it, and how to use Composer in my own PHP application, but how do I make Nextcloud use a Composer-installed library like SevenZip or TarByPear?

rotdrop commented 1 month ago

> Hi, thanks for the info.
>
> Maybe I should know it. I know how to install a php extension and then it's seen by php-fpm or use composer in my own php application but how do I make nextcloud use a composer installed library like SevenZip or TarByPear?

Please note that the PHP packages are already installed, as far as they are available via Composer. Only companion programs and PHP extensions are not installed, as this cannot easily be done during the app's installation process:

https://github.com/rotdrop/nextcloud-app-files-archive/blob/6f34d1968d7fef5861f04de4a8ea636d00bca06a/composer.json.in#L13-L19

tuxArg commented 1 month ago

Now I'm lost. If unified-archive and archive_tar are already required by this app, why am I unable to mount a tar.bz2 file? php-bz2 is already installed. What could be missing?

Edit: I have p7zip on my system too.

rotdrop commented 1 month ago

> Now I'm lost. If unified-archive and archive_tar are already required by this app, then why I am unable to mount a tar.bz2 file then? php-bz2 is already installed. What could be missing?

Good question. Personally, I am able to deal with .tar.xz archives. What about the "mount archive" file action entry: does it show up? And in the details view, is there an entry for the archive app?

The file action entries and the sidebar tab are controlled by the mime-types which are supported by UnifiedArchive. To this end, files_archive queries UA for its supported mime-types and then tweaks Nextcloud's list of mime-types to make sure that all supported archive files get their correct mime-type.

BTW: which Nextcloud version are you using?

tuxArg commented 1 month ago

It's Nextcloud 28. I've just tried: .zip and .7z work, and they have the archive icon. Some archives whose filenames contain symbols couldn't be mounted.

.tar.xz doesn't work (error 500). It has the mount archive option and archive detail view (all compression info fields say unknown though) but its icon is a gear.

rotdrop commented 1 month ago

> It's nextcloud 28. I've just tried: .zip and .7z work, they have the archive icon. some archives with filenames that have symbols couldn't be mounted

Could you supply one of the non-working archive files, or generate one? Of course, please do not publish private data ;)

> .tar.xz doesn't work (error 500). It has the mount archive option and archive detail view (all compression info fields say unknown though) but its icon is a gear.

500 = internal server error. Is there anything about this in the Nextcloud or web server log files?
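A quick way to check is to filter the Nextcloud log for entries that mention this app. The sketch below assumes that the log lives at data/nextcloud.log below the Nextcloud installation (the default path used here is a guess for your setup) and that relevant entries contain the string files_archive; the function name recent_archive_errors is made up for illustration.

```shell
#!/bin/sh
# Assumed default location of the Nextcloud log; override with NC_LOG.
NC_LOG=${NC_LOG:-/var/www/nextcloud/data/nextcloud.log}

# Print the last few log lines that mention the app.
recent_archive_errors() {
    logfile=$1
    grep -i 'files_archive' "$logfile" | tail -n 5
}

# Only query the log if it is readable by the current user.
if [ -r "$NC_LOG" ]; then
    recent_archive_errors "$NC_LOG"
fi
```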

The thing with the icon is correct: strictly speaking, .tar.xz is not an archive file format, but a compressed archive file. Still, the app should be able to extract it, though handling this kind of archive file is not very efficient: you have to decompress the entire file, as there is no means to skip directly to individual archive members.
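The cost of compressed tar files can be seen on the command line. In this sketch gzip stands in for xz simply because it is more widely installed, and the file names are invented. Both listing the members and extracting a single one push the compressed stream through the decompressor first, because the tar structure only becomes visible after decompression:

```shell
#!/bin/sh
# Build a tiny .tar.gz in a scratch directory (file names are illustrative).
work=$(mktemp -d)
printf 'first\n'  > "$work/a.txt"
printf 'second\n' > "$work/b.txt"
tar -cf - -C "$work" a.txt b.txt | gzip > "$work/demo.tar.gz"

# Listing members: the whole compressed stream passes through gzip first.
members=$(gzip -dc "$work/demo.tar.gz" | tar -tf -)
echo "$members"

# Extracting just b.txt still means decompressing everything before it.
mkdir "$work/out"
gzip -dc "$work/demo.tar.gz" | tar -xf - -C "$work/out" b.txt
extracted=$(cat "$work/out/b.txt")
echo "$extracted"

# Clean up the scratch directory.
rm -rf "$work"
```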

tuxArg commented 1 month ago

> Could you supply one of the non-working archive files, or generate one? Of course, please do not publish private data ;)

Sure, see attached. test.zip

> 500 = internal server error Is there something about this in the Nextcloud or server log-files?

{"reqId":"jpVDbKu6RQo4FHpVDbjxHk","level":3,"time":"2024-05-23T11:04:34+00:00","remoteAddr":"10.8.0.103","user":"testuser","app":"index","method":"POST","url":"/index.phps_archive/archive/mount/%252FDocuments%252FtestDir%252Fprueba%2520tar%2520xz.tar.xz","message":"No se puede abrir el archivo de archivo /testuser/files/Documents/testDir/prueba tar xz.tar.xz (/var/www/nextcloud/data/testuser/files/Documents/testDir/prueba tar xz.tar.xz)","userAgent":"Mozilla/5.0 (Android 14; Mobile; rv:125.0) Gecko/125.0 Firefox/125.0","version":"28.0.5.1","exception":{"Exception":"OCA\FilesArchive\Toolkit\Exceptions\ArchiveCannotOpenException","Message":"No se puede abrir el archivo de archivo /testuser/files/Documents/testDir/prueba tar xz.tar.xz (/var/www/nextcloud/data/testuser/files/Documents/testDir/prueba tar xz.tar.xz)","Code":0,"Trace":[{"file":"/mnt/nc_data/apps/files_archive/lib/Controller/MountController.php","line":176,"function":"open","class":"OCA\FilesArchive\Toolkit\Service\ArchiveService","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":230,"function":"mount","class":"OCA\FilesArchive\Controller\MountController","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/Http/Dispatcher.php","line":137,"function":"executeController","class":"OC\AppFramework\Http\Dispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/AppFramework/App.php","line":184,"function":"dispatch","class":"OC\AppFramework\Http\Dispatcher","type":"->"},{"file":"/var/www/nextcloud/lib/private/Route/Router.php","line":315,"function":"main","class":"OC\AppFramework\App","type":"::"},{"file":"/var/www/nextcloud/lib/base.php","line":1069,"function":"match","class":"OC\Route\Router","type":"->"},{"file":"/var/www/nextcloud/index.php","line":39,"function":"handleRequest","class":"OC","type":"::"}],"File":"/mnt/nc_data/apps/files_archive/lib/Toolkit/Service/ArchiveService.php","Line":272,"message":"No se puede abrir el archivo de archivo /testuser/files/Documents/testDir/prueba tar xz.tar.xz (/var/www/nextcloud/data/testuser/files/Documents/testDir/prueba tar xz.tar.xz)","exception":{},"CustomMessage":"No se puede abrir el archivo de archivo /testuser/files/Documents/testDir/prueba tar xz.tar.xz (/var/www/nextcloud/data/testuser/files/Documents/testDir/prueba tar xz.tar.xz)"}}

(The error messages are in Spanish; "No se puede abrir el archivo de archivo" means "Unable to open the archive file".)

rotdrop commented 1 month ago

On my side I do not have problems with the tar-archive that you find here:

https://cloud.claus-justus-heine.de/s/Jm9Z48H64i8dRpH

in the sub-folder 'tar-tests/'. Maybe you could try that one before we dig into your non-working tar archive. The error message above simply means that the file could not be handled by UnifiedArchive.

tuxArg commented 1 month ago

Hi, both files here https://cloud.claus-justus-heine.de/s/Jm9Z48H64i8dRpH?path=%2Ftar-tests don't work for me. Same error 500.

By the way, have you tried my sample zip file? What I forgot to tell you is that it lets me mount it, but its content isn't shown.

tuxArg commented 1 month ago

I'm looking at my phpinfo() seen by apache. I think I have all the relevant required libraries. I copy some of the lines:

Additional .ini files parsed | /etc/php/8.1/apache2/conf.d/10-mysqlnd.ini, /etc/php/8.1/apache2/conf.d/10-opcache.ini, /etc/php/8.1/apache2/conf.d/10-pdo.ini, /etc/php/8.1/apache2/conf.d/15-xml.ini, /etc/php/8.1/apache2/conf.d/20-apcu.ini, /etc/php/8.1/apache2/conf.d/20-bcmath.ini, /etc/php/8.1/apache2/conf.d/20-bz2.ini, /etc/php/8.1/apache2/conf.d/20-calendar.ini, /etc/php/8.1/apache2/conf.d/20-ctype.ini, /etc/php/8.1/apache2/conf.d/20-curl.ini, /etc/php/8.1/apache2/conf.d/20-dom.ini, /etc/php/8.1/apache2/conf.d/20-exif.ini, /etc/php/8.1/apache2/conf.d/20-ffi.ini, /etc/php/8.1/apache2/conf.d/20-fileinfo.ini, /etc/php/8.1/apache2/conf.d/20-ftp.ini, /etc/php/8.1/apache2/conf.d/20-gd.ini, /etc/php/8.1/apache2/conf.d/20-gettext.ini, /etc/php/8.1/apache2/conf.d/20-gmp.ini, /etc/php/8.1/apache2/conf.d/20-iconv.ini, /etc/php/8.1/apache2/conf.d/20-igbinary.ini, /etc/php/8.1/apache2/conf.d/20-imagick.ini, /etc/php/8.1/apache2/conf.d/20-imap.ini, /etc/php/8.1/apache2/conf.d/20-intl.ini, /etc/php/8.1/apache2/conf.d/20-ldap.ini, /etc/php/8.1/apache2/conf.d/20-mbstring.ini, /etc/php/8.1/apache2/conf.d/20-msgpack.ini, /etc/php/8.1/apache2/conf.d/20-mysqli.ini, /etc/php/8.1/apache2/conf.d/20-pdo_mysql.ini, /etc/php/8.1/apache2/conf.d/20-pdo_sqlite.ini, /etc/php/8.1/apache2/conf.d/20-phar.ini, /etc/php/8.1/apache2/conf.d/20-posix.ini, /etc/php/8.1/apache2/conf.d/20-readline.ini, /etc/php/8.1/apache2/conf.d/20-redis.ini, /etc/php/8.1/apache2/conf.d/20-shmop.ini, /etc/php/8.1/apache2/conf.d/20-simplexml.ini, /etc/php/8.1/apache2/conf.d/20-smbclient.ini, /etc/php/8.1/apache2/conf.d/20-sockets.ini, /etc/php/8.1/apache2/conf.d/20-sqlite3.ini, /etc/php/8.1/apache2/conf.d/20-sysvmsg.ini, /etc/php/8.1/apache2/conf.d/20-sysvsem.ini, /etc/php/8.1/apache2/conf.d/20-sysvshm.ini, /etc/php/8.1/apache2/conf.d/20-tokenizer.ini, /etc/php/8.1/apache2/conf.d/20-xmlreader.ini, /etc/php/8.1/apache2/conf.d/20-xmlwriter.ini, /etc/php/8.1/apache2/conf.d/20-xsl.ini, 
/etc/php/8.1/apache2/conf.d/20-zip.ini, /etc/php/8.1/apache2/conf.d/25-memcached.ini, /etc/php/8.1/apache2/conf.d/99-nexcloud.ini

Registered PHP Streams | https, ftps, compress.zlib, php, file, glob, data, http, ftp, compress.bzip2, phar, smb, zip
Registered Stream Filters | zlib.*, string.rot13, string.toupper, string.tolower, convert.*, consumed, dechunk, bzip2.*, convert.iconv.*

bz2
BZip2 Support | Enabled
Stream Wrapper support | compress.bzip2://
Stream Filter support | bzip2.decompress, bzip2.compress

phar
Phar API version | 1.1.1
Phar-based phar archives | enabled
Tar-based phar archives | enabled
ZIP-based phar archives | enabled
gzip compression | enabled
bzip2 compression | enabled
Native OpenSSL support | enabled

zip
Zip | enabled
Zip version | 1.19.5
Libzip version | 1.7.3
BZIP2 compression | Yes
XZ compression | No
ZSTD compression | No
AES-128 encryption | Yes
AES-192 encryption | Yes
AES-256 encryption | Yes

I've also tried .tar (uncompressed) and .tar.bz2; they don't work either. So maybe it's tar handling in general that is not working?

rotdrop commented 1 month ago

> Hi, both files here https://cloud.claus-justus-heine.de/s/Jm9Z48H64i8dRpH?path=%2Ftar-tests don't work for me. Same error 500.
>
> btw, have you tried my sample zip file? What I forgot to tell you is that it let me mount it but it's content isn't shown.

I have just tried your test.zip and am able to mount and extract it, but the result is an empty directory. Was this also what you experienced? Listing test.zip with 7z, however, shows one file, t.webp, and on the command line I am also able to extract the archive. Ah, no: the full name of the file is "t.webp (WEBP Image, 960 × 1248 pixels) — Scaled (46%).txt" (ASCII text, with no line terminators).

tuxArg commented 1 month ago

> I have just tried your test.zip and am able to mount and extract it, but the result is an empty directory. Was this also what you experienced?

Yes. There's a char that doesn't work. I haven't tested all combinations of: ()×,—%

rotdrop commented 1 month ago

>> I have just tried your test.zip and am able to mount and extract it, but the result is an empty directory. Was this also what you experienced?
>
> Yes. There's a char that doesn't work. I haven't tested all combinations of: ()×,—%

It seems there is another char -- I manipulated the following line in UnifiedArchive:

https://github.com/wapmorgan/UnifiedArchive/blob/5f02ad060223fd714aaf7f64a18d8e819ac0ab93/src/UnifiedArchive.php#L527

to include a print_r($this->files, true) in the exception. I get this:

File t.webp (WEBP Image, 960 × 1248 pixels) — Scaled (46%).txt does not exist in archive Array\n(\n    [0] => t.webp (WEBP Image, 960?נ1248 pixels) ? Scaled (46%).txt\n)\n

Note the fancy character נ, which according to Wikipedia https://en.wikipedia.org/wiki/Nun_(letter)#Hebrew_nun is a Hebrew letter.

However, that fancy Unicode "times" sign (i.e. your ×) has been replaced by a question mark. So the problem seems to be related to multi-byte characters. It is strange, however, that × (U+00D7) is handled correctly in other places ... so the question is: at which point is it garbled, and by which operation?

rotdrop commented 1 month ago

$this->files is set in only one place ... in my case it always gets the files right when generating the mount point. That is: I can mount the archive and see the correct file name when I open the mount directory in the web interface. However, whenever I try to access the file internally, the file name is reliably garbled and hence I cannot access the file ... strange. Bug in the underlying backend, perhaps? Buffer overrun?

tuxArg commented 1 month ago

I think I missed that char. And I don't really know how to debug that. I'm glad to report it.

Anyway, I'm still having trouble mounting tar (.tar, .tar.xz, .tar.bz2) files. If you have any ideas..

rotdrop commented 1 month ago

> I think I missed that char. And I don't really know how to debug that. I'm glad to report it.
>
> Anyway, I'm still having trouble mounting tar (.tar, .tar.xz, .tar.bz2) files. If you have any ideas..

No, it is not the Hebrew char, it is the sequence of Unicode non-breaking spaces and the Unicode multiplication sign ...
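One way to see why a Hebrew letter turns up, offered as a plausible reading rather than a confirmed diagnosis: the multiplication sign U+00D7 followed by a no-break space U+00A0 is the byte pair D7 A0 in the single-byte Latin-1 encoding, and that same byte pair read as UTF-8 is U+05E0, the Hebrew letter nun. The sketch uses only printf and od, plus iconv if it is installed; the helper name hex is made up.

```shell
#!/bin/sh
# Print stdin as a compact lowercase hex string.
hex() {
    od -An -tx1 | tr -d ' \n'
}

# Multiplication sign followed by a no-break space, as UTF-8
# (written with octal escapes for portability).
times_nbsp_utf8=$(printf '\303\227\302\240' | hex)

# Hebrew nun (U+05E0) as UTF-8.
nun_utf8=$(printf '\327\240' | hex)

echo "times + NBSP in UTF-8: $times_nbsp_utf8"
echo "nun in UTF-8:          $nun_utf8"

# If iconv is available, squeeze times + NBSP down to Latin-1 and compare:
# the result is byte-for-byte the UTF-8 encoding of nun.
if command -v iconv >/dev/null 2>&1; then
    times_nbsp_latin1=$(printf '\303\227\302\240' | iconv -f UTF-8 -t ISO-8859-1 | hex)
    echo "times + NBSP in Latin-1: $times_nbsp_latin1"
fi
```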

No idea about the tar files, on my side everything just works fine. Maybe we could trigger @dimangelid from #38 to report whether or not he is able to mount tar archives (just because I know that he has the thing running, basically).

rotdrop commented 1 month ago

> I think I missed that char. And I don't really know how to debug that. I'm glad to report it. [....] No, it is not the Hebrew char, it is the unicode sequence of unicode unbreakable spaces and the unicode multiplication sign ...

So it is an issue with the archive7z backend rather than with files_archive or unified-archive ...

rotdrop commented 1 month ago

... but now I wonder whether some PHP ini setting is in effect when accessing the files ... extracting them on the command line just works, but accessing any archive member with Unicode characters in its file name does not ... I have also generated a more minimal example with short file names consisting of Unicode characters:

tux-arg-test-recreated.zip

rotdrop commented 1 month ago

I tracked the problem down to the invocation of 7z in vendor/gemorroj/archive7z. I can reproduce the bug when I manually set the LANG environment variable to a setting that is not UTF-8 aware, like

LANG=C 7z l -y -p -slt ~/Downloads/tux-arg-test/tux-arg-test-recreated.zip

However, with UTF-8 things work flawlessly:

LANG=C.UTF-8  7z l -y -p -slt ~/Downloads/tux-arg-test/tux-arg-test-recreated.zip

(I do not post the output here, you may want to try it by yourself).

So then the question remains: who the heck fiddles with the environment, and differently when mounting the archive and when accessing the archive contents. What a sh*t ;)

rotdrop commented 1 month ago

Good thing is: close to a solution!

rotdrop commented 1 month ago

Unfortunately it is not that simple: the $_ENV variable is empty -- however, if I manually set $_ENV['LANG'] = 'C.UTF-8', then accessing the archive members works. However, as listing the archive members during the initial mount does work, this cannot be the real source of the problem. I wonder whether perhaps the 7z program itself is flaky w.r.t. UTF-8.
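The effect of a bare environment on a child process can be checked without PHP at all. In this sketch, env -i starts a shell with nothing inherited; treating that as a model of what the web server hands to 7z is an assumption, not something verified here. Note also that an empty $_ENV in PHP does not by itself prove that the process environment is empty, since whether $_ENV is populated depends on the variables_order ini setting.

```shell
#!/bin/sh
# Command run in the child: print the LANG it sees, or "unset" if it has none.
show_lang='echo "${LANG:-unset}"'

# Child started with an empty environment.
bare=$(env -i /bin/sh -c "$show_lang")

# Same child, but with the locale passed in explicitly.
forced=$(env -i LANG=C.UTF-8 /bin/sh -c "$show_lang")

echo "bare environment:   LANG=$bare"
echo "forced environment: LANG=$forced"
```

With LANG (and the LC_* variables) unset, programs fall back to the C locale, which matches the failing LANG=C case above.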

rotdrop commented 1 month ago

Oh oh oh. I think I now do understand:

When looking at the source code of https://github.com/Gemorroj/Archive7z there are some settings which are only in effect on Windows platforms, with the note that they do not help. This is correct. I suppose the reason is that, with glibc, the default locale used for command output is plain ASCII.

So how to proceed:

rotdrop commented 1 month ago

BTW, there was a similar issue with my app https://github.com/rotdrop/nextcloud-app-pdf-downloader

https://github.com/rotdrop/nextcloud-app-pdf-downloader/blob/0433e257cc74aadb0dfbee02d41c1b45e6af02b8/lib/Backend/PdfTk.php#L40-L46

Maybe some things just have to happen again ...

rotdrop commented 1 month ago

I believe that 2f0cc75 closes the locale problem.

rotdrop commented 1 month ago

For the tar problem: there is a command line utility

vendor/bin/cam

So if you have shell access (e.g. via ssh) to your Nextcloud installation, you could test whether cam is able to give information about the archives and extract them. The new pre-release also contains the Symfony console package, so there is a chance that cam just works. Please see https://github.com/wapmorgan/UnifiedArchive/blob/master/README.md for more information.
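As a starting point, and assuming that cam is a regular Symfony console application (which the above suggests, but which this sketch does not verify), the built-in list command should print the subcommands that cam offers, so nothing has to be guessed. The app directory used as default below is the one that appears in the log earlier in this thread and is an assumption for any other installation; run_cam is a made-up helper name.

```shell
#!/bin/sh
# Run cam from the app's vendor directory, passing all further arguments on.
run_cam() {
    app_dir=$1
    shift
    cam="$app_dir/vendor/bin/cam"
    if [ ! -f "$cam" ]; then
        echo "cam not found at $cam"
        return 1
    fi
    php "$cam" "$@"
}

# "list" is the standard Symfony console command that prints all subcommands.
# APP_DIR defaults to an assumed install location; override it as needed.
run_cam "${APP_DIR:-/mnt/nc_data/apps/files_archive}" list || true
```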

In principle I would like to close this topic, as part of the problems have been resolved and the original issue title was a non-issue. Could you please open a new issue for your tar-archive problems? Kind thanks!