Closed cwendling closed 1 year ago
Oh I get it now, thanks for the clarification, appreciate the thorough explanation @cwendling, you've been a great help.
Turns out we have a problem with tar.xz files at this point, the directory within is not being recognized starting with the PR this one was supposed to fix (two different test files but both known good tar.xz files made by engrampa: Before 8e33e604d87e90bda3cc748bc1627c164d72a16c
After 8e33e604d87e90bda3cc748bc1627c164d72a16c:
Confirm that the problem exists
MAGIC
returned incorrect mime type while processing tar
compression type
should be mime_type = application/x-compressed-tar
actually mime_type = application/x-gzip
Actually, this is also the correct handling method, but it only requires two processing steps
[test@localhost test]$ file test.tar.xz
test.tar.xz: XZ compressed data
[test@localhost test]$ xz -d test.tar.xz
[test@localhost test]$ ls
test.tar
[test@localhost test]$ file test.tar
test.tar: POSIX tar archive (GNU)
[test@localhost test]$ tar -xvf test.tar
test/
test/main.c
test/b.c
test/main
test/libb.so
test/a.c
test/liba.so
[test@localhost test]$
@cwendling How do you view this issue? Do we revert back to the past?
Hum, I don't know… I assumed the intent of #490 was good, but maybe it isn't if libmagic doesn't tell us about those subtypes. Does the g_content_type
thing work better? If so, that could be the first (or only?) magic-aware handler (is it not just as capable as libmagic?).
@sinha-toyeesh what was the actual intent behind #490? Did you have something you needed fixing, or was this for the same of making it seemingly more consistent?
Anyhow, this would require more testing but I'd personally try g_content_type
first. AFAICT it does both filename and magic matching, which is kind of the best of both world. And it does the right thing for e.g. .tar.xz
:
$ gio info -a standard::content-type,standard::fast-content-type foobar.tar.xz
[…]
attributes :
standard::content-type: application/x-xz-compressed-tar
standard::fast-content-type: application/x-xz-compressed-tar
This said, I really don't know much about Engrampa's internals, nor history behind the filename/gcontenttype/libmagic trio, so I'm not necessarily the best one to give an opinion.
EDIT:
FWIW, here's the same file under another name:
$ file -i a.tar.xz
a.tar.xz: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.tar.xz
[…]
attributes:
standard::content-type: application/x-xz-compressed-tar
standard::fast-content-type: application/x-xz-compressed-tar
$ mv a.tar.xz a.b
$ file -i a.b
a.b: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.b
[…]
attributes:
standard::content-type: application/x-xz
standard::fast-content-type: application/octet-stream
$ mv a.b a.txz
$ file -i a.txz
a.txz: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.txz
[…]
attributes:
standard::content-type: application/x-xz-compressed-tar
standard::fast-content-type: application/x-xz-compressed-tar
$ mv a.txz a.c
$ file -i a.c
a.c: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.c
[…]
attributes:
standard::content-type: text/x-csrc
standard::fast-content-type: text/x-csrc
$ mv a.c a.pdf
$ file -i a.pdf
a.pdf: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.pdf
[…]
attributes:
standard::content-type: application/pdf
standard::fast-content-type: application/pdf
$ mv a.pdf a
$ file -i a
a: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a
[…]
attributes:
standard::content-type: application/x-xz
standard::fast-content-type: application/octet-stream
So what we can see in this example is that
I would say it say it was a bit of both as mentioned in the original quoted #206, the solution to which, I assumed was an issue of inconsistent use of libmagic, however that was an incomplete picture, as you have elaborated in your example.
A blooper has been made there:
This probably doesn't change much in the wild as magic is gonna work most of the time, but it's especially problematic that the non-libmagic case doesn't have the filename test.
Anyway, fix this so the code is consistent, and we retain the behavior for the non-libmagic case, and have the new expected one for the libmagic case.
CC @sinha-toyeesh