mate-desktop / engrampa

A file archiver for MATE
http://www.mate-desktop.org
GNU General Public License v2.0
113 stars 52 forks source link

Fix MIME detection logic from #490 #491

Closed cwendling closed 1 year ago

cwendling commented 1 year ago

A blooper has been made there:

This probably doesn't change much in the wild as magic is gonna work most of the time, but it's especially problematic that the non-libmagic case doesn't have the filename test.

Anyway, fix this so the code is consistent, and we retain the behavior for the non-libmagic case, and have the new expected one for the libmagic case.

CC @sinha-toyeesh

sinha-toyeesh commented 1 year ago

Oh I get it now, thanks for the clarification, appreciate the thorough explanation @cwendling, you've been a great help.

lukefromdc commented 1 year ago

Turns out we have a problem with tar.xz files at this point, the directory within is not being recognized starting with the PR this one was supposed to fix (two different test files but both known good tar.xz files made by engrampa: Before 8e33e604d87e90bda3cc748bc1627c164d72a16c Engrampa_before_PRs

After 8e33e604d87e90bda3cc748bc1627c164d72a16c:

Engrampa_after_PRs

zhuyaliang commented 1 year ago

Confirm that the problem exists

zhuyaliang commented 1 year ago

MAGIC returned incorrect mime type while processing tar compression type should be mime_type = application/x-compressed-tar actually mime_type = application/x-gzip

zhuyaliang commented 1 year ago

Actually, this is also the correct handling method, but it only requires two processing steps

 [test@localhost test]$ file test.tar.xz 
test.tar.xz: XZ compressed data
[test@localhost test]$ xz -d test.tar.xz 
[test@localhost test]$ ls
test.tar
[test@localhost test]$ file test.tar 
test.tar: POSIX tar archive (GNU)
[test@localhost test]$ tar -xvf test.tar 
test/
test/main.c
test/b.c
test/main
test/libb.so
test/a.c
test/liba.so
[test@localhost test]$ 

@cwendling How do you view this issue? Do we revert back to the past?

cwendling commented 1 year ago

Hum, I don't know… I assumed the intent of #490 was good, but maybe it isn't if libmagic doesn't tell us about those subtypes. Does the g_content_type thing work better? If so, that could be the first (or only?) magic-aware handler (is it not just as capable as libmagic?).

cwendling commented 1 year ago

@sinha-toyeesh what was the actual intent behind #490? Did you have something you needed fixing, or was this for the same of making it seemingly more consistent?

Anyhow, this would require more testing but I'd personally try g_content_type first. AFAICT it does both filename and magic matching, which is kind of the best of both world. And it does the right thing for e.g. .tar.xz:

$ gio info -a standard::content-type,standard::fast-content-type foobar.tar.xz
[…]
attributes :
  standard::content-type: application/x-xz-compressed-tar
  standard::fast-content-type: application/x-xz-compressed-tar

This said, I really don't know much about Engrampa's internals, nor history behind the filename/gcontenttype/libmagic trio, so I'm not necessarily the best one to give an opinion.


EDIT:

FWIW, here's the same file under another name:

$ file -i a.tar.xz
a.tar.xz: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.tar.xz
[…]
attributes:
  standard::content-type: application/x-xz-compressed-tar
  standard::fast-content-type: application/x-xz-compressed-tar
$ mv a.tar.xz a.b
$ file -i a.b
a.b: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.b
[…]
attributes:
  standard::content-type: application/x-xz
  standard::fast-content-type: application/octet-stream
$ mv a.b a.txz
$ file -i a.txz 
a.txz: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.txz 
[…]
attributes:
  standard::content-type: application/x-xz-compressed-tar
  standard::fast-content-type: application/x-xz-compressed-tar
$ mv a.txz a.c
$ file -i a.c
a.c: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.c
[…]
attributes:
  standard::content-type: text/x-csrc
  standard::fast-content-type: text/x-csrc
$ mv a.c a.pdf
$ file -i a.pdf
a.pdf: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a.pdf
[…]
attributes:
  standard::content-type: application/pdf
  standard::fast-content-type: application/pdf
$ mv a.pdf a
$ file -i a
a: application/x-xz; charset=binary
$ gio info -a standard::content-type,standard::fast-content-type a
[…]
attributes:
  standard::content-type: application/x-xz
  standard::fast-content-type: application/octet-stream

So what we can see in this example is that

sinha-toyeesh commented 1 year ago

I would say it say it was a bit of both as mentioned in the original quoted #206, the solution to which, I assumed was an issue of inconsistent use of libmagic, however that was an incomplete picture, as you have elaborated in your example.