mate-desktop / engrampa

A file archiver for MATE
http://www.mate-desktop.org
GNU General Public License v2.0
113 stars 52 forks source link

Fixed libmagic bug #490

Closed sinha-toyeesh closed 1 year ago

sinha-toyeesh commented 1 year ago

Referring to issue https://github.com/mate-desktop/engrampa/issues/206 , Engrampa uses Libmagic only if extension is not known, otherwise it determines mime type from the file extension, even though the --enable-magic flag is used, along with libmagic support. This pull request adds an if condition to use libmagic if --enable-magic if used.

sinha-toyeesh commented 1 year ago

Could this be a local issue ? it seems to work on my machine. 2023-06-13_04-35

Verification would be appreciated

lukefromdc commented 1 year ago

I am using a local build, and have no way to further test this under other conditions. I cannot verify anything on a stock distro as I do not have a landline and do not have the bandwidth to download installers.

As it stands if this is merged I will have to disable libmagic support or revert this locally, as I cannot use Engrampa otherwise with this applied. If nobody else can duplicate this, I won't merge it but won't block merging either, and will patch it out of my local builds.

zhuyaliang commented 1 year ago

@lukefromdc Could you show me the error message

lukefromdc commented 1 year ago

This is what I got on clicking on any of those Changelog.Debian.gz files, or trying to open them from terminal in engrampa:

Engrampa_Error

zhuyaliang commented 1 year ago

@lukefromdc Can you help me run commands from the terminal command line and view the output

mkdir /tmp/test-engrampa 
cp -rf Changelog.Debian.gz /tmp/test-engrampa/
cd /tmp/test-engrampa
engrampa -h  Changelog.Debian.gz
cwendling commented 1 year ago

Actually I can reproduce @lukefromdc's problem. And looking into it is that libmagic returns application/gzip, whereas the builtin lookup returns application/x-gzip.

IMO, there's 2 things to do here:

  1. do as @zhuyaliang suggests in the patch comments, his changes work (and make sense: use the fallback but the other way around)
  2. as a real solution, stop using strcmp/sncasecmp to match MIME types, but either do it manually and accept CATEGORY/["x-"]NAME, or probably smarter, use g_content_type_equals().

I am not sure of all the consequences, but these changes fixes it for me:

diff --git a/src/file-utils.c b/src/file-utils.c
index 4983a7b..6d4f328 100644
--- a/src/file-utils.c
+++ b/src/file-utils.c
@@ -555,7 +555,7 @@ gboolean
 is_mime_type (const char *mime_type,
              const char *pattern)
 {
-       return (strcasecmp (mime_type, pattern) == 0);
+       return g_content_type_equals (mime_type, pattern);
 }

 const char*
diff --git a/src/fr-init.c b/src/fr-init.c
index c018bfa..c7a876d 100644
--- a/src/fr-init.c
+++ b/src/fr-init.c
@@ -292,7 +292,7 @@ fr_registered_command_get_capabilities (FrRegisteredCommand *reg_com,
                FrMimeTypeCap *cap;

                cap = g_ptr_array_index (reg_com->caps, i);
-               if (strcmp (mime_type, cap->mime_type) == 0)
+               if (is_mime_type (mime_type, cap->mime_type))
                        return cap->current_capabilities;
        }

Additionally, I'd think we'd need this (but it's not required to open the file)

diff --git a/src/fr-init.c b/src/fr-init.c
index c018bfa..c7a876d 100644
--- a/src/fr-init.c
+++ b/src/fr-init.c
@@ -312,7 +312,7 @@ fr_registered_command_get_potential_capabilities (FrRegisteredCommand *reg_com,
                FrMimeTypeCap *cap;

                cap = g_ptr_array_index (reg_com->caps, i);
-               if ((cap->mime_type != NULL) && (strcmp (mime_type, cap->mime_type) == 0))
+               if ((cap->mime_type != NULL) && (is_mime_type (mime_type, cap->mime_type)))
                        return cap->potential_capabilities;
        }

@@ -508,7 +508,7 @@ get_mime_type_index (const char *mime_type)
        int i;

        for (i = 0; mime_type_desc[i].mime_type != NULL; i++)
-               if (strcmp (mime_type_desc[i].mime_type, mime_type) == 0)
+               if (is_mime_type (mime_type_desc[i].mime_type, mime_type))
                        return i;
        return -1;
 }
sinha-toyeesh commented 1 year ago

Thanks @cwendling will take a look at this and update the PR

cwendling commented 1 year ago

@lukefromdc thanks for testing, but I think merging was a tad too fast here :) No need to revert though, but I think we might need to look at this a tad more carefully.

@sinha-toyeesh did you check all the implications of my suggested changes? As I mentioned, I'm not sure if this has side effects, like how it handles MIME aliases. I'm not sure it poses any problem, but as there is special handling for them, it might be worth checking a bit what's supposed to happen, and whether my suggestions interferes or not.

Also, we should really integrate @zhuyaliang's suggested change, because right now when using libmagic the logic is like if (!A && !B && !A) fail(), which doesn't make sense, A has no chance of working the second time. OTOH, the non-libmagic path does if (!A && !B && !C) fail(), which makes sense, and @zhuyaliang suggested just swapping A and C in the libmagic path, making it if (!C && !B && !A) fail().

lukefromdc commented 1 year ago

If this was a screwup, let's create a second PR with the cleanups in it and get this done right. How the binaries put out by distros work for end users is more important than how neat the changelogs look.

We could really use more testers here, as all too often PR's don't get reviewed or there are not enough testers to find the corner cases. Years ago, GNOME's development versions were distributed with the alpha versions of Ubuntu, I don't know if rolling releases like Debian Sid also distributed them. Ideally that would be resumed as it would make bugs getting all the way to releases in final versions of distros much less likely.

sinha-toyeesh commented 1 year ago

@cwendling I made the recommended changes as suggested by @zhuyaliang, as I too realised the fallacy of checking for the same thing twice in a single function, which seemed a tad bit wierd to me. Regarding testing, as I mentioned beforehand, I couldn't really recreate the bug mentioned by @lukefromdc, so it was out of the question on my end.

sinha-toyeesh commented 1 year ago

I wholeheartedly agree with @lukefromdc regarding both of his points, especially so on the topic of additional testers ;)

cwendling commented 1 year ago

@cwendling I made the recommended changes as suggested by @zhuyaliang, as I too realised the fallacy of checking for the same thing twice in a single function, which seemed a tad bit wierd to me.

Unfortunately no, you didn't. Look at the changes again, you actually made a blunder and only changed the non-libmagic code path making it never try filename, rather than making the libmagic one prefer magics. See #491 for what I believe is the intended behavior.

Regarding testing, as I mentioned beforehand, I couldn't really recreate the bug mentioned by @lukefromdc, so it was out of the question on my end.

The libmagic case maybe (although it probably ultimately depends on the version of the magic database, for which @lukefromdc and I probably have a more recent version); but the question I raised is the other implications of the g_content_type_equals() usage and is_mime_type() generalization. To evaluate this, it "only" requires carefully looking at the code and seeing what a non-match leads to, and from there try and test those case, and see if they are relevant -- or made obsolete by the fact they we already matched the MIME type.

Anyhow, the changes I suggested seemed fine to me, but then again I didn't spend enough time evaluating the consequences. If we have enough testing, maybe it's enough? we'll see :)

lukefromdc commented 1 year ago

The followup PR didn't create any problems on my end with a variety of files including the previous offenders