siranwu / theunarchiver

Automatically exported from code.google.com/p/theunarchiver

The Archive Browser does not honor original filename field in a GZIP header #802

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
The Archive Browser does not honor original filename field in a GZIP header.

RFC 1952 (https://www.ietf.org/rfc/rfc1952.txt) includes a field for the 
original filename:

    (if FLG.FNAME set)

        +=========================================+
        |...original file name, zero-terminated...| (more-->)
        +=========================================+
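To make the header layout concrete, here is a minimal C++ sketch (not The Unarchiver's code) that reads the FNAME field from the start of a gzip member. It follows the field order in RFC 1952: the fixed 10-byte header, then the optional FEXTRA field, then FNAME. The function name `ReadOriginalName` is hypothetical, and the sketch assumes the whole header is already in memory.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// FLG bits from RFC 1952, section 2.3.1 (only the ones needed here).
constexpr std::uint8_t FEXTRA = 0x04;
constexpr std::uint8_t FNAME = 0x08;

// Hypothetical helper: returns the FNAME field of a gzip member header, or
// std::nullopt if the header is malformed, truncated, or FLG.FNAME is not set.
std::optional<std::string> ReadOriginalName(const std::vector<std::uint8_t>& data)
{
    // Fixed part: ID1 ID2 CM FLG MTIME(4) XFL OS = 10 bytes; CM 8 is deflate.
    if (data.size() < 10 || data[0] != 0x1f || data[1] != 0x8b || data[2] != 8)
        return std::nullopt;
    const std::uint8_t flg = data[3];
    std::size_t pos = 10;

    // Skip the optional extra field: XLEN (2 bytes, little-endian), then XLEN bytes.
    if (flg & FEXTRA) {
        if (pos + 2 > data.size())
            return std::nullopt;
        const std::size_t xlen = data[pos] | (static_cast<std::size_t>(data[pos + 1]) << 8);
        pos += 2 + xlen;
        if (pos > data.size())
            return std::nullopt;
    }

    if (!(flg & FNAME))
        return std::nullopt;

    // The name runs up to, but not including, the terminating zero byte.
    std::string name;
    for (; pos < data.size(); ++pos) {
        if (data[pos] == 0)
            return name;
        name.push_back(static_cast<char>(data[pos]));
    }
    return std::nullopt; // no terminator: truncated header
}
```

For example, a header whose FLG byte has 0x08 set and whose name bytes are `test.txt` followed by a zero byte yields `"test.txt"`.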

My testing shows the original filename is usually populated by the software 
creating the archive. (I have not come across a counterexample, but I 
understand there is probably one somewhere.)

Expected behavior:

When the implicit filename (the archive name minus the '.gz' extension) differs 
from the original filename (embedded in the header), I would like the option to 
extract under the original filename.
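The comparison described above can be sketched in C++ as follows. This is an illustration, not The Archive Browser's logic: `ImplicitName` and `ShouldOfferOriginalName` are hypothetical names, the sketch assumes `archiveName` is a bare filename with no directory part, and it treats a name without a `.gz` suffix as having no implicit name.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: the implicit output name is the archive name with a
// trailing ".gz" removed. Returns an empty string when there is no ".gz" suffix.
std::string ImplicitName(const std::string& archiveName)
{
    const std::string suffix = ".gz";
    if (archiveName.size() > suffix.size() &&
        archiveName.compare(archiveName.size() - suffix.size(), suffix.size(), suffix) == 0)
        return archiveName.substr(0, archiveName.size() - suffix.size());
    return std::string();
}

// Offer the header's name only when one is stored and it differs from the
// implicit name.
bool ShouldOfferOriginalName(const std::string& archiveName, const std::string& originalName)
{
    return !originalName.empty() && originalName != ImplicitName(archiveName);
}
```

With the reproduction steps in this report, `test.gz` has the implicit name `test` while the header stores `test.txt`, so the choice would be offered.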

UI behavior:

The original filename (embedded in the header) is not displayed in the UI. The 
UI will display other header elements, like comments. See attached.

Steps to reproduce:

    $ echo ABCDEFG > test.txt
    $ gzip test.txt

    $ ls test.txt
    ls: test.txt: No such file or directory

    $ ls test.txt.gz
    test.txt.gz

    # At this point, examine test.txt.gz in a hex editor.
    # You will see the original filename is embedded in the header.

    $ mv test.txt.gz test.gz

    # Now, use the Archive Browser to decompress the archive.
    # The archive is decompressed to 'test', and not 'test.txt'.

-----

I'm working on OS X 10.8.5 x64 (fully patched) with The Archive Browser 1.9.1 
(fully patched).

-----

My apologies if this is the wrong forum for The Archive Browser issues. Archive 
Browser -> About lists this project as the point of contact.

Original issue reported on code.google.com by noloa...@gmail.com on 3 Jan 2015 at 11:03

Attachments:

GoogleCodeExporter commented 9 years ago
If you're interested, here are a couple of Stack Exchange questions and answers 
that used The Unarchiver and The Archive Browser for testing.

 * How to add filename to archive if compressing using Gzip class?, http://stackoverflow.com/q/27739140/608639

 * Is Gzip supposed to honor original filename during decompress?, http://superuser.com/q/859785/173513

The incorrect results muddied the waters while we investigated the issue in the 
Stack Exchange questions, and caused us to ask "... am I seeing a bug in three 
different programs?".

Original comment by noloa...@gmail.com on 3 Jan 2015 at 11:08

GoogleCodeExporter commented 9 years ago
This is intentional. Too many gzip files apparently have invalid or 
incorrect filenames stored, which breaks decompression of embedded tar 
archives. There used to be support for it, but it caused too many issues.

Original comment by paracel...@gmail.com on 4 Jan 2015 at 8:32

GoogleCodeExporter commented 9 years ago
> Too many gzip filenames apparently have invalid or incorrect filenames...

Thanks for that. I was not aware the problem was that widespread.

Relatedly, would the following be sufficient in my software? According to 
RFC 1952, the character set is ISO 8859-1 (Latin-1). The set of 191 valid 
characters came from 
http://en.wikipedia.org/wiki/ISO/IEC_8859-1#Codepage_layout.

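+void Gzip::SetFilename(const std::string& filename, bool throwOnEncodingError)
+{
+    if(throwOnEncodingError)
+    {
+        for(size_t i = 0; i < filename.length(); i++) {
+            // Read the byte as unsigned: with a signed char, bytes 160..255 are
+            // negative and the upper Latin-1 range would never match.
+            const unsigned char c = static_cast<unsigned char>(filename[i]);
+            // Accept only printable ISO 8859-1: 32..126 and 160..255.
+            if(!(c >= 32 && c <= 126) && !(c >= 160))
+                throw InvalidDataFormat("The filename is not ISO 8859-1 encoded");
+        }
+    }
+
+    m_filename = filename;
+}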

And:

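+const std::string& Gunzip::GetFilename(bool throwOnEncodingError) const
+{
+    if(throwOnEncodingError)
+    {
+        for(size_t i = 0; i < m_filename.length(); i++) {
+            // Read the byte as unsigned: with a signed char, bytes 160..255 are
+            // negative and the upper Latin-1 range would never match.
+            const unsigned char c = static_cast<unsigned char>(m_filename[i]);
+            // Accept only printable ISO 8859-1: 32..126 and 160..255.
+            if(!(c >= 32 && c <= 126) && !(c >= 160))
+                throw InvalidDataFormat("The filename is not ISO 8859-1 encoded");
+        }
+    }
+
+    return m_filename;
+}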
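Both functions apply the same byte-range rule, so it could live in one place. The sketch below is only an illustration; `IsLatin1PrintableByte`, `IsLatin1PrintableString` and `CountAcceptedBytes` are hypothetical names, not part of Crypto++ or The Unarchiver. Counting the accepted values gives (126 - 32 + 1) + (255 - 160 + 1) = 95 + 96 = 191, matching the figure above. The cast to `unsigned char` is the detail that matters: where plain `char` is signed (common on x86), bytes 160 to 255 compare as negative numbers, so a test like `c >= 160` on a `char` never succeeds.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: true if byte c lies in one of the two printable
// ISO 8859-1 ranges, 32..126 or 160..255.
inline bool IsLatin1PrintableByte(unsigned char c)
{
    return (c >= 32 && c <= 126) || c >= 160;
}

// True if every byte of s passes IsLatin1PrintableByte. Each char is cast to
// unsigned char first so that bytes 160..255 are not read as negative values.
inline bool IsLatin1PrintableString(const std::string& s)
{
    for (char ch : s)
        if (!IsLatin1PrintableByte(static_cast<unsigned char>(ch)))
            return false;
    return true;
}

// Counts how many of the 256 byte values the rule accepts.
inline std::size_t CountAcceptedBytes()
{
    std::size_t n = 0;
    for (unsigned v = 0; v <= 255; ++v)
        if (IsLatin1PrintableByte(static_cast<unsigned char>(v)))
            ++n;
    return n;
}
```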

If I get far enough to call Gunzip::GetFilename, then the archive is good (and 
decompressed) but the original filename could be bad.

The obvious strategy is to (1) attempt to use the original filename, and (2) 
fall back to something else on failure.
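The two-step strategy can be sketched in C++ as below. This is an illustration under stated assumptions, not what The Unarchiver or Crypto++ does: `IsUsableOriginalName` and `ChooseOutputName` are hypothetical names, the Latin-1 rule is the one from the functions above, and rejecting `/`, `\`, `.` and `..` is an extra assumption so that a bad stored name cannot point outside the output directory.

```cpp
#include <cassert>
#include <string>

// Hypothetical check: accept the stored name only if it is non-empty, uses
// only printable ISO 8859-1 bytes, and contains no path separators.
bool IsUsableOriginalName(const std::string& name)
{
    if (name.empty() || name == "." || name == "..")
        return false;
    for (char ch : name) {
        const unsigned char c = static_cast<unsigned char>(ch);
        const bool printable = (c >= 32 && c <= 126) || c >= 160;
        if (!printable || c == '/' || c == '\\')
            return false;
    }
    return true;
}

// Step (1): try the original filename. Step (2): fall back if it is unusable.
std::string ChooseOutputName(const std::string& originalName, const std::string& fallbackName)
{
    return IsUsableOriginalName(originalName) ? originalName : fallbackName;
}
```

The caller supplies the fallback, for example the archive name with `.gz` removed.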

Original comment by noloa...@gmail.com on 5 Jan 2015 at 7:03