sudipta1411 / jtar

Automatically exported from code.google.com/p/jtar

Invalid entry headers for some tar files (e.g. from Apache) #1

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Download these tar.gz files for use in unit tests:
   http://artfiles.org/apache.org//commons/cli/binaries/commons-cli-1.2-bin.tar.gz
   http://apache.mirror.iphh.net//geronimo/3.0-M1/geronimo-jetty8-javaee6-3.0-M1-bin.tar.gz
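2. Create and run a unit test containing:
        // imports: java.io.FileInputStream, java.util.zip.GZIPInputStream,
        //          org.xeustechnologies.jtar.TarEntry, org.xeustechnologies.jtar.TarInputStream
        // tarGzFile is one of the archives downloaded in step 1
        FileInputStream fis = new FileInputStream(tarGzFile);
        TarInputStream tis = new TarInputStream(new GZIPInputStream(fis));
        TarEntry entry;
        while ((entry = tis.getNextEntry()) != null)
        {
            if (entry.isDirectory()) continue; // list regular entries only
            System.out.println(entry.getName());
        }
        tis.close();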

What is the expected output?
A list of the archive contents on the console.

What do you see instead?
java.io.IOException: Invalid entry header of size [154]; expected [512]
    at org.xeustechnologies.jtar.TarInputStream.getNextEntry(TarInputStream.java:125)
    at my.archive.TarContentsTest.testWithXeus(TarContentsTest.java:104)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    (...)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)

What version of the product are you using? On what operating system?
jtar-1.0.2
oracle-jdk-6u22
Windows XP SP3 [5.1.2600]

Please provide any additional information below.

Original issue reported on code.google.com by joergbuc...@gmail.com on 18 Nov 2010 at 1:48

GoogleCodeExporter commented 9 years ago
Hi. First of all, thanks for sharing this project.
I mainly opened this issue to get some clarification, as I didn't find a
discussion list or forum.
Background: I am currently comparing 5 different Java tar APIs, and this one is
the only one that is strict about the header size.

Original comment by joergbuc...@gmail.com on 19 Nov 2010 at 10:06

GoogleCodeExporter commented 9 years ago
I went through the tar format, and it states that "Each file is preceded by a
512-byte header record." and "The header is padded with NUL bytes to make it
fill a 512 byte block." (see
http://en.wikipedia.org/wiki/Tar_%28file_format%29#Format_details). It seems
the tool that created the tar did not pad the header entry with NULs to make it
512 bytes.
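To make the 512-byte rule concrete, here is a minimal sketch (not JTar's code) that builds a header record: every field has a fixed width, unused bytes stay NUL, and the whole record is exactly 512 bytes. It fills only name, mode, size, typeflag and checksum at their UStar offsets; uid, gid and mtime are deliberately left as NUL, so real tar tools may reject the result. The class name `TarHeaderSketch` is hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper, not part of JTar: builds a minimal 512-byte tar header.
final class TarHeaderSketch {
    static final int BLOCK_SIZE = 512;

    // Copies an ASCII value into a fixed-width field; any unused bytes stay NUL.
    static void putField(byte[] block, int offset, int width, String value) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, block, offset, Math.min(bytes.length, width));
    }

    // Fills only name, mode, size, typeflag and checksum (UStar offsets);
    // uid, gid and mtime are left as NUL for brevity.
    static byte[] header(String name, long size) {
        byte[] block = new byte[BLOCK_SIZE]; // new arrays are zero-filled, i.e. NUL padded
        putField(block, 0, 100, name);                          // name: offset 0, 100 bytes
        putField(block, 100, 8, "0000644");                     // mode: offset 100, 8 bytes
        putField(block, 124, 12, String.format("%011o", size)); // size: octal, offset 124
        block[156] = '0';                                       // typeflag '0' = regular file

        // Checksum: unsigned sum of all 512 bytes with the checksum field as spaces.
        Arrays.fill(block, 148, 156, (byte) ' ');
        int sum = 0;
        for (byte b : block) {
            sum += b & 0xFF;
        }
        putField(block, 148, 8, String.format("%06o", sum) + "\0 ");
        return block;
    }
}
```

A header whose byte count is not 512 therefore cannot be a complete record; a reader has to either obtain the missing bytes or reject it.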

Here is the result for the Apache commons-cli archive:

commons-cli-1.2/LICENSE.txt
commons-cli-1.2/NOTICE.txt
commons-cli-1.2/RELEASE-NOTES.txt
commons-cli-1.2/commons-cli-1.2-javadoc.jar
commons-cli-1.2/commons-cli-1.2-sources.jar
commons-cli-1.2/commons-cli-1.2.jar
commons-cli-1.2/apidocs/allclasses-frame.html
commons-cli-1.2/apidocs/allclasses-noframe.html
commons-cli-1.2/apidocs/constant-values.html
commons-cli-1.2/apidocs/deprecated-list.html
commons-cli-1.2/apidocs/help-doc.html
commons-cli-1.2/apidocs/index-all.html
commons-cli-1.2/apidocs/index.html
commons-cli-1.2/apidocs/options
commons-cli-1.2/apidocs/org/apache/commons/cli/AlreadySelectedException.html
*commons-cli-1.2/apidocs/org/apache/commons/cli/BasicParser.html*
Exception in thread "main" java.io.IOException: Invalid entry header of size [153]; expected [512]
    at org.xeustechnologies.jtar.TarInputStream.getNextEntry(TarInputStream.java:125)
    at TestJtar.main(TestJtar.java:13)

I also created a TGZ file with 7-Zip and listed its entries with JTar; that
worked fine.

I will now update JTar to pad the header with NULs itself if it is shorter than
512 bytes; I assume the other APIs do the same. Still, the underlying issue is
the tool that created the tar file, which produced inconsistent header entries.

Original comment by xeus....@gmail.com on 19 Nov 2010 at 11:20

GoogleCodeExporter commented 9 years ago
I did not look into the sources of the other tar APIs, so I cannot tell you how
they deal with this. Just for the record, here is the list of the other APIs
with tar support I'm currently comparing:
- http://commons.apache.org/compress/examples.html
- http://commons.apache.org/vfs/filesystems.html#Zip, Jar and Tar
- http://www.trustice.com/java/tar/
- http://svn.apache.org/viewvc/ant/core/trunk/src/main/org/apache/tools/tar/

Original comment by joergbuc...@gmail.com on 19 Nov 2010 at 11:47

GoogleCodeExporter commented 9 years ago
I agree that the version of tar Apache uses to build their download packages
is clearly doing something different there.
The Wikipedia entry you referred to does mention that some old versions of tar
use other values for padding, and that the UStar format is said to be the most
widely used, which in turn means that there are other formats around that
handle things differently.
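The format differences mentioned above are visible in the header itself. As a hedged sketch (not necessarily how JTar or any of the compared libraries do it), the magic field at offset 257 and the version field at offset 263 distinguish POSIX UStar ("ustar\0" followed by "00"), GNU tar ("ustar " followed by " \0") and old V7-style headers, which leave these bytes empty. `TarFormatSketch` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of JTar: classifies a 512-byte header by its magic field.
final class TarFormatSketch {
    enum Format { POSIX_USTAR, GNU, V7_OR_UNKNOWN }

    static Format detect(byte[] header) {
        if (header.length < 512) {
            throw new IllegalArgumentException("a tar header is 512 bytes");
        }
        // magic: offset 257, 6 bytes; version: offset 263, 2 bytes
        String magic = new String(header, 257, 6, StandardCharsets.US_ASCII);
        String version = new String(header, 263, 2, StandardCharsets.US_ASCII);
        if (magic.equals("ustar\0") && version.equals("00")) {
            return Format.POSIX_USTAR;
        }
        if (magic.equals("ustar ") && version.equals(" \0")) {
            return Format.GNU;
        }
        return Format.V7_OR_UNKNOWN; // pre-POSIX headers leave these fields empty
    }
}
```

A lenient reader could use such a check to decide which optional fields (prefix, long-name extensions) to trust, rather than rejecting unfamiliar archives outright.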

Original comment by joergbuc...@gmail.com on 19 Nov 2010 at 11:57

GoogleCodeExporter commented 9 years ago
I'm afraid I don't have enough expertise with tar to advise you on how to deal
with the aforementioned variety of formats. All I can do is give you feedback
about my observations.

Original comment by joergbuc...@gmail.com on 19 Nov 2010 at 12:01

GoogleCodeExporter commented 9 years ago
I think I know what the problem is: most of the other APIs use an internal
buffer, whereas JTar relies on the buffered streams. I will try to fix this
issue soon.
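A plausible mechanism behind "[154]; expected [512]": `InputStream.read(byte[], int, int)` is allowed to return fewer bytes than requested, and a `GZIPInputStream` can do so at its internal buffer boundaries, so a single `read` call for a header may come back short even though the archive is intact. Assuming that is the cause (the actual 1.0.3 change is not shown in this thread), the usual remedy is to keep reading until the 512-byte record is full. `TarBlockReader` is a hypothetical name; `java.io.DataInputStream.readFully` implements the same loop but throws on any end of stream.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, not JTar's code: reads one complete 512-byte tar record.
final class TarBlockReader {
    static final int BLOCK_SIZE = 512;

    // Returns 512 for a full record, or 0 at a clean end of stream.
    // Throws EOFException if the stream ends in the middle of a record.
    static int readBlock(InputStream in, byte[] block) throws IOException {
        int total = 0;
        while (total < BLOCK_SIZE) {
            // read() may legally return fewer bytes than requested, so loop.
            int n = in.read(block, total, BLOCK_SIZE - total);
            if (n < 0) {
                break;
            }
            total += n;
        }
        if (total != 0 && total != BLOCK_SIZE) {
            throw new EOFException("truncated tar record: " + total + " of " + BLOCK_SIZE + " bytes");
        }
        return total;
    }
}
```

With this loop the header length no longer depends on how the underlying stream chunks its data, so no NUL padding of short reads is needed.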

Original comment by xeus....@gmail.com on 19 Nov 2010 at 2:14

GoogleCodeExporter commented 9 years ago
Thanks. I will update my tests once that change is available from Maven Central.

Original comment by joergbuc...@gmail.com on 19 Nov 2010 at 2:21

GoogleCodeExporter commented 9 years ago
Fixed in release 1.0.3

Original comment by xeus....@gmail.com on 1 Mar 2011 at 11:01

GoogleCodeExporter commented 9 years ago
Fixed in revision r27.

Original comment by xeus....@gmail.com on 1 Mar 2011 at 12:49

GoogleCodeExporter commented 9 years ago
First of all, thanks for the new release.

The original exception no longer occurs.

Instead, when I use the new release against the attached archive, I run into a
FileNotFoundException when the following entry is processed:
test/test_parser.c

Only the beginning of the file is written, up to this point (see the attachment
for the original file):
/* Licensed to the Apache Software Foundation (ASF) under one or more
 * c

java.io.FileNotFoundException: target\temp\xeus\medium\r agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS (The filename, directory name, or volume label syntax is incorrect)

Perhaps a character near the beginning of the file test_parser.c is
misinterpreted as end of file for some reason?
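The path in this exception is text from the Apache license header, which suggests the reader lost its position in the archive and parsed file content as a header name. In tar, an entry's data is followed by NUL padding up to the next 512-byte boundary, so a reader must skip exactly size + padding bytes before the next header. A minimal sketch of that arithmetic, assuming an octal ASCII size field at offset 124 (the GNU base-256 encoding for very large files is ignored); `TarAlignment` is a hypothetical name and this is not JTar's code:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helpers, not JTar's code: where the next header starts after an entry.
final class TarAlignment {
    static final int BLOCK_SIZE = 512;

    // Parses the 12-byte octal size field at offset 124, ignoring NULs and spaces.
    static long parseSize(byte[] header) {
        String field = new String(header, 124, 12, StandardCharsets.US_ASCII);
        String digits = field.replace('\0', ' ').trim();
        return digits.isEmpty() ? 0L : Long.parseLong(digits, 8);
    }

    // Number of NUL padding bytes that follow the entry's data.
    static long paddingAfter(long size) {
        long remainder = size % BLOCK_SIZE;
        return remainder == 0 ? 0 : BLOCK_SIZE - remainder;
    }

    // Total bytes from the end of this header to the start of the next one.
    static long bytesToNextHeader(long size) {
        return size + paddingAfter(size);
    }
}
```

For example, a 1000-byte file occupies two 512-byte records (1000 data bytes plus 24 NUL bytes); skipping any other amount would land the reader inside the file's text.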

Original comment by joergbuc...@gmail.com on 1 Mar 2011 at 1:05

Attachments:

GoogleCodeExporter commented 9 years ago
BTW, when do you expect the new release to be deployed to Maven Central?

Original comment by joergbuc...@gmail.com on 1 Mar 2011 at 1:07

GoogleCodeExporter commented 9 years ago
Hi, I ran a test on this archive but don't get any error, and the file
(attached) is fully extracted. Could you please retest? The latest version is
now available in Maven Central.

Original comment by xeus....@gmail.com on 2 Mar 2011 at 10:03

Attachments:

GoogleCodeExporter commented 9 years ago
Verified FIX OK.

My tests run green now. Thanks!

(Just FYI: I had to reconfigure my Maven settings, since the UK mirror at
uk.maven.org had not yet picked up the changes from Maven Central.)

Original comment by joergbuc...@gmail.com on 2 Mar 2011 at 12:04

GoogleCodeExporter commented 9 years ago
I know this issue is fixed, but what were the results of your comparison of the
tar libraries?

I would also like to ask the original author why they created JTar instead of
using one of the existing libraries. Was it for performance reasons, a bug you
ran into, or something else?

Original comment by mellowaredev on 6 May 2011 at 2:08

GoogleCodeExporter commented 9 years ago
I will check with my employer whether it's OK to share the comparison results.

Original comment by joergbuc...@gmail.com on 10 May 2011 at 8:48

GoogleCodeExporter commented 9 years ago
Thanks for checking. The reason I ask is that under heavy CPU load I am seeing
some odd anomalies when processing archives with Commons Compress 1.1. I looked
at the JTar code and noticed that its byte-stream logic is much simpler, so I am
hoping it might fix my issue.

Original comment by mellowaredev on 17 May 2011 at 2:49

GoogleCodeExporter commented 9 years ago
I have yet to try TrueZIP and update the comparison.

Original comment by joergbuc...@gmail.com on 17 May 2011 at 3:27