guzba / zippy

Pure Nim implementation of deflate, zlib, gzip and zip.
MIT License
246 stars 29 forks source link

extractAll() for Tarballs in tarPath not extracting files #6

Closed theAkito closed 3 years ago

theAkito commented 3 years ago

I want to extract a Tarball located at tarPath to tmpDir (not existing sub-directory). However, nothing gets extracted.

When I extract this Tarball with PeaZip, everything gets extracted perfectly and works fine. However, when I do it with this procedure

proc extractAll(tarPath, dest: string) {.raises: [IOError, ZippyError, OSError], tags: [
 ReadIOEffect, ReadDirEffect, ReadEnvEffect, WriteDirEffect, WriteIOEffect].}

the files never get extracted.

There is no exception, no message, nothing. No idea what is going on there.

The Tarball is named properly, suffixed with a .tar and works perfectly fine.

Update 1

After debugging tarball.nim I found out that inside the extractAll procedure, the loop after https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L200 NEVER gets entered. It just skips the entire rest of the procedure at this line. Watching open, this procedure seems to work fine (but perhaps is not adding the files properly to the tarball) and finds all filenames from the tarball.

Update 2

As expected, the provided Tarball() does not have contents, so nothing is extracted, as nothing is provided by open():

ref 0000000004913fb0 --> [contents = [data = [],
counter = 0,
first = -1,
last = -1]]

Getting the above when running echo tarball.repr here: https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L200

Update 3

Through further debugging open() my suspicion was confirmed: open() never fills the tarball.contents; they always remain empty.

Update 4

https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L109

The above line ensures that tarball.contents are only filled, when the typeFlag either equals 0 or 5. The problem is: typeFlag is always empty!

Update 5

https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L100

The above line should ensure, that a valid fileNamePrefix is given, if the header is specifying a ustar archive. The issue is, that fileNamePrefix always remains empty. because neither header[345 ..< 500].trim() nor header[345 ..< 500] returns any output. Not sure if this is required or optional. It however does not seem to be directly related to the bigger problem at hand, but it's worth noting.

Update 6

I found the mistake!

When using writeTarball(), the headers for each file are built manually: https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L136-L153

Now, when we look at the tar standard format, we see that the header byte 156 needs to contain char typeflag;. This typeflag is meant to be set here: https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L146

Now in this case, the number (from ord()) does not get converted to a char; it returns nothing. That's why the typeFlag is missing https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L99 and that is why the files never are put into the tarball.contents https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L109-L114 and that is why the Tarball() is never actually extracted: https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L201-L220

guzba commented 3 years ago

Picking up where we left off over here: https://github.com/guzba/zippy/commit/9a0482b08b01c32104e2d500e81f03f56c564f96

Before I merge a fix in for this issue, I'd like to have a repro so I can see it, keep it in mind as I work, and also to add a test for it, etc. Currently I cannot repro the issue but I believe it is real.

To start off, if you don't mind, could you checkout to master of guzba/zippy (head at 1f9fa490011b01332cfe99cc10bedfd6a174db10), cd to the project base dir, and run nim c --gc:arc -r .\tests\test_tarballs.nim (importantly, use --gc:arc), the test should pass. Then uncomment the block starting at line 33 through 50 so it runs in addition to the tests above it in the file. Again with --gc:arc, the test should pass without any changes (it does for me at least).

Does this also pass for you without changes or do you run into the empty Tarball issue?

guzba commented 3 years ago

My current theory is that the issue is typeFlag can be '\0' in addition to '0' for normal files (per https://www.gnu.org/software/tar/manual/html_node/Standard.html), but that I currently do not include '\0' here:

https://github.com/guzba/zippy/blob/bd020475497fd8657158c68147a0660badf01c7c/src/zippy/tarballs.nim#L109

So I'm thinking this may be the fix:

if typeFlag == '0' or typeFlag == '\0':
  tarball.contents[(fileNamePrefix / fileName).toUnixPath()] =
    TarballEntry(
      kind: ekNormalFile,
      contents: data[pos ..< pos + fileSize]
    )
elif typeFlag == '5':
  tarball.contents[(fileNamePrefix / fileName).toUnixPath()] =
    TarballEntry(
      kind: ekDirectory
    )

This would happen for tarballs created by something other than Zippy since I should always be using '0' for files.

guzba commented 3 years ago

I have merged and released the above change to address this issue a while back. Feel free to re-open if this issue persists.