polydawn / repeatr

Repeatr: Reproducible, hermetic Computation. Provision containers from Content-Addressable snapshots; run using familiar containers (e.g. runc); store outputs in Content-Addressable form too! JSON API; connect your own pipelines! (Or, use github.com/polydawn/stellar for pipelines!)
https://repeatr.io
Apache License 2.0

Pointing output at single file produces malformed TAR archive #87

Closed tazjin closed 7 years ago

tazjin commented 7 years ago

With an output spec such as:

outputs:
  "executable":
    type: "tar"
    mount: "/go/src/github.com/tazjin/quinistry/quinistry"
    silo: "file://quinistry.tar.gz"

and that mount being the executable itself, we get a malformed TAR archive, because the header is written as if for a directory:

# tar xvf quinistry.tar.gz 
./
tar: Skipping to next header
tar: Exiting with failure status due to previous errors
# tar tvf quinistry.tar.gz 
drwxr-xr-x 1000/1000   6128052 2010-01-01 00:00 ./

warpfork commented 7 years ago

... that was a fun one.

The creation of a tar with clearly incorrect headers is now fixed in d6a13e2b1470caab28469ea94d729342f7763896. Tar files produced when the mount path is a single file are now seen as correct by GNU tar, and will unpack correctly, producing a single file with the correct content.
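
For anyone who wants to sanity-check what a well-formed single-file archive looks like, here is a minimal sketch using Python's standard `tarfile` module. It is not repeatr's implementation; the member name `quinistry` and the payload bytes are placeholders chosen for illustration. The point is only that the lone entry should carry a regular-file header whose size matches the content, rather than a directory header like the one in the listing above.

```python
import io
import tarfile


def build_single_file_tar(name: str, payload: bytes) -> bytes:
    """Write an in-memory tar holding exactly one regular-file entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name=name)
        info.type = tarfile.REGTYPE  # regular file, not a directory
        info.size = len(payload)     # size must match the bytes that follow
        info.mode = 0o755
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


def describe_members(archive: bytes):
    """Return (name, is_regular_file, is_directory, size) for each entry."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [(m.name, m.isreg(), m.isdir(), m.size) for m in tar.getmembers()]


def read_member(archive: bytes, name: str) -> bytes:
    """Read one entry's content back out of the archive."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return tar.extractfile(name).read()


# Placeholder content standing in for the real executable.
payload = b"\x7fELF fake binary"
archive = build_single_file_tar("quinistry", payload)
members = describe_members(archive)
roundtrip = read_member(archive, "quinistry")
print(members)
```

If the only entry reports as a directory, or its size does not match what can be read back, the archive has the same kind of problem reported in this issue.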

That said, I'd still recommend always saving directories, even if they contain a single file, for consistency's sake. The result is much easier to work with. In working on fixing this issue, I spent a lot of time poking around the GNU tar command in various forms, researching what is likely to feel the most consistent with command-line use of that tool. Getting it to act comparably to repeatr when targeting dirs is pretty much a no-brainer; evolving those scripts to a generalized form that works "correctly" (or at all) for single files starts generating a surprisingly large amount of bash. If the volume of bash produced there is an admissible heuristic for "complex, inadvisable, and full of edge cases", then I'd say handling of tars of a single file with no directory metadata is (surprisingly) complex, inadvisable, and full of edge cases: best to simply avoid it.
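
To illustrate the directory-based layout recommended above, here is a rough sketch with Python's standard `tarfile` module. It only mimics the shape of such an archive and says nothing about how repeatr builds its own; the temporary directory and the file name `quinistry` are stand-ins for a real output directory. The archive gets a root directory entry followed by the file, so the directory metadata travels with it.

```python
import io
import os
import tarfile
import tempfile


def tar_directory(dir_path: str) -> bytes:
    """Archive a directory so that its root appears as the '.' entry."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        tar.add(dir_path, arcname=".")  # recursive by default
    return buf.getvalue()


def list_entries(archive: bytes):
    """Return (name, kind) pairs, where kind is 'dir' or 'file'."""
    with tarfile.open(fileobj=io.BytesIO(archive), mode="r") as tar:
        return [(m.name, "dir" if m.isdir() else "file") for m in tar.getmembers()]


# A throwaway directory holding one placeholder file stands in for the
# output directory that would be mounted in the output spec.
with tempfile.TemporaryDirectory() as out_dir:
    with open(os.path.join(out_dir, "quinistry"), "wb") as f:
        f.write(b"placeholder binary contents")
    entries = list_entries(tar_directory(out_dir))

print(entries)
```

In terms of the output spec from the report, this corresponds to pointing `mount` at a directory that holds the executable rather than at the executable itself.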