mafintosh / tar-stream

tar-stream is a streaming tar parser and generator.
MIT License

Improve gnu/oldgnu format support #105

Closed justfalter closed 5 years ago

justfalter commented 5 years ago

This PR changes tar-stream so that it explicitly checks the header type via the magic and version fields in order to ensure proper handling of the record. An exception is thrown if an unsupported header type is encountered. Supported header types are posix and gnu/oldgnu. To the best of my knowledge, these are already implicitly supported in tar-stream.
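As a rough illustration of the check described above (not the code from this PR), a header's format can be classified from the 8 bytes at offsets 257 to 264. The function name `detectHeaderFormat` and the returned labels are hypothetical; the byte values follow GNU tar's `tar.h`, where posix uses magic `ustar\0` with version `00`, and gnu/oldgnu uses `ustar ` followed by a space and a NUL.

```javascript
// Minimal sketch, assuming a 512-byte header block in a Node.js Buffer.
// detectHeaderFormat is a hypothetical helper, not part of tar-stream's API.
const MAGIC_OFFSET = 257 // posix_header.magic, 6 bytes
const VERSION_OFFSET = 263 // posix_header.version, 2 bytes

const USTAR_MAGIC = Buffer.from('ustar\0', 'binary')
const USTAR_VERSION = Buffer.from('00', 'binary')
const GNU_MAGIC = Buffer.from('ustar ', 'binary')
const GNU_VERSION = Buffer.from(' \0', 'binary')

function detectHeaderFormat (header) {
  if (header.length < 512) throw new Error('header block must be 512 bytes')

  const magic = header.subarray(MAGIC_OFFSET, MAGIC_OFFSET + 6)
  const version = header.subarray(VERSION_OFFSET, VERSION_OFFSET + 2)

  if (magic.equals(USTAR_MAGIC) && version.equals(USTAR_VERSION)) return 'posix'
  if (magic.equals(GNU_MAGIC) && version.equals(GNU_VERSION)) return 'gnu'

  // Anything else is rejected rather than guessed at.
  throw new Error('unsupported header format')
}
```

Rejecting unknown magic values outright is one possible policy; a more lenient reader might fall back to treating such records as pre-posix (v7) headers instead.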

I was motivated to make this change after finding that tar-stream failed to properly extract files from a gnu-formatted tarball (see oldgnu_header, below). In posix tarballs (as handled by tar-stream), offset 345 holds the name prefix (posix_header.prefix), whereas gnu-formatted headers store the incremental-archive access time (oldgnu_header.atime) at that offset. When extracting a gnu-formatted tarball with oldgnu_header.atime set, a file that should be extracted to the root would instead be placed in [octal atime value]/[filename] (ex: 13455654047/foo.txt). By detecting that we are processing a gnu-formatted header, tar-stream will no longer mistake the access time field for a prefix.
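The failure mode can be reproduced with a hand-built header. The sketch below is illustrative only: `readString`, `isGnuHeader`, `naivePath`, and `decodePath` are hypothetical names, and the header is filled in just far enough to show the path handling (no checksum, size, or mode fields).

```javascript
// Read a NUL-terminated string field from a header block.
function readString (buf, offset, length) {
  const field = buf.subarray(offset, offset + length)
  const end = field.indexOf(0)
  return field.toString('utf8', 0, end === -1 ? field.length : end)
}

// gnu/oldgnu headers carry "ustar  \0" across the magic and version fields.
function isGnuHeader (header) {
  return header.toString('binary', 257, 265) === 'ustar  \0'
}

// Format-unaware decoding: always treats offset 345 as posix_header.prefix.
function naivePath (header) {
  const name = readString(header, 0, 100)
  const prefix = readString(header, 345, 155)
  return prefix ? prefix + '/' + name : name
}

// Format-aware decoding: offset 345 is only a prefix for non-gnu headers.
function decodePath (header) {
  const name = readString(header, 0, 100)
  if (isGnuHeader(header)) return name
  const prefix = readString(header, 345, 155)
  return prefix ? prefix + '/' + name : name
}

// A partial gnu-style header with oldgnu_header.atime set.
const gnuHeader = Buffer.alloc(512)
gnuHeader.write('foo.txt', 0) // name
gnuHeader.write('ustar  \0', 257, 'binary') // gnu magic + version
gnuHeader.write('13455654047\0', 345) // atime, octal digits

console.log(naivePath(gnuHeader)) // prints "13455654047/foo.txt"
console.log(decodePath(gnuHeader)) // prints "foo.txt"
```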

From the gnu tar documentation:

struct posix_header
{                              /* byte offset */
  char name[100];               /*   0 */
  char mode[8];                 /* 100 */
  char uid[8];                  /* 108 */
  char gid[8];                  /* 116 */
  char size[12];                /* 124 */
  char mtime[12];               /* 136 */
  char chksum[8];               /* 148 */
  char typeflag;                /* 156 */
  char linkname[100];           /* 157 */
  char magic[6];                /* 257 */
  char version[2];              /* 263 */
  char uname[32];               /* 265 */
  char gname[32];               /* 297 */
  char devmajor[8];             /* 329 */
  char devminor[8];             /* 337 */
  char prefix[155];             /* 345 */
                                /* 500 */
};

/* The old GNU format header conflicts with POSIX format in such a way that
   POSIX archives may fool old GNU tar's, and POSIX tar's might well be
   fooled by old GNU tar archives.  An old GNU format header uses the space
   used by the prefix field in a POSIX header, and cumulates information
   normally found in a GNU extra header.  With an old GNU tar header, we
   never see any POSIX header nor GNU extra header.  Supplementary sparse
   headers are allowed, however.  */

struct oldgnu_header
{                              /* byte offset */
  char unused_pad1[345];        /*   0 */
  char atime[12];               /* 345 Incr. archive: atime of the file */
  char ctime[12];               /* 357 Incr. archive: ctime of the file */
  char offset[12];              /* 369 Multivolume archive: the offset of
                                   the start of this volume */
  char longnames[4];            /* 381 Not used */
  char unused_pad2;             /* 385 */
  struct sparse sp[SPARSES_IN_OLDGNU_HEADER];
                                /* 386 */
  char isextended;              /* 482 Sparse file: Extension sparse header
                                   follows */
  char realsize[12];            /* 483 Sparse file: Real size*/
                                /* 495 */
};

justfalter commented 5 years ago

@mafintosh Any thoughts?

mafintosh commented 5 years ago

Great work, Released as 2.1.0