snoyberg / tar-conduit

Conduit based tar extraction mechanism
MIT License

Unable to parse some packages from Hackage #17

Open snoyberg opened 6 years ago

snoyberg commented 6 years ago

As a simple repro:

#!/usr/bin/env stack
-- stack --resolver lts-11.10 script
import Conduit
import Data.Conduit.Tar
import Network.HTTP.Client (parseUrlThrow)
import Network.HTTP.Simple
import Data.Conduit.Zlib (ungzip)
main :: IO ()
main = do
  req <- parseUrlThrow "https://hackage.haskell.org/package/libpq-0.3.tar.gz"
  withResponse req $ \res -> runConduit
      $ getResponseBody res
     .| ungzip
     .| untar (lift . print)

Results in:

Main.hs: UnexpectedPayload 512

Use case: I'm trying to unpack all of Hackage into an SQLite database in snoyberg/pantry.

snoyberg commented 6 years ago

Blocks snoyberg/pantry#1

lehins commented 6 years ago

This issue is caused by the pax format and improper discarding of unsupported headers. The very first header in the archive is of type g, and it describes a special file, pax_global_header, whose payload contains information specific to that format. Since that header type is not supported, the header is simply discarded, but the payload following it isn't, which causes the UnexpectedPayload error. The solution would be for unsupported headers to also look at the size and, if it is not zero, discard the associated payload.
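
To illustrate the idea, here is a minimal standalone sketch. It is not tar-conduit's code or API: it works on a strict ByteString rather than a conduit, it ignores header checksums, and every name in it (entries, supported, mkHeader, demoArchive and so on) is made up for the example. It relies only on the standard ustar header layout: the name at offset 0 (100 bytes), the octal size at offset 124 (12 bytes), and the type flag at offset 156. For a header of an unsupported type it reads the size and skips the payload, rounded up to whole 512-byte blocks, before looking for the next header.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Char (digitToInt, isOctDigit)
import Text.Printf (printf)

blockSize :: Int
blockSize = 512

-- Round a payload length up to a whole number of 512-byte blocks.
paddedSize :: Int -> Int
paddedSize n = ((n + blockSize - 1) `div` blockSize) * blockSize

-- Size field: 12 bytes of octal text at offset 124 of the header block.
payloadSize :: B.ByteString -> Int
payloadSize =
  B.foldl' (\acc c -> acc * 8 + digitToInt c) 0
    . B.takeWhile isOctDigit
    . B.dropWhile (== ' ')
    . B.take 12
    . B.drop 124

-- Assumption for this sketch: only regular files and directories are supported.
supported :: Char -> Bool
supported flag = flag `elem` ['0', '\NUL', '5']

-- Walk the archive and list (type flag, name) for supported entries.
-- The payload is skipped for every header, including unsupported ones
-- such as the pax 'g' header; a real extractor would stream the payload
-- of supported entries instead of dropping it.
entries :: B.ByteString -> [(Char, B.ByteString)]
entries bs
  | B.length bs < blockSize = []            -- truncated input: stop
  | B.all (== '\NUL') hdr   = []            -- all-zero block: end of archive
  | supported flag          = (flag, name) : entries rest
  | otherwise               = entries rest  -- unsupported: drop header and payload
  where
    (hdr, afterHdr) = B.splitAt blockSize bs
    flag = B.index hdr 156
    name = B.takeWhile (/= '\NUL') (B.take 100 hdr)
    rest = B.drop (paddedSize (payloadSize hdr)) afterHdr

-- Helpers that build a tiny synthetic archive (checksum left as zeros).
padTo :: Int -> B.ByteString -> B.ByteString
padTo n b = B.take n (b `B.append` B.replicate n '\NUL')

padBlock :: B.ByteString -> B.ByteString
padBlock b = padTo (paddedSize (B.length b)) b

mkHeader :: String -> Int -> Char -> B.ByteString
mkHeader name size flag = B.concat
  [ padTo 100 (B.pack name)               -- name
  , B.replicate 24 '\NUL'                 -- mode, uid, gid
  , B.pack (printf "%011o" size)          -- size as 11 octal digits
  , B.singleton '\NUL'                    -- terminator of the size field
  , B.replicate 20 '\NUL'                 -- mtime, checksum
  , B.singleton flag                      -- type flag at offset 156
  , B.replicate (blockSize - 157) '\NUL'  -- rest of the header block
  ]

demoArchive :: B.ByteString
demoArchive = B.concat
  [ mkHeader "pax_global_header" 52 'g', padBlock (B.replicate 52 'x')
  , mkHeader "a.txt" 5 '0', padBlock (B.pack "hello")
  , B.replicate (2 * blockSize) '\NUL'
  ]

main :: IO ()
main = print (entries demoArchive)  -- prints [('0',"a.txt")]
```

If the payload of the g header were not skipped, the walker would try to parse the payload block as the next header, which is the same kind of misalignment that surfaces as UnexpectedPayload in the repro.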

lehins commented 6 years ago

I suspect this issue would also disappear if #13 were implemented, but adding the proper discarding described in the comment above is still the suggested solution for unknown/custom headers.