Closed by dportabella 6 years ago
Sorry for the delay. Can you provide a sample WARC where this is happening, for example a CommonCrawl WARC? Is it only happening in Python 2?
I only checked with Python 2:
$ wget "https://commoncrawl.s3.amazonaws.com/crawl-data/CC-MAIN-2016-36/segments/1471982290442.1/warc/CC-MAIN-20160823195810-00000-ip-10-153-172-175.ec2.internal.warc.gz"
$ cdx-indexer warc.cdx CC-MAIN-20160823195810-00000-ip-10-153-172-175.ec2.internal.warc.gz
I just tried with Python 3, and it works.
Fixed in 2.0.4!