Closed: nice-redbull closed this issue 1 year ago.
Reading https://data.commoncrawl.org/crawl-data/CC-NEWS/2020/09/CC-NEWS-20200921024254-00130.warc.gz fails with:
invalid HTTP message at byte position 6: HTTP/2<-- HERE --> 200 \r\nserver: Apache\r\nx-gen-mode: full\r...
Multiple files from this year/month produce errors like this.
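See commoncrawl/news-crawl#42: HTTP/2 was enabled by a security upgrade of the JDK, and the HTTP headers were written as they were "stringified" by the protocol layers. That is why the status line reads "HTTP/2 200" instead of an HTTP/1.x-style "HTTP/1.1 200 OK". The HTTP/1.1 message grammar requires the version to be "HTTP/" DIGIT "." DIGIT, so a strict parser rejects the record at byte 6, the space right after "HTTP/2" where it expects the ".".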
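To make the failure concrete, here is a minimal sketch of a strict status-line check next to a lenient one that also accepts a bare major version such as "HTTP/2". The helper name `parse_status_line` and its `lenient` flag are hypothetical. This is not the code of any WARC library and not the fix applied in news-crawl. Treating "HTTP/2" as version 2.0 is an assumption made only for illustration.

```python
import re

# Strict HTTP/1.x status line: HTTP-version = "HTTP/" DIGIT "." DIGIT
STRICT_STATUS_LINE = re.compile(rb"^HTTP/(\d)\.(\d) (\d{3})(?: (.*))?$")

# Hypothetical lenient variant: the ".DIGIT" minor version is optional,
# so "HTTP/2 200 " (as written in the affected CC-NEWS records) is accepted.
LENIENT_STATUS_LINE = re.compile(rb"^HTTP/(\d)(?:\.(\d))? (\d{3})(?: (.*))?$")


def parse_status_line(line: bytes, lenient: bool = False):
    """Return (major, minor, status) or raise ValueError.

    Hypothetical helper for illustration only.
    """
    line = line.rstrip(b"\r\n")
    pattern = LENIENT_STATUS_LINE if lenient else STRICT_STATUS_LINE
    m = pattern.match(line)
    if m is None:
        raise ValueError(f"invalid HTTP status line: {line!r}")
    major = int(m.group(1))
    # Assumption: a missing minor version is treated as 0 ("HTTP/2" -> 2.0).
    minor = int(m.group(2)) if m.group(2) is not None else 0
    status = int(m.group(3))
    return major, minor, status


if __name__ == "__main__":
    record_line = b"HTTP/2 200 \r\n"
    # Byte 6 is the space after "HTTP/2", where the strict grammar expects ".".
    print(record_line[6:7])
    try:
        parse_status_line(record_line)
    except ValueError as err:
        print("strict:", err)
    print("lenient:", parse_status_line(record_line, lenient=True))
```

A reader that must handle these 2020 records either needs this kind of leniency in its HTTP parser or has to skip or repair the affected records. Which option suits a given pipeline is a judgment call.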