iipc / jwarc

Java library for reading and writing WARC files with a typed API
Apache License 2.0
48 stars, 8 forks

lenient http parser: allow empty field names and invalid characters #51

Closed: ato closed this 4 years ago

ato commented 4 years ago

Allows header fields that are invalid but occur in the wild, such as:

: empty field name
field name with spaces: 1
<strong class="why">header</strong>: other random characters
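None of the three names above is a valid HTTP `token`. One is empty, one contains spaces, and one contains `<`, `"`, `=` and `/`. A lenient parser can accept them by splitting each line at its first colon and keeping whatever precedes it as the name. The sketch below is a hypothetical helper (`LenientFieldLine` is not part of jwarc's API), and it assumes that a line with no colon at all is still rejected. It also includes a strict `tchar` check so you can see which names a strict parser would refuse.

```java
import java.util.AbstractMap;
import java.util.Map;

// Hypothetical illustration, not jwarc's implementation.
final class LenientFieldLine {
    private LenientFieldLine() {}

    /** Splits at the first colon; returns null if the line has no colon (assumed rejection). */
    static Map.Entry<String, String> parse(String line) {
        int colon = line.indexOf(':');
        if (colon < 0) {
            return null;
        }
        // The name is kept verbatim: it may be empty or contain any characters.
        String name = line.substring(0, colon);
        // Optional whitespace around the value is trimmed.
        String value = line.substring(colon + 1).trim();
        return new AbstractMap.SimpleImmutableEntry<>(name, value);
    }

    /** Strict check: true only if the name is a non-empty RFC 7230 token (1*tchar). */
    static boolean isStrictToken(String name) {
        if (name.isEmpty()) {
            return false;
        }
        for (char c : name.toCharArray()) {
            boolean alnum = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!alnum && "!#$%&'*+-.^_`|~".indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }
}
```

For example, `LenientFieldLine.parse("field name with spaces: 1")` yields the name `field name with spaces` and the value `1`. On that same name, `isStrictToken` returns false.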

Note that a line beginning with whitespace is still treated as a continuation (line folding) of the previous field, even if it contains a colon. So this is a single field:

field: first part
      folded: second part
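The folding rule can be sketched as a pass over the raw lines that runs before any line is split into name and value. The class `Unfolder` is hypothetical and not jwarc's code. The sketch assumes that a fold is replaced by a single space, which RFC 7230 permits for obsolete line folding. Any line starting with a space or tab is joined onto the previous logical line, so the colon inside `folded: second part` never begins a new field.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration, not jwarc's implementation.
final class Unfolder {
    private Unfolder() {}

    static List<String> unfold(List<String> rawLines) {
        List<String> logical = new ArrayList<>();
        for (String line : rawLines) {
            boolean continuation = !line.isEmpty()
                    && (line.charAt(0) == ' ' || line.charAt(0) == '\t');
            if (continuation && !logical.isEmpty()) {
                // Replace the fold (line break plus leading whitespace) with one space.
                int last = logical.size() - 1;
                logical.set(last, logical.get(last) + " " + line.trim());
            } else {
                // A new field, or a leading continuation with nothing to attach to.
                logical.add(line);
            }
        }
        return logical;
    }
}
```

Applied to the two lines above, `unfold` returns the single logical line `field: first part folded: second part`. Splitting that line at its first colon gives the name `field`.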

Fixes #50