internetarchive / warc

Python library for reading and writing warc files
GNU General Public License v2.0

support reading older WARC versions #2

Open petri opened 12 years ago

petri commented 12 years ago

The reader code barfs on versions other than "WARC/1.0".

I have not seen anything describing the differences between, say, 1.0, 0.18 and 0.17 (apart from the version stamp itself). If version 1.0 is otherwise identical to either or both of those, please allow reading them, or add a configuration variable that determines whether they are allowed.

I could fork the code and add this feature, but I do not know what the differences are. If someone can point me to the spec for 1.0, I'd be happy to do it.
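
To make the request concrete, here is a minimal sketch of a configurable version check. The names `ALLOWED_VERSIONS` and `parse_version_line` are hypothetical and are not part of this library's API. The sketch only relaxes the check on the version stamp; it assumes the rest of the record can be parsed the same way, which is exactly the open question above.

```python
import re

# Hypothetical configuration variable: the version stamps the reader accepts.
# Whether 0.17 and 0.18 records are otherwise compatible with 1.0 is an
# assumption here, not something confirmed by the spec.
ALLOWED_VERSIONS = frozenset({"WARC/1.0", "WARC/0.18", "WARC/0.17"})

# Matches the first line of a record, e.g. "WARC/0.18\r\n".
_VERSION_RE = re.compile(r"^WARC/(\d+\.\d+)\r?\n?$")


def parse_version_line(line, allowed=ALLOWED_VERSIONS):
    """Return the version number (e.g. "0.18") from a record's first line.

    Raises ValueError if the line is not a version line or if the
    version is not in `allowed`.
    """
    if isinstance(line, bytes):
        line = line.decode("ascii", errors="replace")
    match = _VERSION_RE.match(line)
    if match is None:
        raise ValueError("Not a WARC version line: %r" % line)
    stamp = "WARC/" + match.group(1)
    if stamp not in allowed:
        raise ValueError("Unsupported WARC version: %s" % stamp)
    return match.group(1)
```

Passing `allowed={"WARC/1.0"}` would keep the current strict behaviour, so the relaxed check could be opt-in.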