iipc/jwarc
Java library for reading and writing WARC files with a typed API
Apache License 2.0
48 stars, 9 forks
Unpack as files
#5, Open, opened by ato 5 years ago
ato commented 5 years ago
Lots of tricky details:
How do we map URLs to file paths?
What if a WARC contains several versions of the same URL?
How do we handle the file/directory name clashes?
Do we make files for metadata, headers and request payloads too?
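To make the first question concrete, here is one possible mapping, sketched with plain JDK classes only. It is not jwarc's behaviour or API: the class name `UrlPathMapper`, the `index.html` convention for directory-like URLs, the `_port` suffix and the `%3F` query separator are all assumptions chosen for illustration.

```java
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not part of jwarc: maps a URL to a relative file path.
final class UrlPathMapper {
    static String toPath(String url) {
        URI uri = URI.create(url);
        if (uri.getHost() == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        // One top-level directory per host (plus port, if explicit).
        StringBuilder path = new StringBuilder(uri.getHost().toLowerCase(Locale.ROOT));
        if (uri.getPort() != -1) {
            path.append('_').append(uri.getPort());
        }
        String rawPath = uri.getRawPath();
        if (rawPath == null || rawPath.isEmpty()) {
            rawPath = "/";
        }
        for (String segment : rawPath.split("/")) {
            if (segment.isEmpty()) {
                continue; // skip the leading slash and doubled slashes
            }
            // Neutralise "." and ".." so a record cannot escape the output directory.
            if (segment.equals(".") || segment.equals("..")) {
                segment = segment.replace(".", "%2E");
            }
            path.append('/').append(escape(segment));
        }
        // Assumption: directory-like URLs are stored as index.html.
        if (rawPath.endsWith("/")) {
            path.append("/index.html");
        }
        // Assumption: the query string is kept in the file name after an escaped '?'.
        if (uri.getRawQuery() != null) {
            path.append("%3F").append(escape(uri.getRawQuery()));
        }
        return path.toString();
    }

    // Percent-encodes every byte outside a conservative file-name-safe set.
    // '%' is kept so escapes already present in the URL survive unchanged.
    private static String escape(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            char c = (char) (b & 0xff);
            boolean safe = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '_' || c == '.' || c == '%';
            if (safe) {
                out.append(c);
            } else {
                out.append(String.format("%%%02X", b & 0xff));
            }
        }
        return out.toString();
    }
}
```

With this sketch, `UrlPathMapper.toPath("http://example.com/a/b.html?x=1")` gives `example.com/a/b.html%3Fx%3D1`. It deliberately leaves the other questions open: a WARC holding both `http://example.com/a` and `http://example.com/a/b` still produces a file/directory clash on `a`, and several versions of the same URL all map to the same path.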