yacy / yacy_search_server

Distributed Peer-to-Peer Web Search Engine and Intranet Search Appliance
http://yacy.net

Support importing from different warc compressions #430

Open raspher opened 2 years ago

raspher commented 2 years ago

Archiveteam actively uses various WARC compressions, for example megawarc.zst or warc.xz. What do you think: should this be implemented or not? Which formats should be supported?

https://archive.org/details/archiveteam?sort=-publicdate

Orbiter commented 2 years ago

Why not? This more or less only requires Java libraries for those compressions, which I can find here, e.g.:

zstd:

xz:

Is this complete? What other formats does Archiveteam prefer?
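Supporting several formats mostly means choosing the right decompressor for each file before handing the stream to the WARC reader. Below is a minimal sketch. It assumes the format is sniffed from the file's leading magic bytes rather than trusted from the extension. The class `WarcCompressionSniffer` and its methods are hypothetical and not part of YaCy. Only gzip is wired in, through the JDK's `java.util.zip`. The zstd and xz branches only mark where libraries such as zstd-jni (`com.github.luben.zstd.ZstdInputStream`) or XZ for Java (`org.tukaani.xz.XZInputStream`) could plug in; that choice of dependencies is an assumption.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper, not YaCy's API: picks a decompressor by magic bytes.
public class WarcCompressionSniffer {

    enum Compression { NONE, GZIP, ZSTD, XZ }

    /** Identify the compression from the first few bytes of a file. */
    static Compression detect(byte[] head) {
        if (startsWith(head, 0x1F, 0x8B)) return Compression.GZIP;
        // zstd frame magic 0xFD2FB528, stored little-endian
        if (startsWith(head, 0x28, 0xB5, 0x2F, 0xFD)) return Compression.ZSTD;
        // zstd skippable frame magic 0x184D2A50..0x184D2A5F (RFC 8878), little-endian
        if (head.length >= 4 && (head[0] & 0xF0) == 0x50 && (head[1] & 0xFF) == 0x2A
                && (head[2] & 0xFF) == 0x4D && (head[3] & 0xFF) == 0x18) return Compression.ZSTD;
        // xz stream header magic
        if (startsWith(head, 0xFD, 0x37, 0x7A, 0x58, 0x5A, 0x00)) return Compression.XZ;
        return Compression.NONE;
    }

    private static boolean startsWith(byte[] data, int... magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    /** Peek at the header without consuming it, then wrap the stream. */
    static InputStream open(InputStream raw) throws IOException {
        BufferedInputStream in = new BufferedInputStream(raw);
        in.mark(6);
        byte[] head = in.readNBytes(6);
        in.reset();
        switch (detect(head)) {
            case GZIP:
                return new GZIPInputStream(in);
            case ZSTD:
                // assumption: with zstd-jni, return new ZstdInputStream(in)
                throw new UnsupportedOperationException("zstd decompressor not wired in");
            case XZ:
                // assumption: with XZ for Java, return new XZInputStream(in)
                throw new UnsupportedOperationException("xz decompressor not wired in");
            default:
                return in; // uncompressed .warc
        }
    }

    public static void main(String[] args) throws IOException {
        // Gzip a tiny fake WARC header in memory, then read it back through open().
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
            gz.write("WARC/1.0\r\n".getBytes(StandardCharsets.US_ASCII));
        }
        try (InputStream in = open(new ByteArrayInputStream(buf.toByteArray()))) {
            System.out.print(new String(in.readAllBytes(), StandardCharsets.US_ASCII)); // prints WARC/1.0
        }
    }
}
```

Sniffing the header instead of the file name also covers archives whose extension does not match their content. Adding a new format then only adds one magic-byte check and one stream wrapper.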