webrecorder / replayweb.page

Serverless replay of web archives directly in the browser
https://replayweb.page
GNU Affero General Public License v3.0
709 stars 58 forks

Support load of multiple WARC files #91

Open ivbeg opened 2 years ago

ivbeg commented 2 years ago

Some crawlers create multiple WARC files, which is important when WARC files have to be uploaded to storage that limits the size of a single file. I have many archived websites, each split into 5 to 50 WARC files of 5 GB each. Is it possible to add to ReplayWeb.page the ability to open more than one file at once?

ikreymer commented 2 years ago

There is a preliminary implementation for loading a JSON 'manifest' that contains a list of WACZ files. Currently, support is planned only for multiple WACZ files, because loading many at once is easier thanks to random access. With a list of WARCs, each one would need to be loaded in full in order to index it, but maybe that should still be supported. I can update here when there is more progress on this!
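
To illustrate the difference described above, here is a minimal Python sketch. It is not ReplayWeb.page's implementation. It assumes uncompressed WARC data and simplified records (real records also carry mandatory fields such as WARC-Record-ID and WARC-Date), and the names `make_record`, `index_warc`, and `list_zip_members` are hypothetical helpers introduced only for this example. A WARC has no built-in index, so every record header has to be read in order, while a WACZ is a ZIP file whose directory can be read without scanning the payload.

```python
import io
import zipfile


def make_record(uri: str, body: bytes) -> bytes:
    """Build a simplified WARC record (hypothetical helper for this demo)."""
    head = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode()
    # A record is terminated by two CRLFs after the content block.
    return head + body + b"\r\n\r\n"


def index_warc(data: bytes) -> list[tuple[int, int, str]]:
    """Scan uncompressed WARC bytes and return (offset, length, uri) per record.

    The scan is sequential: the start of record N+1 is only known after
    the Content-Length of record N has been read.
    """
    entries = []
    pos = 0
    while pos < len(data):
        header_end = data.index(b"\r\n\r\n", pos) + 4
        headers = {}
        # Skip the first line, which is the "WARC/1.0" version line.
        for line in data[pos:header_end].split(b"\r\n")[1:]:
            if b":" in line:
                name, value = line.split(b":", 1)
                headers[name.strip().lower()] = value.strip()
        body_len = int(headers[b"content-length"])
        record_end = header_end + body_len + 4
        uri = headers.get(b"warc-target-uri", b"").decode()
        entries.append((pos, record_end - pos, uri))
        pos = record_end
    return entries


def list_zip_members(data: bytes) -> list[str]:
    """List ZIP members from the archive's directory, without reading payloads."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.namelist()
```

With an index of offsets and lengths in hand, a reader can fetch a single record by byte range. For a plain WARC that index has to be built first by a full pass like `index_warc`, which is the extra cost mentioned for multi-WARC loading.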