netarchivesuite / solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.
Apache License 2.0

Experimental test of speeding up WARC-indexer #377

Open: thomasegense opened this issue 1 year ago

thomasegense commented 1 year ago

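It is worth testing how much speed-up we gain by not recalculating the SHA-1 hash of each record and instead trusting the digest already recorded in the WARC header (e.g. `WARC-Payload-Digest`). Note that for old ARC files we still have to calculate the hash, since they have no WARC header to read it from.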
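A minimal sketch of the proposed shortcut, written in Python for brevity. It assumes the record headers have already been parsed into a dict, and that the digest to reuse is a `WARC-Payload-Digest` value of the form `sha1:<Base32>`. ARC records are passed with an empty dict and fall back to hashing. `payload_digest` and `sha1_base32` are hypothetical names for illustration, not functions from webarchive-discovery.

```python
import base64
import hashlib


def sha1_base32(payload: bytes) -> str:
    """SHA-1 of the payload, formatted like a WARC digest field: 'sha1:' + Base32."""
    return "sha1:" + base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")


def payload_digest(headers: dict, payload: bytes, trust_header: bool = True) -> str:
    """Return a SHA-1 digest for one record (hypothetical helper).

    headers: already-parsed record headers (assumed dict form); pass {} for
             ARC records, which have no WARC header to read a digest from.
    trust_header: if True, reuse a declared sha1 digest instead of hashing.
    """
    if trust_header:
        # Header names are matched case-insensitively here, as with HTTP-style headers.
        fields = {name.lower(): value for name, value in headers.items()}
        declared = fields.get("warc-payload-digest", "").strip()
        algorithm, sep, value = declared.partition(":")
        if sep and algorithm.lower() == "sha1" and value:
            # Trusted as-is: the payload bytes are never hashed on this path.
            return "sha1:" + value.upper()
    # ARC record, missing or non-SHA-1 header, or trust disabled: compute it.
    return sha1_base32(payload)
```

Timing the same set of WARC files with `trust_header=True` and with `trust_header=False` would give the speed-up this issue asks about. Comparing the two results per record would also show whether the header digests match what the indexer computes, which only holds if both were taken over the same bytes.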

tokee commented 1 year ago

This issue should be moved to the webarchive-discovery project.