harvard-lil / js-wacz

JavaScript module and CLI tool for working with web archive data using the WACZ format specification.
MIT License

Add option to use existing CDXJ rather than indexing from WARCs #89

Closed tw4l closed 4 months ago

tw4l commented 4 months ago

Fixes #88

All tests and linting are passing, please let me know if you'd like to see any changes!

Skipping WARC indexing removes one of the ways pages are detected, so the new --cdxj option must be used in combination with --pages. I've added a validator that fails early if --cdxj is passed without --pages.
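To illustrate the fail-early check described above, here is a minimal sketch. It is not the code from this PR: the function name `validateCdxjOptions` and the shape of the `options` object are assumptions made for the example, and only the --cdxj and --pages flag names come from the description.

```javascript
// Hypothetical helper, NOT the actual js-wacz implementation.
// `options` is assumed to mirror parsed CLI flags: { cdxj?: string, pages?: string }.
function validateCdxjOptions (options) {
  const { cdxj, pages } = options

  // Without WARC indexing there is no fallback page detection,
  // so a pages source must accompany an existing CDXJ index.
  if (cdxj && !pages) {
    throw new Error('--cdxj must be used in combination with --pages.')
  }

  return true
}

// Usage: run the check before any archive processing starts.
try {
  validateCdxjOptions({ cdxj: 'indexes/' }) // pages is missing, so this throws
} catch (err) {
  console.error(err.message)
}
```

Running the check before any files are read means a bad flag combination is reported immediately rather than after partial processing.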

matteocargnelutti commented 4 months ago

This is perfect @tw4l - Thank you very much for a great PR.

I can merge and publish whenever you're ready :)

tw4l commented 4 months ago

Thanks so much @matteocargnelutti! Should be ready now :)