Open leewesleyv opened 1 month ago
Great idea. I would say it is OK to leave this until after the package has been published.
When you want to crawl the resulting WACZ (containing new resources), you probably want to crawl it together with the other WACZ (containing older resources). And if the old WACZ was itself crawled as an 'update' to a previous one, you need to specify every WACZ in that chain when crawling.
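To make the "crawl them together" idea concrete, here is a rough sketch of how a lookup across several archives could work, assuming the archives are checked newest first. The `ArchiveIndex` type and the `lookup` function are made up for illustration (a plain dict stands in for a real WACZ index); this is not an existing API of the package.

```python
from typing import Dict, List, Optional

# Hypothetical in-memory stand-in for a WACZ index: URL -> archived body.
ArchiveIndex = Dict[str, bytes]


def lookup(url: str, archives_newest_first: List[ArchiveIndex]) -> Optional[bytes]:
    """Return the most recent archived body for url, or None if no archive has it."""
    for index in archives_newest_first:
        if url in index:
            return index[url]
    return None


# The older archive holds both pages; the 'update' archive only holds the changed one.
old = {"https://example.com/a": b"a-v1", "https://example.com/b": b"b-v1"}
update = {"https://example.com/a": b"a-v2"}

# Every archive in the chain has to be listed, newest first.
archives = [update, old]
```

With this ordering, a page that changed is served from the update, and a page that did not change is still found in the older archive. Leaving the older archive out of the list would make the unchanged page unavailable.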
I think creating a WACZ manifest could help with this, so you can reference one file to re-crawl. Its specification is a work-in-progress, but a tool like replayweb.page already supports it afaik - see https://github.com/webrecorder/specs/issues/112 for the spec in progress.
When using the downloader middleware and the request is not found in the archive, request the live resource instead. Add a setting or something similar that we can use to control this behaviour.
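A minimal sketch of what that could look like, deliberately written without Scrapy so it runs on its own. The class name, the `fallback_to_live` flag and the `NotInArchive` exception are all hypothetical, and `process_request` takes a plain URL here rather than a Scrapy `Request`. It mirrors the Scrapy downloader middleware convention that returning `None` from `process_request` lets the request continue to the normal (live) download.

```python
from typing import Dict, Optional


class NotInArchive(Exception):
    """Raised when a request is missing from the archive and fallback is off."""


class ArchiveFallbackMiddleware:
    """Hypothetical middleware: serve from the archive, optionally fall back to live."""

    def __init__(self, archive: Dict[str, bytes], fallback_to_live: bool) -> None:
        self.archive = archive  # stand-in for a WACZ lookup: URL -> body
        self.fallback_to_live = fallback_to_live  # the proposed setting

    def process_request(self, url: str) -> Optional[bytes]:
        body = self.archive.get(url)
        if body is not None:
            # Found in the archive: short-circuit with the archived body.
            return body
        if self.fallback_to_live:
            # Not archived: returning None means "carry on and download it live".
            return None
        # Not archived and fallback disabled: refuse the request.
        raise NotInArchive(url)
```

In a real Scrapy middleware the flag would presumably be read from the crawler settings in `from_crawler`, and the refusal would likely be `scrapy.exceptions.IgnoreRequest` rather than a custom exception, but the setting name is still to be decided.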