netarchivesuite / solrwayback

A search interface and wayback machine for the UKWA Solr-based warc-indexer framework.
Apache License 2.0
100 stars, 21 forks

Export full WARC from PWID #172

Open tokee opened 3 years ago

tokee commented 3 years ago

The Twitter API has hydration that turns message IDs into full tweets. Likewise, SolrWayback should be able to take a PWID (by upload) and export a full WARC containing the resources it references.

thomasegense commented 3 years ago

I will implement the backend part of this one. For a very large PWID I may have to split the lookup into several Solr calls, because a single query is limited to roughly 1000 boolean clauses, and have the WARC streaming continue across those calls. But as you can currently only get a PWID for single pages, this is not a problem yet.

Once we implement getting a PWID for a whole result set, we can extend this to large scale.
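
A minimal sketch of the splitting idea mentioned above, assuming each resource in the PWID has already been turned into one Solr clause. The class `PwidBatcher`, the method `buildBatchQueries` and the `id:...` clauses are hypothetical names used for illustration only; none of this is existing SolrWayback code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of SolrWayback. */
final class PwidBatcher {

    private PwidBatcher() {
    }

    /**
     * Groups per-resource clauses into OR queries so that no single query
     * holds more than maxClauses clauses.
     *
     * @param clauses    one ready-made Solr clause per resource (assumed input)
     * @param maxClauses upper bound on clauses per query, e.g. 1000
     * @return the queries, in the same order as the input clauses
     */
    static List<String> buildBatchQueries(List<String> clauses, int maxClauses) {
        if (maxClauses < 1) {
            throw new IllegalArgumentException("maxClauses must be >= 1");
        }
        List<String> queries = new ArrayList<>();
        for (int start = 0; start < clauses.size(); start += maxClauses) {
            int end = Math.min(start + maxClauses, clauses.size());
            // Each batch becomes one query: clause1 OR clause2 OR ...
            queries.add(String.join(" OR ", clauses.subList(start, end)));
        }
        return queries;
    }
}
```

The caller would send the queries one after another and append the records from each response to the same WARC output stream, so the export continues across calls instead of restarting.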