netarchivesuite / solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.
Apache License 2.0

SolrWayback should expose the CDX-API #169

Closed tokee closed 1 year ago

tokee commented 3 years ago

We have all the information, so why not let SolrWayback act as a CDX server, so that there is no need for a separate index? (Suggested by Yves Maurer at a webinar.)

thomasegense commented 1 year ago

I will close this. In short, no one will benefit from it.

Exposing a primitive CDX server that PyWb can call will not make playback through PyWb any better than playback through SolrWayback. As long as prefix search and POST requests cannot be handled with the current version of WARC-Indexer + Solr, there will be no (or very little) playback improvement. Also, old WARC files harvested with Heritrix do not save the POST data in the request.

Use Outback-CDX + PyWb to get playback working for the most complicated social media (SoMe) sites, such as Facebook. This also requires that the harvest was done with WebRecorder technology.
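
For readers unfamiliar with the prefix search mentioned above, here is a minimal sketch of the idea, assuming a CDX-style index kept sorted by a SURT-like URL key. The index contents, the `CDX_INDEX` name, and the `prefix_lookup` helper are all hypothetical and only illustrate the lookup pattern; this is not how SolrWayback, Outback-CDX, or PyWb implement it.

```python
from bisect import bisect_left

# Hypothetical, hand-made index of (SURT-style key, 14-digit timestamp) pairs.
# Sorting by key places all captures under the same URL prefix next to each other.
CDX_INDEX = sorted([
    ("com,example)/", "20200101000000"),
    ("com,example)/about", "20200102000000"),
    ("com,example)/api/items?id=1", "20200103000000"),
    ("com,example)/api/items?id=2", "20200104000000"),
    ("org,example)/", "20200105000000"),
])


def prefix_lookup(index, prefix):
    """Return every entry whose key starts with `prefix`.

    Binary search finds the first candidate; the scan stops at the
    first key that no longer matches, because the index is sorted.
    """
    start = bisect_left(index, (prefix,))
    matches = []
    for key, timestamp in index[start:]:
        if not key.startswith(prefix):
            break
        matches.append((key, timestamp))
    return matches


results = prefix_lookup(CDX_INDEX, "com,example)/api/")
for key, timestamp in results:
    print(timestamp, key)
```

A playback tool can use this kind of query to find near matches when the exact URL was not captured, for example requests that differ only in their query string.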