netarchivesuite / solrwayback

A search interface and wayback machine for the UKWA Solr based warc-indexer framework.
Apache License 2.0

Export does not support image & group search #233

Open tokee opened 2 years ago

tokee commented 2 years ago

Reported by Sara Aubry, https://www.bnf.fr/fr https://webcorpora.hypotheses.org/

Exporting a WARC when image search or grouped search is enabled produces the same result as an export of a plain search for the same query. The user would normally expect the export to correspond to the visible search result, so export should conform to that.

thomasegense commented 2 years ago

Also confirmed. Fixing it for the grouped search should not be that hard. Fixing it for image search is hard, as the result is pieced together from many different Solr queries and will require a new method that the frontend must call. The image search is also not presented in the result view. Disabling the export option for image search can be done quickly.

thomasegense commented 2 years ago

Unfortunately, Solr does not support result streaming using cursorMark for grouping queries. Solr error: "Can not use Grouping with cursorMark".

So this limits the export to what can be handled in a single Solr query. Using the pagination mechanics will not work past a few thousand results in Solr.
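For readers unfamiliar with cursorMark, here is a minimal Python sketch of how cursor-based streaming works for a non-grouped query. It is not SolrWayback's code. `fetch_page` is a hypothetical callable standing in for the HTTP request to Solr, `make_fake_solr` is an in-memory stand-in for demonstration only, and the sort fields `url_norm` and `id` are assumed names. The parts taken from Solr itself are the start value `*`, the `nextCursorMark` response key, and the rule that the sort must include the uniqueKey field.

```python
from typing import Any, Callable, Dict, Iterator, List

Params = Dict[str, Any]


def stream_with_cursor(
    fetch_page: Callable[[Params], Dict[str, Any]],  # hypothetical: performs one Solr request
    query: str,
    sort: str = "url_norm asc, id asc",  # assumed field names; sort must include the uniqueKey
    rows: int = 500,
) -> Iterator[Dict[str, Any]]:
    """Yield all documents for a non-grouped query, one page at a time."""
    cursor = "*"  # Solr's start value for cursorMark
    while True:
        response = fetch_page(
            {"q": query, "sort": sort, "rows": rows, "cursorMark": cursor}
        )
        yield from response["response"]["docs"]
        next_cursor = response["nextCursorMark"]
        if next_cursor == cursor:
            # Solr returns the same cursor once the result set is exhausted.
            return
        cursor = next_cursor


def make_fake_solr(docs: List[Dict[str, Any]]) -> Callable[[Params], Dict[str, Any]]:
    """In-memory stand-in for Solr, for demonstration only: the cursor is an offset."""

    def fetch_page(params: Params) -> Dict[str, Any]:
        start = 0 if params["cursorMark"] == "*" else int(params["cursorMark"])
        page = docs[start:start + params["rows"]]
        next_cursor = str(start + len(page)) if page else params["cursorMark"]
        return {"response": {"docs": page}, "nextCursorMark": next_cursor}

    return fetch_page
```

Because grouping cannot be combined with cursorMark, a grouped export cannot use a loop like this directly.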

My best 'fix' is to have the GUI give an error message that grouping queries cannot be exported.

thomasegense commented 2 years ago

Assigned to @jorntx. Fix: when trying to export a result that is a grouped search, give an error message instead: "Export is not currently supported for grouped search".

@tokee has a solution, but it takes quite some programming time, so the quick frontend patch is still preferable. Solution:
1) Search without grouping, sorted by URL (maybe with time, or nearest time, as a secondary sort).
2) Use the existing Solr streaming.
3) Since the URLs are sorted, skip all consecutive results from Solr with the same URL until we meet a new URL.
4) Glue this into the existing SolrWayback streaming framework.
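Step 3 of the proposed solution can be illustrated with a short Python sketch. It is not the SolrWayback implementation. The function name `first_per_url` and the field name `url_norm` are assumptions for the example. It relies on the input already being sorted by URL, so that duplicates are adjacent.

```python
from itertools import groupby
from typing import Any, Dict, Iterable, Iterator


def first_per_url(
    sorted_docs: Iterable[Dict[str, Any]],
    url_field: str = "url_norm",  # assumed field name
) -> Iterator[Dict[str, Any]]:
    """Yield the first document of each run of consecutive equal URLs.

    Works on a stream: only one run is inspected at a time, so the full
    result set never has to be held in memory.
    """
    for _url, run in groupby(sorted_docs, key=lambda doc: doc[url_field]):
        # groupby only merges adjacent items, which is why the input
        # must be sorted by URL beforehand.
        yield next(run)
```

Which document survives for each URL is decided by the secondary sort, for example time, since the first document in each run is the one that is kept.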

thomasegense commented 2 years ago

@tokee

thomasegense commented 1 year ago

It is ready now for image export. It takes a little of both frontend and backend work.