internetarchive / dweb-mirror

Offline Internet Archive project
https://www-dweb-mirror.dev.archive.org/
GNU Affero General Public License v3.0

Avoiding gateway dependency: collection0title refactor #283

Closed mitra42 closed 4 years ago

mitra42 commented 4 years ago

Part of #242 (dependencies on Dweb). As sent to Ximm today:

Hi, a while back we discussed the “title of the first collection” field that is missing from search results. Its absence means that, when using the APIs, we have to follow up each search with a second search. For example, if I do

https://archive.org/advancedsearch.php?output=json&q=bananas&rows=30&page=1&sort[]=-downloads&and[]=&save=yes&fl=identifier%2Ctitle%2Ccollection%2Cmediatype%2Cdownloads%2Ccreator%2Cnum_reviews%2Cpublicdate%2Citem_count%2Cloans__status__status

I then have to take all the results and do a query like

https://archive.org/advancedsearch.php?output=json&q=identifier:(prelinger%20OR%20internetarcade%20movies)&rows=30&page=1&sort[]=-downloads&and[]=&save=yes&fl=identifier%2Ctitle%2Ccollection%2Cmediatype%2Cdownloads%2Ccreator%2Cnum_reviews%2Cpublicdate%2Citem_count%2Cloans__status__status

to get the title of each first-listed collection in order to paint the results. You suggested that this field should probably be stored in the search documents.

I've got to do a refactor to move this from the Python on the gateway to the client. I have code that can do this, so it's not a huge deal from the coding perspective, but it does make displaying collections or search results slower and less efficient. Before I do this refactor, I just wondered whether this is anywhere on your roadmap?
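
For anyone following along, the two-step lookup described above could be sketched roughly as follows. This is only an illustration and not the dweb-mirror or gateway code: the helper names (`build_search_url`, `first_collections`, `collection_title_query`, `attach_collection0_title`, `fetch_docs`) are hypothetical, the `collection0title` key is taken from this issue's title, and the `response.docs` layout of the JSON reply is an assumption about the advancedsearch output.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

ADVANCED_SEARCH = "https://archive.org/advancedsearch.php"


def build_search_url(query, fields, rows=30, page=1, sort="-downloads"):
    """Build an advancedsearch URL with a comma-separated `fl` list, as in the issue."""
    params = [
        ("output", "json"),
        ("q", query),
        ("rows", rows),
        ("page", page),
        ("sort[]", sort),
        ("fl", ",".join(fields)),
    ]
    return ADVANCED_SEARCH + "?" + urlencode(params)


def fetch_docs(url):
    """Fetch a search URL; assumes the reply looks like {"response": {"docs": [...]}}."""
    with urlopen(url) as resp:
        return json.load(resp)["response"]["docs"]


def _collections(doc):
    """Normalise the `collection` field to a list (it may be a string or missing)."""
    coll = doc.get("collection") or []
    return [coll] if isinstance(coll, str) else list(coll)


def first_collections(docs):
    """Distinct first-listed collection of each result, in first-seen order."""
    seen = []
    for doc in docs:
        colls = _collections(doc)
        if colls and colls[0] not in seen:
            seen.append(colls[0])
    return seen


def collection_title_query(identifiers):
    """Query string for the second search that fetches the collections themselves."""
    return "identifier:(" + " OR ".join(identifiers) + ")"


def attach_collection0_title(docs, collection_docs):
    """Return copies of docs with a hypothetical `collection0title` key added."""
    titles = {c["identifier"]: c.get("title") for c in collection_docs}
    expanded = []
    for doc in docs:
        colls = _collections(doc)
        title = titles.get(colls[0]) if colls else None
        expanded.append({**doc, "collection0title": title})
    return expanded
```

A client would call `fetch_docs(build_search_url("bananas", fields))`, pass the result through `first_collections` and `collection_title_query` to build the second search, and merge the two result sets with `attach_collection0_title`. The extra round trip is the cost the issue is trying to avoid.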

mitra42 commented 4 years ago

On Nov 26 ximm said: "I can put that in the next index build, which will be in the next few weeks."

mitra42 commented 4 years ago

Giving up on this; we already have expand code that does a second query.