mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0

remove `sort_field` option? #64

Closed rahulbot closed 2 months ago

rahulbot commented 6 months ago

Right now you can sort by `publication_date` or `indexed_date`. @philbudne suggests on Slack: "I'm having trouble imagining how any sort order other than indexed_date alone could work (unless there is a hidden, monotonic id)... Both _id and published_date orderings of documents are unstable: new documents can appear at any point in the ordering at any time, no?"

So should we remove the `sort_field="publication_date"` option, since it doesn't return stable results? Anyone using the results for analysis can always re-sort them once they have the data. The point here is to make sure we return every matching result while paging through stories.
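To make the instability concrete, here is a minimal, self-contained sketch. It does not use the real API or Elasticsearch. It assumes cursor-style paging ("give me everything after the last sort value I saw") and that a newly indexed story always gets a later `indexed_date` than existing ones. The `paged_ids` helper, the `id` field, and the sample stories are all hypothetical.

```python
def paged_ids(archive, late_arrivals, sort_field, page_size=2):
    """Page through `archive` with a "greater than last seen" cursor.

    `late_arrivals` are stories that get indexed after the first page
    has been fetched, simulating ingestion during a long paging run.
    Returns the ids the client actually received, in order.
    """
    seen, cursor, first_page = [], None, True
    while True:
        def sort_key(doc):
            # Tie-break on id so the ordering is total.
            return (doc[sort_field], doc["id"])

        ordered = sorted(archive, key=sort_key)
        page = [d for d in ordered if cursor is None or sort_key(d) > cursor]
        page = page[:page_size]
        if not page:
            return seen
        seen.extend(d["id"] for d in page)
        cursor = sort_key(page[-1])
        if first_page:
            # New stories land in the archive mid-run.
            archive = archive + late_arrivals
            first_page = False


# Hypothetical sample data; ISO strings compare correctly as text.
ARCHIVE = [
    {"id": "a", "publication_date": "2024-01-01", "indexed_date": "2024-01-05T00:00"},
    {"id": "b", "publication_date": "2024-01-02", "indexed_date": "2024-01-05T01:00"},
    {"id": "c", "publication_date": "2024-01-03", "indexed_date": "2024-01-05T02:00"},
    {"id": "d", "publication_date": "2024-01-04", "indexed_date": "2024-01-05T03:00"},
]
# Indexed late, but published early: it sorts *before* the cursor
# when paging by publication_date.
LATE = [
    {"id": "e", "publication_date": "2024-01-01", "indexed_date": "2024-01-06T00:00"},
]

by_publication = paged_ids(ARCHIVE, LATE, "publication_date")
by_indexed = paged_ids(ARCHIVE, LATE, "indexed_date")
print(by_publication)  # story "e" is never returned
print(by_indexed)      # story "e" shows up on a later page
```

In this toy setup, paging by `publication_date` silently skips the late arrival, while paging by `indexed_date` picks it up. With offset-based paging the symptom would more likely be a duplicated story than a missing one, but the ordering is unstable either way.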

pgulley commented 6 months ago

I'm still confused as to why this only appeared with the recent change to the `publication_date` schema, but I'm not opposed to the removal. I would intuitively expect results to be sorted by `publication_date`, given that it's part of the query, so we could optionally re-sort in the api-client or in providers ourselves to save consumers potential confusion.
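A client-side re-sort along those lines could be as small as the sketch below. It assumes stories arrive as dicts carrying `publication_date` and `indexed_date` as ISO-formatted strings, which is an assumption about the response shape. `resort_by_publication_date` is a hypothetical helper, not an existing api-client function.

```python
def resort_by_publication_date(stories):
    """Re-sort stories that were fetched in indexed_date order.

    Stories with no publication_date are placed last; ties fall back
    to indexed_date so the output order is deterministic.
    """
    def sort_key(story):
        pub = story.get("publication_date")
        return (pub is None, pub or "", story["indexed_date"])

    return sorted(stories, key=sort_key)


# Hypothetical stories, already fully paged through by indexed_date.
collected = [
    {"id": "a", "publication_date": "2024-01-03", "indexed_date": "2024-01-05T00:00"},
    {"id": "b", "publication_date": None, "indexed_date": "2024-01-05T01:00"},
    {"id": "c", "publication_date": "2024-01-01", "indexed_date": "2024-01-05T02:00"},
]
resorted_ids = [s["id"] for s in resort_by_publication_date(collected)]
print(resorted_ids)
```

This only makes sense once all pages have been collected. Re-sorting each page on its own would give an ordering that looks right within a page but not across pages.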