mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0

How to query for specific articles #48

Closed (pgulley closed 6 months ago)

pgulley commented 8 months ago

Right now we have a leftover implementation from when we were querying against the Wayback archive, where we look up an article by a given id. A glance at the story indexer leads me to believe we use a hash of the URL as a uid when inserting documents, but copying that code to generate the value manually and passing it as the ID to the article/id endpoint doesn't work (FastAPI throws a fit about the encoding?).

Regardless, it seems odd to have to do that encoding out in the real world rather than in the API.

If/when we get 197 merged, we can just reuse that functionality to generate the hash in the API and accept the URL as an input.
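As a rough illustration of the idea of hashing the URL inside the API, here is a minimal sketch. The helper name `url_to_doc_id` is hypothetical, and the choice of SHA-256 over the UTF-8 bytes of the URL is an assumption; the thread does not say which hash or normalization the story indexer actually uses, so the real code should be reused rather than this.

```python
import hashlib


def url_to_doc_id(url: str) -> str:
    """Hypothetical helper: derive a document id from an article URL.

    Assumption: the id is the SHA-256 hex digest of the UTF-8 encoded URL.
    The story indexer's actual hashing scheme may differ.
    """
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # A hex digest only contains [0-9a-f], so it is safe to place in a URL path.
    print(url_to_doc_id("https://example.com/some-article"))
```

With something like this on the server side, a caller would send the plain URL and never need to reproduce the encoding themselves.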

related to 198

pgulley commented 7 months ago

Not much to do here pending the database migration in story-indexer; consider this on hold.

rahulbot commented 7 months ago

Was poking around and saw the ids query, which I wasn't aware of. Can you try using this to retrieve by document id? https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-ids-query.html
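For reference, a minimal sketch of what an ids query body looks like when built in Python. The function name `build_ids_query` is hypothetical and the example id is a placeholder; only the `{"query": {"ids": {"values": [...]}}}` shape comes from the Elasticsearch query DSL linked above.

```python
import json
from typing import Any, Dict, List


def build_ids_query(doc_ids: List[str]) -> Dict[str, Any]:
    """Hypothetical helper: build an Elasticsearch ids query body.

    The ids query matches documents whose _id is in the given list.
    """
    return {"query": {"ids": {"values": list(doc_ids)}}}


if __name__ == "__main__":
    # Placeholder id, for illustration only.
    print(json.dumps(build_ids_query(["example-document-id"]), indent=2))
```

The resulting dict can be sent as the body of a search request against the index holding the documents.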

pgulley commented 7 months ago

We use this already! The issue is that we're in the process of changing the id field, so I don't think it makes sense to build an endpoint for the current id when we're going to rip it up in (presumably) just a week or so.

rahulbot commented 6 months ago

Looks like this is done, closing.