mediacloud / mediacloud-news-client

An internal client library to access the new Mediacloud news archive search.
Apache License 2.0
0 stars 1 forks source link

modify status_code check to be more stringent #3

Open rahulbot opened 9 months ago

rahulbot commented 9 months ago

I'm tracking down a weird bug on a big query, running search locally connected via tunnel to prod ES index. I'm seeing a log message that seems to indicate the query is being chunked into lots of chunks, and then some of them are returning 404 but not throwing a real error.

I don't know from where, but log message I see look like this:

http://localhost:8010 "POST /v1/mediacloud_searchtext/search/overview HTTP/1.1" 200 194840 http://localhost:8010 "POST /v1/mediacloud_searchtext/search/overview HTTP/1.1" 200 194840 http://localhost:8010 "POST /v1/mediacloud_searchtext*/search/overview HTTP/1.1" 404 30

Note the 404 on the last one. While tracking this down I noted: https://github.com/mediacloud/mediacloud-news-client/blob/e153023be15648852c0cff3e2c59399e8b99ec8a/mcnews/searchapi.py#L156-L158

Why is this >=500? When I change it to !=200 locally I get a much more helpful error. I'm worried this could be swallowing 40x errors and returning incorrect data due to the query chunking in place.

pgulley commented 9 months ago

This is a remnant of when the API lived in IA's servers- I agree, there's a better pattern. I'll address

pgulley commented 9 months ago

Indeed, this change surfaced a new error in the testing circuit to chase down

pgulley commented 9 months ago

Ok, so- the 404 errors are returned by the api when the query returns nothing- we're using 404 parsed as 'not found'- that was why we didn't attempt to catch them earlier. I believe that the behavior you described here is probably actually because those specific chunks happened to return nothing- I would expect any other issue to surface as a 500 error.

At the very least, changing this means putting some other verbiage in place to catch 404 specifically, since that's how the api returns "0"- but really I'm thinking that supressing 404 should be the intended behavior here.

rahulbot commented 9 months ago

Yeah. Super confusing to me. Anyway perhaps safest to do list of valid response codes and raise error if not in list?

rahulbot commented 9 months ago

Note: non-urgent because I was wrong and the current behavior is not a bug. Proposed enhancement is to have a safe-list of valid response codes (if r.status_code not in [200, 404]:?).