Closed rahulbot closed 2 months ago
In short, this fix is necessary, but not sufficient.
More detail: I dug into the fix and understand why it isn't working. Right now the caching is done by the mc-provider, keyed on function name, method args, and method kwargs. This is smart for cross-platform search. However, for both the Media Cloud and Wayback Machine providers, the various methods call the same function under the hood, so the caching isn't speeding things up: the provider doesn't know that (for instance) `count` and `sample` are both calling the same thing under the hood. I'll consider alternatives and move the issue to mc-providers.
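To illustrate the problem described above, here is a minimal sketch, not the actual mc-providers code. All names (`cached`, `_overview_uncached`, `count_a`, `sample_b`, and so on) are hypothetical, and the backend is simulated with a counter. It shows that a cache keyed on the public method name misses when two methods share one underlying call, while caching the shared helper avoids the duplicate work.

```python
import functools
import json

_cache = {}          # shared cache, keyed by function name + args + kwargs
backend_calls = 0    # counts simulated trips to the search backend


def cached(fn):
    """Cache results keyed by function name, positional args, and kwargs."""
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        key = json.dumps([fn.__name__, args, kwargs], sort_keys=True, default=str)
        if key not in _cache:
            _cache[key] = fn(*args, **kwargs)
        return _cache[key]
    return wrapper


def _overview_uncached(query):
    """Hypothetical stand-in for the expensive shared backend query."""
    global backend_calls
    backend_calls += 1
    return {"total": 42, "matches": [f"story about {query}"]}


# Variant A: cache only the public methods. The keys differ by function
# name, so count and sample each trigger their own backend call.
@cached
def count_a(query):
    return _overview_uncached(query)["total"]


@cached
def sample_a(query):
    return _overview_uncached(query)["matches"]


# Variant B: cache the shared helper. Both public methods reuse one result.
_overview_cached = cached(_overview_uncached)


def count_b(query):
    return _overview_cached(query)["total"]


def sample_b(query):
    return _overview_cached(query)["matches"]


def backend_calls_for(count_fn, sample_fn, query):
    """Return how many backend calls one count + one sample lookup cost."""
    global backend_calls
    backend_calls = 0
    count_fn(query)
    sample_fn(query)
    return backend_calls
```

With a fresh query, `backend_calls_for(count_a, sample_a, ...)` costs two backend calls and `backend_calls_for(count_b, sample_b, ...)` costs one. Whether the real fix belongs at this layer or elsewhere is a separate design question.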
Just-pushed changes (cache-related) make this way faster for most queries.
We need to start paying closer attention to the performance of our search system. A first item I was thinking about: I think (1) total attention, (2) attention over time, (3) language, (4) domains, (5) TLDs, and (6) sample stories are all currently being served by the same news-search-api endpoint under the hood. I think each of these ends up calling the `overview` query endpoint. Evidence: see the news-search-api source and the number of times `_overview_query` is called in the mediacloud-search-api client.
We are caching the results in Django, but when a user hits search I think it fires off ~6(!) requests in parallel from the browser->Django->ES that all ask for those `overview` results at the same time, and nothing has been cached yet the first time they search. I think this means that each user-generated query from the website is causing way more work than it needs to.
Potential fixes (if I'm right):