mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0

split `overview` to make more parallel queries? #73

Open rahulbot opened 2 months ago

rahulbot commented 2 months ago

Right now the overview endpoint does a lot of heavy lifting. It generates daily counts, top languages, top domains, top TLDs, and the total count, all in one query to ES. Would it be faster in real-world use to split these apart into separate endpoints?
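To make the two shapes concrete, here is a rough sketch of what "one combined query" versus "several smaller queries" could look like as Elasticsearch request bodies. The field names (`publication_date`, `language`, `canonical_domain`, `tld`), the aggregation names, and the bucket size are assumptions for illustration only and may not match the real index mapping or the current overview code.

```python
from typing import Any, Dict

# Hypothetical aggregation definitions; field names are assumed, not taken
# from the actual index mapping.
AGGS: Dict[str, Dict[str, Any]] = {
    "dailycounts": {
        "date_histogram": {"field": "publication_date", "calendar_interval": "day"}
    },
    "toplangs": {"terms": {"field": "language", "size": 100}},
    "topdomains": {"terms": {"field": "canonical_domain", "size": 100}},
    "toptlds": {"terms": {"field": "tld", "size": 100}},
}


def base_query(q: str) -> Dict[str, Any]:
    """Request body that matches documents but returns no hits, only metadata."""
    return {"query": {"query_string": {"query": q}}, "size": 0}


def combined_body(q: str) -> Dict[str, Any]:
    """One request carrying every aggregation plus an exact total count."""
    body = base_query(q)
    body["track_total_hits"] = True
    body["aggs"] = dict(AGGS)
    return body


def split_bodies(q: str) -> Dict[str, Dict[str, Any]]:
    """One request per aggregation, plus a separate request for the total."""
    bodies: Dict[str, Dict[str, Any]] = {}
    for name, agg in AGGS.items():
        body = base_query(q)
        body["track_total_hits"] = False  # the total is fetched separately
        body["aggs"] = {name: agg}
        bodies[name] = body
    total = base_query(q)
    total["track_total_hits"] = True
    bodies["total"] = total
    return bodies
```

Each split body repeats the same `query` clause, so ES would evaluate the match once per request instead of once overall; whether that cost is outweighed by the parallelism is exactly the open question.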

Our web server UI and architecture assume these can all be fetched in parallel. In fact, to avoid duplicate queries, right now when someone clicks "search" in the web UI we wait for the first results to come back and be cached, so that subsequent calls hit the cache (since they all call overview under the hood).
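As an illustration of the duplicate-query problem, the sketch below shows one way concurrent callers asking for the same overview could share a single in-flight fetch rather than each hitting ES. This is not how the current web server works; `SingleFlightCache` and `fake_overview` are hypothetical names, and the sleep stands in for a real ES round trip.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


class SingleFlightCache:
    """Callers asking for the same key share one fetch and its result."""

    def __init__(self, fetch: Callable[[str], Awaitable[Any]]) -> None:
        self._fetch = fetch
        self._tasks: Dict[str, "asyncio.Future[Any]"] = {}
        self.fetch_count = 0  # how many times the backend was actually called

    async def get(self, key: str) -> Any:
        task = self._tasks.get(key)
        if task is None:
            # First caller for this key starts the fetch; later callers reuse it.
            self.fetch_count += 1
            task = asyncio.ensure_future(self._fetch(key))
            self._tasks[key] = task
        return await task


async def fake_overview(query: str) -> Dict[str, Any]:
    """Stand-in for the overview call; the delay is a placeholder."""
    await asyncio.sleep(0.01)
    return {"query": query}


async def demo() -> tuple:
    cache = SingleFlightCache(fake_overview)
    # Four parallel UI calls for the same search, as after clicking "search".
    results = await asyncio.gather(*(cache.get("climate") for _ in range(4)))
    return results, cache.fetch_count
```

A real version would also need expiry and error handling (a failed task would otherwise stay cached), which this sketch leaves out.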

The end result of changing approaches would be more parallel ES queries, each doing less work. We should figure out whether there is some way to test if this would improve user-facing search performance without making the requisite changes all up and down the stack.
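One possible way to test this without touching the rest of the stack is a small timing harness that runs both query shapes directly against ES and compares wall-clock time. The sketch below shows only the harness; `fake_query` and its delays are placeholders, not measurements, and would be replaced by real calls to the cluster.

```python
import asyncio
import statistics
import time
from typing import Awaitable, Callable, Dict, List

QueryFn = Callable[[], Awaitable[None]]


async def timed(queries: List[QueryFn]) -> float:
    """Run the given queries concurrently and return elapsed wall-clock seconds."""
    start = time.perf_counter()
    await asyncio.gather(*(q() for q in queries))
    return time.perf_counter() - start


async def compare(combined: QueryFn, split: List[QueryFn],
                  repeats: int = 5) -> Dict[str, float]:
    """Median elapsed time for one combined query vs. the split queries in parallel."""
    samples: Dict[str, List[float]] = {"combined": [], "split": []}
    for _ in range(repeats):
        samples["combined"].append(await timed([combined]))
        samples["split"].append(await timed(split))
    return {name: statistics.median(vals) for name, vals in samples.items()}


async def fake_query(delay: float) -> None:
    """Placeholder for an ES request; the delay is arbitrary."""
    await asyncio.sleep(delay)
```

Against a real cluster it would be worth interleaving the runs (as `compare` does) and varying the search terms, since ES request and query caches can otherwise make whichever shape runs second look faster.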

philbudne commented 2 months ago

I have a variety of thoughts:

Pushing all parallelism out to the JS App in the browser can also mean different user experiences based on the user's browser: different revisions of different browsers may have very different concurrent connection limits.
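The effect of a per-client connection cap can be simulated with a semaphore: however many requests the app issues, only `limit` of them are in flight at once. This is a toy model, not a measurement of any browser; the `limit` values and the delay are arbitrary assumptions.

```python
import asyncio


async def run_with_limit(n_requests: int, limit: int, delay: float = 0.01) -> int:
    """Issue n_requests under a concurrency cap and return the peak number in flight."""
    sem = asyncio.Semaphore(limit)
    in_flight = 0
    peak = 0

    async def one_request() -> None:
        nonlocal in_flight, peak
        async with sem:  # waits here once `limit` requests are already running
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(delay)  # placeholder for the network round trip
            in_flight -= 1

    await asyncio.gather(*(one_request() for _ in range(n_requests)))
    return peak
```

With five requests and a cap of two, the requests run in three waves rather than all at once, so the same app code would see different total latency depending on the cap.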

pgulley commented 3 weeks ago

I've begun the work to break this out in #89