mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0

ui needs hard-wired list of indices #9

Closed philbudne closed 7 months ago

philbudne commented 10 months ago

The ui should really query the news-search-api for the list of indices?

If news-search-api doesn't have a way to retrieve index names it could do so using:

esconnection.cat.indices(index="mediacloud_search_text*", s="index")

Unless cached, this would happen each time the ui page loads?
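
A minimal sketch of that lookup, assuming an elasticsearch-py client whose `cat.indices` call accepts `h` and `format` alongside `index` and `s`. The helper name `list_index_names` is hypothetical and not part of news-search-api:

```python
from typing import Any, List


def list_index_names(es: Any, pattern: str) -> List[str]:
    """Return the names of the indices matching `pattern`, sorted by name.

    `es` is assumed to be an elasticsearch-py client (or anything exposing
    the same `cat.indices` call).
    """
    # h="index" limits the output to the index-name column; format="json"
    # asks for a list of dicts instead of the default text table.
    rows = es.cat.indices(index=pattern, s="index", h="index", format="json")
    return [row["index"] for row in rows]
```

With a real client this would be called as `list_index_names(esconnection, "mediacloud_search_text*")`. It still makes one request to ES per call, so the caching question above remains.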

philbudne commented 10 months ago

I don't think this is a must-fix-now issue (and note, I'm NOT usually a "quick bandage is good enough" person)!

The current configuration is good until 90 days from the end of 2024 (when the 2025 index will be created).

Considerations:

  1. Both ui.py and api.py need the data; ui.py seems to be written as a "front end" to api.py, so directly accessing ES seems like a layering violation to me, and perhaps api.py needs to expose the index list to ui.py via an api call?
  2. Getting the list of indexes at startup isn't a solution: the processes are likely to be running when a new index is created, and requiring a manual restart is less than optimal.
  3. Supplying a default index list at startup is also problematic if ES is not available.
  4. But we don't want to query ES every time api.py services a request: we need to cache the data.
  5. Each file should have its own logger; borrowing another file's logger will lead to confusion!
  6. See https://github.com/mediacloud/news-search-api/issues/11 for MORE cases where ui and api need to agree on configuration!
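
One way to reconcile points 2 through 5 is a small time-based cache. This is a sketch only: the class name `CachedIndexList`, the 300-second default and the choice to keep serving a stale list when a refresh fails are all assumptions, not anything present in the repository. `fetch` stands for whatever function actually asks ES for the index names.

```python
import logging
import time
from typing import Callable, List, Optional

# Per point 5: a logger owned by this module, not borrowed from another file.
logger = logging.getLogger(__name__)


class CachedIndexList:
    """Cache index names, refreshing at most once per `ttl` seconds."""

    def __init__(
        self,
        fetch: Callable[[], List[str]],
        ttl: float = 300.0,  # assumed value, tune to taste
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._names: Optional[List[str]] = None
        self._expires = 0.0

    def get(self) -> List[str]:
        now = self._clock()
        if self._names is None or now >= self._expires:
            try:
                self._names = list(self._fetch())
                self._expires = now + self._ttl
            except Exception:
                # Nothing cached yet: there is no sensible default, so fail.
                if self._names is None:
                    raise
                # Otherwise keep the last good list and retry on the next call.
                logger.warning("index list refresh failed; using cached list",
                               exc_info=True)
        return list(self._names)
```

Because the list is refreshed lazily, a long-running process would pick up a newly created index within `ttl` seconds without a restart, and ES is queried at most once per `ttl` rather than on every request.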

philbudne commented 10 months ago

  1. The ES index prefix mediacloud_search_text is passed as configuration ELASTICSEARCH_INDEX_NAME_PREFIX to https://github.com/mediacloud/story-indexer/blob/main/indexer/workers/importer.py so it shouldn't be wired into the code.
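
A sketch of reading the prefix from configuration instead, assuming news-search-api would reuse the same ELASTICSEARCH_INDEX_NAME_PREFIX environment variable as story-indexer (that reuse, and the helper name `index_pattern`, are assumptions):

```python
import os
from typing import Mapping, Optional

PREFIX_VAR = "ELASTICSEARCH_INDEX_NAME_PREFIX"


def index_pattern(env: Optional[Mapping[str, str]] = None) -> str:
    """Build the wildcard index pattern from the configured prefix."""
    env = os.environ if env is None else env
    prefix = env.get(PREFIX_VAR, "").strip()
    if not prefix:
        # No hard-wired fallback: a missing prefix is a configuration error.
        raise RuntimeError(f"{PREFIX_VAR} is not set")
    return prefix + "*"
```

With the variable set to `mediacloud_search_text`, this yields the pattern `mediacloud_search_text*`.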