mediacloud / news-search-api

Internal API server that offers search access to the Media Cloud Online News Archive (in Elasticsearch).
https://mediacloud.org
GNU Affero General Public License v3.0
1 stars 3 forks source link

fix ES index name prefix #70

Closed thepsalmist closed 2 months ago

thepsalmist commented 2 months ago

Bugfix since the ELASTICSEARCH_INDEX_NAME_PREFIX="mc_search-*" exposed an index with the pattern mc-search-*-* that throws errors on search , not a valid index pattern remove INDEXES env variable, no longer used

kilemensi commented 2 months ago

Thanks for debugging this @thepsalmist. Will wait to hear from others but:

  1. I think this should be a configurable environment variable. Updating alias name shouldn't require code change.
  2. This mean both ELASTICSEARCH_INDEX_NAME_PREFIX and INDEXES need to be "mc_search". Are these two settings used differently in the code? Do we need both or is one of them just a legacy setting from the pre-alias days?
thepsalmist commented 2 months ago

Thanks for debugging this @thepsalmist. Will wait to hear from others but:

  1. I think this should be a configurable environment variable. Updating alias name shouldn't require code change.
  2. This mean both ELASTICSEARCH_INDEX_NAME_PREFIX and INDEXES need to be "mc_search". Are these two settings used differently in the code? Do we need both or is one of them just a legacy setting from the pre-alias days?

INDEXES is obsolete as per this

philbudne commented 2 months ago

I wouldn't have thought it was meaningful to have it be a variable:

I thought we had decided it was OK to wire the index prefix into the story-indexer repo (ie; in the template files, and the importer)

If that's the case, where does another index come from?

kilemensi commented 2 months ago

Just to add my 3 cents on this:

  1. We can have a default value for ELASTICSEARCH_INDEX_NAME_PREFIX in code (similar to how ESHOSTS is a variable that defaults to http://localhost:9200 if not explicitly set)
  2. Different environments can connect to different ELASTICSEARCH_INDEX_NAME_PREFIX (on different ESHOSTS) e.g. test uses mediacloud (not sure if that needs to be updated or not) but I assume in dev, one can connect to locally running ES without the need to connect to the production ES cluster.
  3. Things that go together should be allowed to change (or stay the same) together: ESHOSTS and ELASTICSEARCH_INDEX_NAME_PREFIX form a "connection string" to Elasticsearch. If we allow one to be a variable, can't reason why the other needs to be a constant.
thepsalmist commented 2 months ago

On the above comments,

  1. We define our index pattern explicitly on Elasticsearch's index creation template "index_patterns": "mc_search-*",.This implies all our ES clusters in dev/staging/prod should have the same index patterns. We've previously stated that having a variable sourced from .env or config implies its subject to change. So I'll still to defining this on code.
  2. On trying against real data @philbudne this works on staging https://mcweb-staging.tarbell.mediacloud.org/search?q=Biden%252CTrump&nq=&start=05-22-2024&end=05-30-2024&p=onlinenews-mediacloud&ss=&cs=34412234&any=any&name=Biden%20OR%20Trump&edit=false