Elasticsearch secrets & repository configuration

thepsalmist commented 5 months ago

Building off some of the limitations noted on the current elastic-conf service, partly mentioned here

We need to automate setup of Elasticsearch's S3 secrets (access-key and secret-access-key) so that we can take snapshots and upload them to S3. The current setup relies on the secrets and repository that were set up during the first manual ES snapshot. Elasticsearch recommends adding the secrets to elasticsearch-keystore as per the following commands.
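
As a rough illustration of the keystore step, here is a minimal Python sketch. It assumes the `default` S3 client name and the package-install path for `elasticsearch-keystore`; the helper names are hypothetical and not part of elastic-conf. The secret is passed on stdin so it never appears in the process arguments.

```python
import subprocess
from typing import List

# Assumption: default location for package installs and the official Docker image.
KEYSTORE_BIN = "/usr/share/elasticsearch/bin/elasticsearch-keystore"


def s3_keystore_settings(client: str = "default") -> List[str]:
    """Names of the two secure settings read by the S3 repository client."""
    return [f"s3.client.{client}.access_key", f"s3.client.{client}.secret_key"]


def keystore_add_argv(setting: str, binary: str = KEYSTORE_BIN) -> List[str]:
    """Build the argv that stores one secure setting, reading the value from stdin."""
    # --stdin keeps the secret off the command line;
    # --force overwrites an existing entry without prompting.
    return [binary, "add", "--stdin", "--force", setting]


def store_secret(setting: str, value: str) -> None:
    """Run the keystore command. Needs write access to the ES keystore file."""
    subprocess.run(keystore_add_argv(setting), input=value.encode(), check=True)
```

`store_secret` still has to run as a user that can write the keystore, which is the sudo concern raised in the questions below.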

Questions

We currently use elastic-conf as our entry point for ES configuration. Running the above commands from elastic-conf would mean using subprocess with sudo access. This does not sound like a good idea, so is elastic-conf the wrong place for this?
We need to cater for ES running both on bare metal and in Docker containers (dev/staging)

Proposal

Add S3 repository creation & validation as part of elastic-conf
Create a separate script, similar to deploy.sh, to set & reload Elasticsearch secure settings (both bare-metal & container setups). Should this be triggered from deploy.sh?
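
A minimal sketch of what the two proposal items could look like against the Elasticsearch REST API, using only the standard library. The function names, base URL, repository name and bucket name are placeholders, not existing elastic-conf code. The request builders are kept separate from `send` so they can be checked without a running cluster.

```python
import json
import urllib.request
from typing import Optional


def es_request(base_url: str, method: str, path: str,
               body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a JSON request against an Elasticsearch node."""
    data = json.dumps(body).encode() if body is not None else None
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


def register_s3_repo(base_url: str, repo: str, bucket: str) -> urllib.request.Request:
    """PUT _snapshot/<repo>: register an existing S3 bucket as a snapshot repository."""
    body = {"type": "s3", "settings": {"bucket": bucket}}
    return es_request(base_url, "PUT", f"/_snapshot/{repo}", body)


def verify_repo(base_url: str, repo: str) -> urllib.request.Request:
    """POST _snapshot/<repo>/_verify: check that the nodes can reach the repository."""
    return es_request(base_url, "POST", f"/_snapshot/{repo}/_verify")


def reload_secure_settings(base_url: str) -> urllib.request.Request:
    """POST _nodes/reload_secure_settings: re-read reloadable keystore entries."""
    return es_request(base_url, "POST", "/_nodes/reload_secure_settings")


def send(req: urllib.request.Request) -> dict:
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Only the keystore write itself needs filesystem access on the ES host; the reload, registration and verification steps are plain HTTP calls.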

philbudne commented 5 months ago

There may be a knot of issues here.

I think the overall goal should be to do all configuration (or as much as possible) from scripts.

I wrote story-indexer/deploy.sh with the goal of being able to run it with ONLY docker group membership, without presuming that the user has sudo access. So if there are commands or config files that require root, they belong elsewhere, possibly in the elastic setup/install script (docker/elastic-deploy.sh?), using parameters from the config repo.

Further thoughts/notes:

Since we're running ES outside Docker, it's counterintuitive that the elastic-deploy.sh script is in the docker directory! This only matters for production stacks, and deploy.sh is not currently amenable to use by others.
Last I looked, the running installs of ES do not match docker/elastic-deploy.sh (no elastic repo file in /etc/apt/sources.list.d)

philbudne commented 5 months ago

Continuing: since staging runs ES under Docker, if the keystore and/or repo config cannot be done from elastic-conf.py, a staging stack cannot be created from scratch without manual intervention.

kilemensi commented 5 months ago

I think the best way forward is to break down this issue into at least 3 separate tasks:

1. Managing Elasticsearch secrets: Ideally deploy.sh would be all we need, but given that the secure settings can only be managed via the elasticsearch-keystore command, we may have to implement this in our Elasticsearch deployment scripts: elastic-deploy.sh for the bare-metal/PROD cluster, and either a bind mount or a custom image (or custom CMD/Entrypoint script) for the docker/staging cluster.
2. Snapshot repository: Again, two subtasks here: i. creating an S3 bucket, and ii. registering an S3 bucket as a snapshot repository in Elasticsearch. Should the elastic-conf script do both of these tasks, or should the creation of the S3 bucket remain outside this script's management?
3. SLM Policy: If the script can now create/register repositories at will, how does that affect the current implementation of SLM policy management?

Since we have a manual way of doing 1 at the moment, I think we should start by implementing tasks 2 and 3 while we brainstorm on the best way to automate 1.
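
To make the dependency between tasks 2 and 3 concrete, here is a small sketch of an SLM policy body. The policy id, schedule, snapshot name pattern and retention value are illustrative assumptions, not our current settings. The point is that the policy refers to the repository by name, so the script would most likely need to register the repository before creating or updating the policy.

```python
from typing import Iterable


def slm_policy_path(policy_id: str) -> str:
    """REST path used to create or update an SLM policy with PUT."""
    return f"/_slm/policy/{policy_id}"


def slm_policy_body(repo: str, indices: Iterable[str],
                    schedule: str = "0 30 1 * * ?",
                    expire_after: str = "30d") -> dict:
    """Request body for PUT _slm/policy/<id>; all default values are placeholders."""
    return {
        "schedule": schedule,              # cron expression, here 01:30 daily
        "name": "<snap-{now/d}>",          # date-math snapshot name
        "repository": repo,                # name of an already registered repository
        "config": {"indices": list(indices)},
        "retention": {"expire_after": expire_after},
    }
```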

philbudne commented 5 months ago

Today I found out that elastic-config.py errors out if ELASTICSEARCH_SNAPSHOT_REPO is not provided (as is the case for a developer stack), and the ES index is not created.

My position has been that developer stacks should not require any external storage keys (the archiver leaves local archive files).

I think it would be fine for elastic-config.py to log missing parameters related to snapshots at ERROR priority, but I'm open to discussion...
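
Something along these lines, as a sketch only: the step names and the `configure` function are hypothetical and do not mirror the actual structure of the script. The index-related steps always run, and the snapshot-related steps are skipped with an ERROR log when the parameter is missing.

```python
import logging
import os
from typing import List, Mapping, Optional

logger = logging.getLogger("elastic-conf")


def snapshot_repo_from_env(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the configured snapshot repo name, or None (after logging) if unset."""
    repo = env.get("ELASTICSEARCH_SNAPSHOT_REPO", "").strip()
    if not repo:
        logger.error("ELASTICSEARCH_SNAPSHOT_REPO not set; skipping snapshot setup")
        return None
    return repo


def configure(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return the list of setup steps to run (step names are placeholders)."""
    steps = ["index_template", "ilm_policy", "initial_index"]
    # Snapshot-related steps are optional so a developer stack still gets its index.
    if snapshot_repo_from_env(env) is not None:
        steps += ["snapshot_repository", "slm_policy"]
    return steps
```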

kilemensi commented 5 months ago

Yeah @philbudne, I agree on DEV working without secrets. I had suggested using a Filesystem repository for DEV a while back; not sure whether you and @thepsalmist have had a chance to look at it.
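
For context, a sketch of how the repository body could switch between the two types. The function and its parameters are hypothetical. An `fs` repository needs no keys, but its `location` has to sit under a directory listed in `path.repo` in elasticsearch.yml, so the DEV container would need that setting and a mounted volume.

```python
from typing import Optional


def snapshot_repo_body(repo_type: str, *, bucket: Optional[str] = None,
                       location: Optional[str] = None) -> dict:
    """Body for PUT _snapshot/<repo>, for either an S3 or a filesystem repository."""
    if repo_type == "s3":
        if not bucket:
            raise ValueError("s3 repository requires a bucket name")
        return {"type": "s3", "settings": {"bucket": bucket}}
    if repo_type == "fs":
        if not location:
            raise ValueError("fs repository requires a location under path.repo")
        return {"type": "fs", "settings": {"location": location}}
    raise ValueError(f"unsupported repository type: {repo_type!r}")
```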

mediacloud / story-indexer

Elasticsearch secrets & repository configuration #297