mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

Chore/setup es blobstore credentials #314

Open thepsalmist opened 4 months ago

thepsalmist commented 4 months ago

Script to add S3/B2 credentials to the Elasticsearch keystore. Needed for snapshot & restore to AWS S3/Backblaze B2.
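A minimal sketch of what such a script could look like. It assumes the Debian/RPM package layout (`/usr/share/elasticsearch/bin/elasticsearch-keystore`) and the S3 client name `default`. The function names are hypothetical, not the ones in this PR. It relies on the real `elasticsearch-keystore add --stdin --force` subcommand and the `s3.client.<name>.access_key` / `s3.client.<name>.secret_key` secure settings. Backblaze B2 is assumed to be reached through its S3-compatible API, with the client's `endpoint` set in elasticsearch.yml rather than in the keystore.

```python
"""Sketch: add S3/B2 snapshot-repository credentials to the local node's keystore."""
import subprocess

# Assumption: Debian/RPM package layout; other installs keep the CLI elsewhere.
KEYSTORE_CLI = "/usr/share/elasticsearch/bin/elasticsearch-keystore"


def keystore_add_cmd(setting: str, cli: str = KEYSTORE_CLI) -> list[str]:
    """Build `elasticsearch-keystore add` reading the value from stdin.

    --stdin keeps the secret out of the process list; --force overwrites an
    existing value instead of prompting, so the command can be re-run.
    """
    return [cli, "add", "--stdin", "--force", setting]


def s3_client_settings(client: str = "default") -> tuple[str, str]:
    """Secure-setting names the S3 repository uses for one named client."""
    return (f"s3.client.{client}.access_key", f"s3.client.{client}.secret_key")


def add_credentials(access_key: str, secret_key: str, client: str = "default") -> None:
    """Write both keys into the keystore (run as a user allowed to modify it)."""
    access_setting, secret_setting = s3_client_settings(client)
    for setting, value in ((access_setting, access_key), (secret_setting, secret_key)):
        subprocess.run(keystore_add_cmd(setting), input=value, text=True, check=True)
```

Running nodes only see the new values after a restart or a `POST _nodes/reload_secure_settings` call.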

thepsalmist commented 4 months ago

Note: we need to configure credentials on all ES nodes. All our deploy scripts (benchmarking) from deploy.sh assume deployment is on the logged-in host.

Thinking about if/how to handle multiple hosts, and whether this should live in the code repo?
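One possible shape for the multi-host case follows. It is a sketch that assumes every node uses the package layout, is reachable by `ssh <host>`, and allows passwordless `sudo`. The host names are placeholders. The secret is piped through ssh's stdin, so it never appears in a command line on either side.

```python
"""Sketch: run the keystore step on every ES node over SSH, not just the local host."""
import shlex
import subprocess
from collections.abc import Mapping, Sequence

# Hypothetical node names; real ones would come from deploy configuration.
ES_NODES = ("es-node-1", "es-node-2", "es-node-3")
KEYSTORE_CLI = "/usr/share/elasticsearch/bin/elasticsearch-keystore"


def remote_keystore_add_cmd(host: str, setting: str) -> list[str]:
    """ssh argv that adds one setting on `host`, reading its value from stdin."""
    remote = shlex.join(["sudo", KEYSTORE_CLI, "add", "--stdin", "--force", setting])
    return ["ssh", host, remote]


def add_on_all_nodes(settings: Mapping[str, str], hosts: Sequence[str] = ES_NODES) -> None:
    """Add every setting on every host; values travel over ssh's stdin, not argv."""
    for host in hosts:
        for setting, value in settings.items():
            subprocess.run(remote_keystore_add_cmd(host, setting),
                           input=value, text=True, check=True)
```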

kilemensi commented 4 months ago

🤔 ... I think let's first clearly document all the steps we currently have to take to add credentials on all ES nodes.

  1. Do we need to create the keystore, or is it created as part of installation?
  2. Is it password protected or not (is there a need)?
  3. Are the keystore contents backed up as part of our backup (should they be?), etc.
  4. ???

Once this is done, then we'll be in a better position to answer your question:

Thinking if/how to handle multiple hosts & if this should live in the code repo??

thepsalmist commented 4 months ago

  1. For ES 8.x, the keystore is created automatically at installation, since security auto-configuration runs by default on install.
  2. It is currently not password protected, since we decided to disable all X-Pack security features (all access is within the angwin cluster). There's an option to create it with a password.
  3. The keystore is not part of the backups taken by Snapshot Lifecycle Management (SLM). Backing up the keystore separately is considered best practice.
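Since SLM snapshots don't include the keystore, a separate copy is needed. Here is a minimal sketch, assuming the package default path `/etc/elasticsearch/elasticsearch.keystore`. The backup directory is left to the caller. Because the keystore is not password protected here, the copy exposes the S3/B2 keys, so the sketch restricts it to its owner.

```python
"""Sketch: back up the node's keystore file, which SLM snapshots do not include."""
import shutil
import time
from pathlib import Path

# Assumption: Debian/RPM package layout.
KEYSTORE_PATH = Path("/etc/elasticsearch/elasticsearch.keystore")


def backup_keystore(backup_dir: Path, keystore: Path = KEYSTORE_PATH) -> Path:
    """Copy the keystore to a timestamped file in backup_dir and return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = time.strftime("%Y%m%d-%H%M%S")
    dest = backup_dir / f"{keystore.name}.{stamp}"
    shutil.copy2(keystore, dest)  # copies contents plus timestamps/mode bits
    dest.chmod(0o600)  # unencrypted secrets: owner read/write only
    return dest
```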

philbudne commented 4 months ago

Noticed this PR and wanted to share some musings/questions (without any specific answers):

I looked at it and thought "what does that have to do with blobstore.py?" Is there some ES-specific term (like repository or keystore) that could be used instead?

I see it creates a new top-level directory for elasticsearch; we've tried to avoid top-level clutter, and we already have docker/elastic-deploy.sh, which isn't ideal. I originally proposed that we run RabbitMQ outside Docker, and in that case I wouldn't want a rabbitmq top-level directory as well as an ES directory. One way to avoid cognitive dissonance would be to silently think of the docker directory as the deploy directory; another would be to rename the directory!! It's a topic worthy of group discussion.

Could the installation of the keys be done in elastic-deploy.sh??? I realize it probably requires waiting for the ES cluster to come up...
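If the keys are installed from elastic-deploy.sh after startup, the script has to wait for the cluster before asking nodes to pick up the new keystore values. Below is a sketch of a helper such a script could call. It assumes plain HTTP on `localhost:9200` with security disabled. `_cluster/health?wait_for_status=...` and `POST _nodes/reload_secure_settings` are real Elasticsearch APIs; the retry counts are arbitrary choices.

```python
"""Sketch: wait for ES to come up, then make running nodes reload keystore secrets."""
import json
import time
import urllib.request

# Assumption: security is disabled (as in this cluster), so plain HTTP without auth.
ES_URL = "http://localhost:9200"


def wait_for_cluster(url: str = ES_URL, status: str = "yellow",
                     attempts: int = 30, delay: float = 5.0) -> dict:
    """Poll _cluster/health until it reports `status`; raise TimeoutError otherwise."""
    query = f"{url}/_cluster/health?wait_for_status={status}&timeout=10s"
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(query, timeout=15) as resp:
                return json.load(resp)
        except OSError:
            # Connection refused while ES is still starting, or an HTTP error
            # (ES answers 408 if the status was not reached within `timeout`).
            time.sleep(delay)
    raise TimeoutError(f"cluster at {url} did not reach {status!r}")


def reload_request(url: str = ES_URL) -> urllib.request.Request:
    """Build POST _nodes/reload_secure_settings (no body needed without a keystore password)."""
    return urllib.request.Request(f"{url}/_nodes/reload_secure_settings", method="POST")


def reload_secure_settings(url: str = ES_URL) -> dict:
    """Send the reload request and return the per-node JSON response."""
    with urllib.request.urlopen(reload_request(url), timeout=30) as resp:
        return json.load(resp)
```

Alternatively, adding the keys before the first start avoids the wait entirely, since nodes read the keystore at startup.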

P.S. Some time ago I looked and thought I saw that elastic-deploy.sh added a file to sources.list.d for the ES repo, but that was NOT the state of the currently running servers!

When possible, I think install scripts should be re-runnable/idempotent: test whether a step has already been taken before taking it, so that re-running the script is a no-op...
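Applied to the keystore step, idempotency could look like the following sketch. It uses the same assumed package paths, and the helper names are hypothetical. The sketch creates the keystore only if the file is absent, then adds only the settings missing from `elasticsearch-keystore list` output (one setting name per line).

```python
"""Sketch: create the keystore / add settings only when missing, so re-runs are no-ops."""
import subprocess
from pathlib import Path

# Assumption: Debian/RPM package layout.
KEYSTORE_CLI = "/usr/share/elasticsearch/bin/elasticsearch-keystore"
KEYSTORE_PATH = Path("/etc/elasticsearch/elasticsearch.keystore")


def missing_settings(list_output: str, wanted: list[str]) -> list[str]:
    """Given `elasticsearch-keystore list` output, return wanted names not yet present."""
    present = {line.strip() for line in list_output.splitlines() if line.strip()}
    return [name for name in wanted if name not in present]


def ensure_settings(values: dict[str, str]) -> list[str]:
    """Create the keystore if needed and add only the missing settings; return those added."""
    if not KEYSTORE_PATH.exists():
        subprocess.run([KEYSTORE_CLI, "create"], check=True)
    listed = subprocess.run([KEYSTORE_CLI, "list"], capture_output=True, text=True, check=True)
    added = missing_settings(listed.stdout, list(values))
    for name in added:
        subprocess.run([KEYSTORE_CLI, "add", "--stdin", name],
                       input=values[name], text=True, check=True)
    return added
```

The trade-off is that a re-run will not rotate a key that is already present; rotation would need `add --force` for that setting.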

thepsalmist commented 4 months ago

Good suggestions above:

  1. The suggestion of an elasticsearch top-level directory (TLD) came from an initial conversation about elastic-deploy.sh not being a good fit for the docker/ directory. I'm also in favor of discussing renaming the docker directory to something along the lines of deploy scripts.

  2. Moving the script into elastic-deploy.sh also answers the other questions on naming, etc.