mediacloud / story-indexer

The core pipeline used to ingest online news stories in the Media Cloud archive.
https://mediacloud.org
Apache License 2.0

remove news-search-api from story-indexer so we can independently manage deployment #268

Closed rahulbot closed 5 months ago

rahulbot commented 8 months ago

The goal is to pull news-search-api out of story-indexer so that we can manage its deployment independently of the overall story-indexer.

Related to https://github.com/mediacloud/news-search-api/issues/27, but tracking it here so we can see it in the same place as other tasks.

kilemensi commented 8 months ago

  1. Can use GitHub secrets for any sensitive information.
  2. Need to figure out how to deploy/push image to the (staging?) server (similar to https://github.com/marketplace/actions/dokku).
  3. ???

thepsalmist commented 8 months ago

Draft PR here, pending clarifications below:

  1. Building on the comments above, using GitHub secrets should allow us to store some of the sensitive info that the action will require, like ssh_username and server_ip/hostname, to deploy the docker-compose file on the Angwin cluster.
  2. From the workflows we can tag the images for each environment (staging or prod) and pass in ESHOSTS as an env variable. This should allow specifying a staging or prod Elasticsearch connection (these should be different URLs: host.9200 and host.9210).
  3. @rahulbot one clarification: since we require VPN access to ssh into the Angwin cluster, can we ask whether provision can be made for deployments via GH Actions? There is no direct mention of VPN support for GitHub-hosted runners, so we may need some kind of customization around this.

philbudne commented 8 months ago

since we require VPN access to ssh into the Angwin cluster, can we ask whether provision can be made for deployments via GH Actions? There is no direct mention of VPN support for GitHub-hosted runners, so we may need some kind of customization around this.

Considering that the major secret we are trying to protect is the Sentry DSN (at worst, an annoyance if stolen and spammed?), that doesn't seem like a great trade-off against credentials that could compromise the campus network and our cluster!

It's possible the CS support folks might have a suggestion on if/how this has been dealt with by other projects...

rahulbot commented 8 months ago

The short-term deployment plan is to use a private repo with a Docker(?) config file, like we do for story-indexer.

philbudne commented 8 months ago

Background:

story-indexer uses a shell script (deploy.sh) that generates a JSON file with parameters (in the script and from private config files) based on the currently checked out branch (production, staging, other), and generates a docker stack name and tag based on the branch.
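As a rough illustration of the branch-to-parameters step described above, here is a minimal Python sketch. It is not the actual deploy.sh logic: the function name `deploy_params`, the stack and tag naming scheme, and the `sentry_dsn` key are all hypothetical, chosen only to show how a branch name plus private config could be merged into one JSON document.

```python
import json


def deploy_params(branch: str, private: dict) -> dict:
    """Derive hypothetical deployment parameters from the checked-out branch."""
    # Assumption: "production" and "staging" map to fixed prefixes, and any
    # other branch gets a per-branch development prefix.
    if branch == "production":
        kind = "prod"
    elif branch == "staging":
        kind = "staging"
    else:
        kind = "dev-" + branch.replace("/", "-")

    params = {
        "stack_name": f"{kind}-indexer",  # hypothetical naming scheme
        "image_tag": f"{kind}-latest",    # hypothetical naming scheme
    }
    # Values from private config files are merged in last.
    params.update(private)
    return params


if __name__ == "__main__":
    # Write the parameters as JSON, ready to hand to a template processor.
    print(json.dumps(deploy_params("staging", {"sentry_dsn": "REDACTED"}), indent=2))
```

Keeping the derivation in one function means the same branch always yields the same stack name and tag, which is what makes the generated JSON reproducible.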

The JSON file is passed to (command-line) jinja2 to process the docker-compose.yml.j2 template and create docker-compose.yml. That file is then used to build, tag and push an image, and then to "compose" the stack; a newly generated tag is applied to the image, the source repo and the config repo for BOTH staging and production deployments. The template file means that there is only one compose file to maintain.
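The "one template, many deployments" idea can be sketched in a few lines of Python. To stay within the standard library, this sketch uses `string.Template` as a stand-in for jinja2, so the placeholder syntax differs from the real docker-compose.yml.j2. The template contents, the `render_compose` helper, and the parameter names (`image_name`, `image_tag`, `es_hosts`) are assumptions made for illustration.

```python
import json
from string import Template

# Hypothetical, heavily trimmed stand-in for docker-compose.yml.j2.
COMPOSE_TEMPLATE = Template("""\
services:
  worker:
    image: $image_name:$image_tag
    environment:
      ELASTICSEARCH_HOSTS: $es_hosts
""")


def render_compose(params_json: str) -> str:
    """Fill the compose template from a JSON parameter document."""
    params = json.loads(params_json)
    # substitute() raises KeyError when a placeholder has no value, so a
    # missing parameter fails loudly instead of producing a broken file.
    return COMPOSE_TEMPLATE.substitute(params)


if __name__ == "__main__":
    staging = json.dumps({
        "image_name": "example/indexer",
        "image_tag": "staging-1",
        "es_hosts": "http://es.example:9200",
    })
    print(render_compose(staging))
```

Rendering the same template with a production parameter file would change only the substituted values, which is the maintenance benefit described above.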

I don't THINK there is any way to substitute values into a docker-compose.yml file. There IS an idea of secrets in the docker compose universe, but I have no knowledge of the pain/benefit equation. My preference is to have configuration under revision control, so we can examine past changes, and revert to known good image/configuration combinations.

The news-search-api case is different in that the image is being built on GitHub (whenever a tag is applied?) and is available from an image registry. For something like the indexer deploy script, I might have the new script take a previously applied image/source tag name as input. If the tag ends in "bNNN", generate a staging stack (stack name and config). If the tag name already exists in the config, check it out and use it; if not, apply the tag at the head of the config repo???
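The tag-suffix rule proposed above could look roughly like the following Python sketch. It assumes "bNNN" means a literal "b" followed by one or more digits at the end of the tag, and that any other tag maps to production; neither detail is confirmed in this thread, and the helper names and stack names are hypothetical.

```python
import re

# Assumption: "bNNN" is a literal "b" followed by one or more digits
# at the very end of the tag name.
STAGING_TAG = re.compile(r"b\d+$")


def is_staging_tag(tag: str) -> bool:
    """Return True when the tag name ends in a bNNN suffix."""
    return STAGING_TAG.search(tag) is not None


def stack_name_for(tag: str) -> str:
    """Pick a hypothetical stack name from the tag."""
    # Assumption: every tag without a bNNN suffix is a production tag.
    prefix = "staging" if is_staging_tag(tag) else "prod"
    return f"{prefix}-news-search-api"


if __name__ == "__main__":
    for tag in ("v1.2.3b4", "v1.2.3"):
        print(tag, "->", stack_name_for(tag))
```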

thepsalmist commented 6 months ago

news-search-api deployment resolved in https://github.com/mediacloud/news-search-api/pull/66