Closed jkuester closed 1 month ago
in our compose files we pin to a version.... However, that's actually the default value for when the environment variable GRAFANA_VERSION isn't set:
Correct. If you copy-paste the .env.example file and do not edit/comment-out the *_VERSION variables, you will end up getting the latest images. I agree that this is a bit counter-intuitive. (And can easily result in undesirable behavior in that we don't want folks to accidentally run on latest if they don't want to.) At the same time, I don't want to set the pinned version in both the compose file and the env file (since that is extra places to update and more complexity). My proposal is that we comment-out the *_VERSION variables in the .env.example file. They are still in there so it is clear they can be customized, but folks will not "accidentally" be getting latest.
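To make the fallback behavior concrete, here is a minimal shell sketch of how a `${VAR:-default}` style default resolves. It assumes the compose files use that form of interpolation (the thread only says the pinned version is the default when GRAFANA_VERSION isn't set). The `resolve_tag` helper and the `11.2.0` default are illustrative, not taken from the repo.

```shell
# resolve_tag VALUE DEFAULT
# Prints VALUE, or DEFAULT when VALUE is unset or empty. This is the POSIX
# ${VAR:-default} fallback, a form that Compose interpolation also supports.
resolve_tag() {
  printf '%s\n' "${1:-$2}"
}

# Variable commented out in .env (so it is empty): the pinned default wins.
resolve_tag "" "11.2.0"

# Variable left set to "latest" in a copied .env: the pin is bypassed.
resolve_tag "latest" "11.2.0"
```

This is why commenting out the *_VERSION lines in .env.example is enough: the pin lives in one place and only an explicit value overrides it.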
why are we introducing a test matrix of node 20 and 22
IMHO, it is a best practice to confirm your Node project builds on each of the supported LTS versions. The impetus behind making the change now was that semantic-release and conventional-commits dropped support for Node 18. So, I had to update the Node version used by the release.yml and conventional-commits.yml. Because of that, I also wanted to use the same Node version in the integration-tests.yml so that if there was a Node issue running npm ci, it would happen in integration-tests.yml before we ever get to release.yml.
I could have just set everything to build with either Node 20 or 22. The downside of using just 20 is that we will have to update the workflow config again sooner (when 20 is EOL) and developers might run into unknown issues if they try building the project with Node 22. The downside of using just 22 is the opposite. Developers using 20 might run into unknown issues since we only test with 22. So, I decided the best balance was to uplift the release/conventional-commits stuff to use 22 and then run our tests with both 20 and 22 to make sure everything gets covered....
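For developers who want a quick local check against the same set of versions, something along these lines could work. It is only a sketch: `node_major`, `is_tested_major`, and `check_node` are hypothetical helpers, and the tested majors (20 and 22) are taken from the matrix described above.

```shell
# node_major VERSION
# Extracts the major number from a string such as "v20.11.1"
# (the format printed by `node --version`).
node_major() {
  set -- "${1#v}"            # drop a leading "v" if present
  printf '%s\n' "${1%%.*}"   # keep everything before the first dot
}

# is_tested_major VERSION
# Succeeds only for the majors covered by the test matrix (20 and 22).
is_tested_major() {
  case "$(node_major "$1")" in
    20|22) return 0 ;;
    *)     return 1 ;;
  esac
}

# check_node VERSION
# Reports whether the given Node version is one the tests run against.
check_node() {
  if is_tested_major "$1"; then
    echo "Node $1 is covered by the test matrix"
  else
    echo "Node $1 is not covered by the test matrix"
  fi
}

check_node "v22.3.0"
check_node "v18.20.4"
```

In practice this would be called as `check_node "$(node --version)"`.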
Awesome! Agreed on your plan to comment out the variables in the .env.example file - good thinking!
Also - thanks for explaining the logic of the node test matrix - makes perfect sense.
Finally, I think we should dogfood this change. I propose:
main
I think it's a bit of overkill, but I also think there's no rush to merge and we can slow roll the PR for a week or two as needed while we test and watch the burn in on prod.
I like your plan of dogfooding this change! :+1: Since it is a major version bump for Grafana, I think it probably deserves a closer look than normal.
But, also like you said, no rush on any of this!
OK on prod watchdog we have:
2.51.1 (that version matches hash below from the "latest" image from 5 mo ago)
10.4.1 (d94d597d847c05085542c29dfad6b3f469cc77e1) - from login screen
v0.6.0 - this version is unchanged when pulling latest today
I'm going to test setting these up locally, throw some data in there and then cut over to the 112_upgrade_services branch and see what happens :crossed_fingers: To first bootstrap on the older versions I'll have to (ironically) pin them to these versions, otherwise, you know, they'll go to latest :laughing:
root@watchdog:~# hostname
watchdog.app.medicmobile.org
root@watchdog:~# docker image ls|grep prom/prometheus
prom/prometheus latest e350b167c4fa 5 months ago 262MB
prom/prometheus <none> 1d3b7f56885b 5 months ago 262MB
prom/prometheus <none> 75972a31ad25 16 months ago 234MB
root@watchdog:~# docker image inspect prom/prometheus| jq ".[0].RepoDigests"
[
"prom/prometheus@sha256:dec2018ae55885fed717f25c289b8c9cff0bf5fbb9e619fb49b6161ac493c016"
]
local dev upgrade went super good! I set my .env file to below, stood up an instance and let it gather data from gamma and moh mali for a good 20 min. then i did a docker compose down, then checked out this branch and commented out my 3 versions in .env and did a docker compose pull. after checking that the images in docker image ls looked good - i did a docker compose up -d. Everything more or less instantly upgraded!
I've verified that prod watchdog is backed up in EC2 snapshots, so I'll do the prod upgrade next Tue when I'm back from being out on Monday!
starting .env file:
grep = .env
GRAFANA_ADMIN_USER=medic
GRAFANA_ADMIN_PASSWORD=password
GRAFANA_VERSION=10.4.1
GRAFANA_PORT=3000
GRAFANA_BIND=127.0.0.1
GRAFANA_DATA="./grafana/data"
GRAFANA_PLUGINS=grafana-discourse-datasource
JSON_EXPORTER_VERSION=latest
PROMETHEUS_VERSION=v2.51.1
PROMETHEUS_DATA="./prometheus/data"
PROMETHEUS_RETENTION_TIME=60d
SQL_EXPORTER_IP=127.0.0.1
SQL_EXPORTER_PORT=9399
PROMETHEUS_BIND=127.0.0.1
PROMETHEUS_PORT=9090
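The "commented out my 3 versions in .env" step could be scripted roughly like this. It is a sketch that assumes every version variable sits at the start of a line and ends in _VERSION, as in the listing above; `comment_out_versions` is a hypothetical helper, not something in the repo.

```shell
# comment_out_versions
# Reads .env content on stdin and prefixes every uncommented *_VERSION
# assignment with "#". All other lines pass through unchanged.
comment_out_versions() {
  sed -E 's/^([A-Z0-9_]+_VERSION=)/#\1/'
}

# Demonstration on a few lines from the .env shown above.
printf '%s\n' \
  'GRAFANA_VERSION=10.4.1' \
  'GRAFANA_PORT=3000' \
  'JSON_EXPORTER_VERSION=latest' \
  'PROMETHEUS_VERSION=v2.51.1' | comment_out_versions
```

To apply it to a real file, write to a temporary file first, e.g. `comment_out_versions < .env > .env.new && mv .env.new .env`.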
production is updated:
112_upgrade_services
cd ~/cht-monitoring
docker compose \
-f docker-compose.yml \
-f exporters/postgres/compose.yml \
-f ../caddy-compose.yml \
-f ../docker-compose-cht3x.yml \
-f data-ingest/extra-sql-compose.yml \
-f node-exporter/compose.yml \
down
docker compose \
-f docker-compose.yml \
-f exporters/postgres/compose.yml \
-f ../caddy-compose.yml \
-f ../docker-compose-cht3x.yml \
-f data-ingest/extra-sql-compose.yml \
-f node-exporter/compose.yml \
pull
docker compose \
-f docker-compose.yml \
-f exporters/postgres/compose.yml \
-f ../caddy-compose.yml \
-f ../docker-compose-cht3x.yml \
-f data-ingest/extra-sql-compose.yml \
-f node-exporter/compose.yml \
up -d
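# The same six -f flags are repeated for every command above. One way to
# avoid that, sketched here on the assumption that the host is Linux (where
# Compose's COMPOSE_FILE variable takes a colon-separated list of files), is
# to build the list once. join_compose_files is a hypothetical helper.

```shell
# join_compose_files FILE...
# Joins the given paths with ":" so the result can be exported as COMPOSE_FILE.
join_compose_files() {
  # "$*" joins the arguments with the first character of IFS; the subshell
  # keeps the IFS change from leaking into the caller.
  (IFS=:; printf '%s\n' "$*")
}

COMPOSE_FILE="$(join_compose_files \
  docker-compose.yml \
  exporters/postgres/compose.yml \
  ../caddy-compose.yml \
  ../docker-compose-cht3x.yml \
  data-ingest/extra-sql-compose.yml \
  node-exporter/compose.yml)"
export COMPOSE_FILE

echo "$COMPOSE_FILE"
```

# With COMPOSE_FILE exported, plain `docker compose down` and
# `docker compose pull` pick up all six files without any -f flags.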
curl 172.21.0.3:9090/api/v1/status/buildinfo|jq
{
"status": "success",
"data": {
"version": "2.54.1",
"revision": "e6cfa720fbe6280153fab13090a483dbd40bece3",
"branch": "HEAD",
"buildUser": "root@812ffd741951",
"buildDate": "20240827-10:56:41",
"goVersion": "go1.22.6"
}
}
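A check like the one above can be turned into a pass/fail step. The sketch below is one possible shape, not what was actually run: `buildinfo_version` and `expect_version` are hypothetical helpers, and sed is used only to keep the example dependency-free (`jq -r .data.version` is the more robust choice where jq is installed).

```shell
# buildinfo_version
# Reads a Prometheus /api/v1/status/buildinfo response on stdin and prints
# the value of the "version" field.
buildinfo_version() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# expect_version EXPECTED
# Succeeds when the version read from stdin equals EXPECTED.
expect_version() {
  actual="$(buildinfo_version)"
  if [ "$actual" = "$1" ]; then
    echo "OK: running $actual"
  else
    echo "MISMATCH: wanted $1, got $actual"
    return 1
  fi
}

# Demonstration with a trimmed copy of the response shown above.
printf '%s\n' '{"status":"success","data":{"version":"2.54.1","goVersion":"go1.22.6"}}' \
  | expect_version "2.54.1"
```

Against the running instance this would look like `curl -s 172.21.0.3:9090/api/v1/status/buildinfo | expect_version 2.54.1`.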
and at login grafana shows: 11.2.0 (2a88694fd3)
Over to @jkuester to finish up this PR
@jkuester - when you merge this to main, please go un-comment cronjob on watchdog so it starts pulling again automatically:
root@watchdog:~/cht-monitoring# crontab -l
# check for new cht watchdog version, upgrade if new version & announce in slack
#*/5 * * * * /root/continious-deployment.sh
@mrjones-plip I need you to hit the Approve button (from the "Files changed" tab) before I can actually merge this! :sweat_smile:
Sorry! shoulda remembered that.
:tada: This PR is included in version 1.15.0 :tada:
The release is available on GitHub release
Your semantic-release bot :package::rocket:
Update the default versions for Prometheus and Grafana:
Prometheus: v2.46.0 -> v2.54.1
Grafana: 10.0.3 -> 11.2.0
There were no major breaking changes to note for the Prometheus upgrade. For Grafana, I reviewed the release notes and upgrade guides. While many things did change, I did not find anything that required a manual migration when updating a Watchdog instance. All of our default configuration seems compatible with the new version of Grafana.
Add SQL_EXPORTER_VERSION envar
Following the example of the other docker images, I am pinning the version of burningalchemist/sql_exporter (to its latest current release) and I have added the SQL_EXPORTER_VERSION envar to the .env.example file as the place where users can configure a custom version of the sql exporter.
Bump node dev dependencies in package.json
I was able to lift the dependencies to their latest versions except for:
chai - the new major version requires ESM modules.
eslint - requires config migration (which does not play well with our base @medic config).
The new version of the conventional-commits libraries is no longer compatible with Node 18. So, I set our minimum Node engine config to match the required version of Node 20. I also updated our GitHub Action workflow configs to run with the new Node version.
The new major version of husky included some minor migration steps with our husky config detailed here. I did validate that our git pre-commit hook still works for me locally.
Dependabot
I have added dependabot config (as Andra suggested in the issue). It is pretty straightforward, but @mrjones-plip I think you will still need to use your admin powers to actually enable the bot in the repo settings.... (Unless it has the necessary authorization at the org level... :thinking: )