Infra to run waku-simulator on latest nwaku master

alrevuelta commented 1 year ago

To detect potential issues in nwaku as early as possible, we need an instance of waku-simulator deployed with the latest nwaku master commit. Every time we merge a new PR to nwaku, waku-simulator would be redeployed with that image, so we can monitor whether we are introducing any issues (especially related to networking or performance in general).

waku-simulator makes it easy to:

Create a network with an arbitrary amount of nwaku nodes (max 250)
Automatically inject gossipsub traffic into the network with some configurable parameters.
Monitor said network with an already provisioned grafana dashboard.

What would we need?

Some infra to run waku-simulator
Redeploy the setup on every new commit to nwaku master. Unsure whether this requires changes in nwaku CI, or whether new commits can be auto-detected.
A static IP with port 3000 open so that we can visualize the metrics.

Important notes:

The amount of metrics that waku-simulator generates is quite high, which is why we provide our own grafana/prometheus instance. I would suggest not "indexing" these metrics in status infra. To avoid using too much disk space, prometheus retention time is set to 7 days.
Nodes run with a simple configuration, where each one should use < 100 MB of memory. A machine with 64 GB should be enough.
Disk space usage shouldn't be very high: the store protocol is not used, so the only data stored is the prometheus metrics.
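
As a rough check of the memory figures above, here is a minimal sketch of the arithmetic. It assumes the 100 MB per node upper bound and treats 64 GB as 65536 MB; the function names are illustrative, and the overhead of grafana and prometheus is not included.

```shell
#!/bin/sh
# Back-of-the-envelope memory budget for a waku-simulator host.
# Assumption: 100 MB per node is the upper bound quoted above.

# estimate_mem_mb NODES PER_NODE_MB -> total memory in MB
estimate_mem_mb() {
    echo $(( $1 * $2 ))
}

# fits_in_host NODES PER_NODE_MB HOST_MB -> exit status 0 if the budget fits
fits_in_host() {
    [ "$(estimate_mem_mb "$1" "$2")" -le "$3" ]
}

# 250 nodes is the maximum network size mentioned above; 64 GB = 65536 MB.
if fits_in_host 250 100 65536; then
    echo "250 nodes need about $(estimate_mem_mb 250 100) MB: fits in 64 GB"
fi
```

Even at the maximum of 250 nodes the estimate stays well under half of the proposed machine, which leaves headroom for the monitoring stack.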

TLDR:

We would need some infra to run waku-simulator so that ~~every time we merge a PR to nwaku master, the following is executed.~~ once a day, it deploys the latest nightly nwaku release (see the nightly release).

This is the repo

git clone https://github.com/waku-org/waku-simulator.git
cd waku-simulator

And only LATEST_MASTER_PLACEHOLDER should be updated.

export NWAKU_IMAGE=statusteam/nim-waku:LATEST_MASTER_PLACEHOLDER
export NUM_NWAKU_NODES=100
export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
export NUM_GOWAKU_NODES=0
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
docker-compose up -d

And then have the already provisioned dashboard available at ip:3000.
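
The steps above could be wrapped into a single redeploy script along these lines. This is only a sketch: the `REPO_DIR` path, the function names, and the idea of passing the image tag as an argument are assumptions, not part of waku-simulator.

```shell
#!/bin/sh
# Sketch of a once-a-day redeploy. Only the image tag changes between runs.
# Assumptions: the tag is passed as an argument; REPO_DIR is a hypothetical path.

REPO_DIR="${REPO_DIR:-$HOME/waku-simulator}"

# Print the environment for a given nwaku image tag, using the values above.
simulator_env() {
    cat <<EOF
NWAKU_IMAGE=statusteam/nim-waku:$1
NUM_NWAKU_NODES=100
GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
NUM_GOWAKU_NODES=0
MSG_PER_SECOND=10
MSG_SIZE_KBYTES=10
EOF
}

# Update the checkout, pull the images and recreate the containers.
# Runs in a subshell so the caller's directory and environment are untouched.
# Defined here but not called automatically.
redeploy() (
    tag="$1"
    cd "$REPO_DIR" || exit 1
    git pull --ff-only || exit 1
    # Export every variable printed by simulator_env.
    set -a
    eval "$(simulator_env "$tag")"
    set +a
    docker-compose pull && docker-compose up -d
)

# Usage, e.g. from a daily cron job or systemd timer:
#   redeploy LATEST_MASTER_PLACEHOLDER
```

Keeping the variable list in one function means the tag is the only input that differs from one day to the next.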

cc @jakubgs

alrevuelta commented 1 year ago

Edited: Instead of running waku-simulator on every merge to nwaku master, just run it once a day (see nightly release)

jakubgs commented 1 year ago

I have confirmed that an AX41-NVMe host from Hetzner will suffice for this. https://www.hetzner.com/dedicated-rootserver/matrix-ax

Possibly with extra memory in the future.

jakubgs commented 1 year ago

Looks like Alexis already generalized my script and role for handling GitHub webhooks to update a local repo:

https://github.com/status-im/infra-role-github-webhook

So I can reuse that.

jakubgs commented 1 year ago

I'm refactoring infra-role-github-webhook to handle running a task after repo update. Should finish tomorrow.

jakubgs commented 1 year ago

@alrevuelta was setting export NUM_GOWAKU_NODES=0 intentional, or did you mean 10?

alrevuelta commented 1 year ago

> @alrevuelta was setting export NUM_GOWAKU_NODES=0 intentional, or did you mean 10?

Yep, 0. For now we will be focusing only on the nwaku<->nwaku integration.

jakubgs commented 1 year ago

Here's a PR to allow auto-updates of Docker images:

https://github.com/waku-org/waku-simulator/pull/8

jakubgs commented 1 year ago

I'm the one working on this.

jakubgs commented 1 year ago

Here's the initial setup:

infra-misc#57001901 - add metal-01.he-eu-hel1.wakusim.misc host
infra-misc#cb5cf671 - waku-simulator: first working version of setup
infra-misc#f215ff63 - waku-simulator: add initial README file
infra-misc#924011c8 - wakusim: add boostrap settings, playbook

What's remaining:

Expose webhook publicly and configure with GitHub
Add OAuth-Proxy for Grafana container
Configure .env symlink to configure containers from repo
Add Consul healthchecks and additional services

jakubgs commented 1 year ago

We're going to use a wakusim.env file in the repo to allow devs to adjust settings on the wakusim.misc host:

https://github.com/waku-org/waku-simulator/pull/9
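
Docker Compose reads variables from a file named .env in the project directory, which is presumably why a .env symlink is part of this setup. A minimal sketch of that link follows; it assumes wakusim.env sits at the root of the checkout, and the file contents in the demonstration are hypothetical.

```shell
#!/bin/sh
# Sketch: point Compose's .env at a tracked wakusim.env via a relative symlink.
# Assumption: wakusim.env sits at the root of the checkout.

link_env() {
    repo_dir="$1"
    # -s symbolic, -f replace an existing link, -n do not follow an existing link
    ln -sfn wakusim.env "$repo_dir/.env"
}

# Demonstration in a throwaway directory, with hypothetical file contents.
demo_dir="$(mktemp -d)"
printf 'NUM_NWAKU_NODES=100\nNUM_GOWAKU_NODES=0\n' > "$demo_dir/wakusim.env"
link_env "$demo_dir"
cat "$demo_dir/.env"
```

With a link like this, a change to wakusim.env merged in the repo takes effect the next time the compose service is restarted.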

jakubgs commented 1 year ago

I have configured Grafana dashboard at https://simulator.waku.org/ using OAuth proxy:

https://github.com/status-im/infra-misc/commit/6f561143 - wakusim: add oauth-proxy for Grafana instance

We're not using Grafana built-in OAuth because that would require changes in the waku-simulator repo itself.

jakubgs commented 1 year ago

Also added extra healthchecks:

https://github.com/status-im/infra-misc/commit/209a210f - waku-simulator: consul grafana and prometheus checks
https://github.com/status-im/infra-misc/commit/33469bd0 - waku-simulator: create .env symlink later
https://github.com/status-im/infra-misc/commit/dbb28dc3 - waku-simulator: add healthcheck for compose service

And fixed location of .env symlink task.

jakubgs commented 1 year ago

I have a PR going to migrate from old statusteam org to wakuorg on Docker Hub:

https://github.com/waku-org/nwaku/pull/2077

That is part of a proper setup of automatic builds of master branch that will push a Docker latest tag.

jakubgs commented 1 year ago

I've exposed the /webhook path under the same https://simulator.waku.org/ domain using nginx proxy:

infra-misc#8c71f628 - wakusim: Nginx proxy to combine webhook and Grafana
infra-misc#bb640c74 - waku-simulator: pass mandatory webhook secret

Also added missing webhook secret.
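
For context on what the webhook secret is for: GitHub signs each delivery with an HMAC-SHA256 of the raw request body and sends it as `sha256=<hex>` in the `X-Hub-Signature-256` header. The sketch below shows that check with openssl. Whether the github-webhook role verifies deliveries exactly this way is an assumption, and the function names are illustrative.

```shell
#!/bin/sh
# Sketch of GitHub webhook signature checking with openssl.
# GitHub sends "sha256=<hex HMAC of the raw body>" in X-Hub-Signature-256.

# sign_payload SECRET PAYLOAD -> prints "sha256=<hex digest>"
sign_payload() {
    # openssl prints a label followed by the hex digest; keep the last field.
    digest="$(printf '%s' "$2" | openssl dgst -sha256 -hmac "$1" | awk '{print $NF}')"
    echo "sha256=$digest"
}

# verify_signature SECRET PAYLOAD HEADER_VALUE -> exit status 0 when they match
verify_signature() {
    [ "$(sign_payload "$1" "$2")" = "$3" ]
}
```

A plain string comparison in shell is not constant-time, so a real server would normally use a constant-time comparison such as Python's `hmac.compare_digest`.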

jakubgs commented 1 year ago

Here are all the github-webhook Ansible role changes:

infra-role-github-webhook#9599fefa - user: use github name for user by default
infra-role-github-webhook#eb6db1b6 - user: fix naming of groups additional groups var
infra-role-github-webhook#42af0f6f - service: move description to the template
infra-role-github-webhook#dcfc06f8 - user: use 1500 UID, let fleets override it
infra-role-github-webhook#ffaf74fb - server: support for running command after repo update
infra-role-github-webhook#32f5f2c3 - consul: add missing service ID and port
infra-role-github-webhook#efa1ccd9 - readme: update config examples and explain options
infra-role-github-webhook#802cd09d - server: drop appending repo name to repo path

I had to modify the infra-bi repo to make it compatible:

https://github.com/status-im/infra-bi/pull/58
https://github.com/status-im/infra-bi/pull/57
https://github.com/status-im/airflow-dags/commit/a1035bee1f17f72e7afb4d8d566f89facd2148df - updating the dbt path following webhook update

jakubgs commented 1 year ago

Some more fixes for github-webhook running post update action:

infra-role-github-webhook#4d4f8f29 - service: quote post command in service definition
infra-role-github-webhook#01bed38f - server: fix import of CalledProcessError from subprocess
infra-role-github-webhook#9ca408c4 - server: remove post_action argument from on_push
infra-role-github-webhook#a8cecc02 - server: fix extracting name from repo url

And adjustments to Waku simulator role to allow restarting the compose service:

infra-misc#af34dbc6 - waku-simulator: restart compose service with sudo

And it works:

server.py[248029]: INFO - jakubgs pushed refs/heads/master in waku-org/waku-simulator (8ce2afca-5d64-11ee-956f-7aa842426278)
server.py[248029]: INFO - New commit available: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Updated repo to: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Running post action!
server.py[248029]: INFO - Running command: /usr/bin/sudo systemctl restart waku-simulator-compose
sudo[248149]:  wakusim : PWD=/home/wakusim/webhook ; USER=root ; COMMAND=/bin/systemctl restart waku-simulator-compose
sudo[248149]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1500)
server.py[248029]: INFO - 127.0.0.1 - - [27/Sep/2023 18:35:05] "GET /health HTTP/1.1" 200 -
sudo[248149]: pam_unix(sudo:session): session closed for user root
server.py[248029]: INFO - Command success:
server.py[248029]: b''
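
The flow visible in the log above (detect a new commit, update the repo, run the post action) can be sketched as follows. This is not the role's actual code: the function names and some of the log wording are illustrative assumptions.

```shell
#!/bin/sh
# Sketch of the "update repo, then run a post action" flow seen in the log.
# Assumptions: function names and log wording are illustrative only.

log() { echo "INFO - $*"; }

# handle_push CURRENT_COMMIT NEW_COMMIT POST_COMMAND...
handle_push() {
    current="$1"; new="$2"; shift 2
    if [ "$current" = "$new" ]; then
        log "Already up to date: $current"
        return 0
    fi
    log "New commit available: $new"
    # A real handler would fetch and check out $new at this point.
    log "Running command: $*"
    if "$@"; then
        log "Command success"
    else
        log "Command failed"
        return 1
    fi
}

# Usage on the host might look like:
#   handle_push "$(git rev-parse HEAD)" "$NEW_SHA" sudo systemctl restart waku-simulator-compose
```

Passing the post action as trailing arguments keeps it as a separate command and argument list, which avoids the quoting problems that come with building a single command string.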

jakubgs commented 1 year ago

I consider this done. If there's anything I missed, please reopen.

jakubgs commented 1 year ago

Forgot to update docs:

https://github.com/waku-org/waku-simulator/pull/10

jakubgs commented 1 year ago

Also had to fix the Compose project name, which ended up as "repo" because it is derived from the checkout folder name:

infra-misc#4f9f3518 - waku-simulator: set COMPOSE_PROJECT_NAME in service

I was hoping I could use the top-level name parameter in the Compose file, but it appears to be too new for the Docker Compose version we have:

The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
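
The workaround relies on the COMPOSE_PROJECT_NAME environment variable taking precedence over the directory name. A simplified sketch of that choice is below; the path and the value `wakusim` are hypothetical, and Compose's own normalisation of the name is ignored.

```shell
#!/bin/sh
# Sketch of how the project name is chosen when the Compose file cannot set it:
# COMPOSE_PROJECT_NAME wins, otherwise the directory's basename is used.
# Compose's own normalisation of the name is ignored here.

project_name() {
    dir="$1"
    if [ -n "${COMPOSE_PROJECT_NAME:-}" ]; then
        echo "$COMPOSE_PROJECT_NAME"
    else
        basename "$dir"
    fi
}

# A checkout in a folder called "repo" yields the project name "repo".
(unset COMPOSE_PROJECT_NAME; project_name /home/wakusim/repo)

# Setting the variable overrides it; "wakusim" is a hypothetical value.
(COMPOSE_PROJECT_NAME=wakusim; project_name /home/wakusim/repo)
```

Setting the variable in the service definition keeps the fix on the infra side, with no change needed to docker-compose.yml.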

status-im / infra-nim-waku

Infra to run waku-simulator on latest nwaku master #79