status-im / infra-nim-waku

Infrastructure for Nim Waku
https://github.com/status-im/nim-waku

Infra to run waku-simulator on latest nwaku master #79

Closed alrevuelta closed 1 year ago

alrevuelta commented 1 year ago

To detect potential issues in nwaku as early as possible, we need an instance of waku-simulator deployed with the latest nwaku master commit. Every time we merge a new PR to nwaku, the waku-simulator tool would be redeployed with that image, so we can monitor whether we are introducing any issues (especially related to networking or performance in general).

waku-simulator allows to easily:

What would we need?

Important notes:

TLDR:

We would need some infra to run waku-simulator so that, once a day, it deploys the latest nightly nwaku release (see the nightly release) and executes the following.

This is the repo

git clone https://github.com/waku-org/waku-simulator.git
cd waku-simulator

And only LATEST_MASTER_PLACEHOLDER should be updated.

export NWAKU_IMAGE=statusteam/nim-waku:LATEST_MASTER_PLACEHOLDER
export NUM_NWAKU_NODES=100
export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
export NUM_GOWAKU_NODES=0
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
docker-compose up -d

And then have the already provisioned dashboard available at ip:3000.
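The placeholder substitution above can be sketched as a small shell helper. This is only an illustration: the function name `render_simulator_env` is made up, and the tag passed in is whatever image tag the deployment resolves. The variable names and values are the ones listed above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the simulator settings for a given nwaku image tag.
# Only the tag changes between runs; the other values are copied from the issue.
render_simulator_env() {
  local tag="${1:-}"
  if [ -z "$tag" ]; then
    echo "usage: render_simulator_env <nwaku-image-tag>" >&2
    return 1
  fi
  cat <<EOF
export NWAKU_IMAGE=statusteam/nim-waku:${tag}
export NUM_NWAKU_NODES=100
export GOWAKU_IMAGE=statusteam/go-waku:v0.7.0
export NUM_GOWAKU_NODES=0
export MSG_PER_SECOND=10
export MSG_SIZE_KBYTES=10
EOF
}
```

From the waku-simulator checkout, something like `eval "$(render_simulator_env "$TAG")" && docker-compose up -d` would then start the simulation with that image, where `TAG` is an assumed variable holding the tag to deploy.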

cc @jakubgs

alrevuelta commented 1 year ago

Edited: Instead of running waku-simulator on every merge to nwaku master, just run it once a day (see nightly release)

jakubgs commented 1 year ago

I have confirmed that an AX41-NVMe host from Hetzner will suffice for this. https://www.hetzner.com/dedicated-rootserver/matrix-ax

Possibly with extra memory in the future.

jakubgs commented 1 year ago

Looks like Alexis already generalized my script and role for handling GitHub webhooks to update a local repo:

So I can reuse that.

jakubgs commented 1 year ago

I'm refactoring infra-role-github-webhook to handle running a task after repo update. Should finish tomorrow.

jakubgs commented 1 year ago

@alrevuelta was setting export NUM_GOWAKU_NODES=0 intentional, or did you mean 10?

alrevuelta commented 1 year ago

> @alrevuelta was setting export NUM_GOWAKU_NODES=0 intentional, or did you mean 10?

Yep, 0. For now we will be focusing only on the nwaku<->nwaku integration.

jakubgs commented 1 year ago

Here's a PR to allow auto-updates of Docker images:

jakubgs commented 1 year ago

I'm the one working on this.

jakubgs commented 1 year ago

Here's the initial setup:

What's remaining:

jakubgs commented 1 year ago

We're going to use a wakusim.env file in the repo to allow devs to adjust settings on the wakusim.misc host:
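The contents of wakusim.env are not shown in this thread. Assuming it carries the same settings as the original request, a quick sanity check before pushing a change could look like the sketch below. The function name `check_simulator_env` and the list of required keys are assumptions, not part of the actual setup.

```shell
#!/usr/bin/env bash
# Hypothetical check: report which expected settings are missing from an env file.
# The key list mirrors the variables in the original request (an assumption).
check_simulator_env() {
  local env_file="$1" missing=0 key
  for key in NWAKU_IMAGE NUM_NWAKU_NODES GOWAKU_IMAGE \
             NUM_GOWAKU_NODES MSG_PER_SECOND MSG_SIZE_KBYTES; do
    # A setting counts as present if a line starts with KEY=
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing: ${key}" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_simulator_env wakusim.env` returns non-zero and lists the missing keys on stderr when the file is incomplete.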

jakubgs commented 1 year ago

I have configured the Grafana dashboard at https://simulator.waku.org/ using an OAuth proxy:

We're not using Grafana's built-in OAuth because that would require changes in the waku-simulator repo itself.

jakubgs commented 1 year ago

Also added extra healthchecks:

And fixed the location of the .env symlink task.

jakubgs commented 1 year ago

I have a PR going to migrate from old statusteam org to wakuorg on Docker Hub:

That is part of a proper setup of automatic builds of master branch that will push a Docker latest tag.

jakubgs commented 1 year ago

I've exposed the /webhook path under the same https://simulator.waku.org/ domain using nginx proxy:

Also added missing webhook secret.

jakubgs commented 1 year ago

Here are all the github-webhook Ansible role changes:

I had to modify the infra-bi repo to make it compatible:

jakubgs commented 1 year ago

Some more fixes for github-webhook running post update action:

And adjustments to Waku simulator role to allow restarting the compose service:

And it works:

server.py[248029]: INFO - jakubgs pushed refs/heads/master in waku-org/waku-simulator (8ce2afca-5d64-11ee-956f-7aa842426278)
server.py[248029]: INFO - New commit available: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Updated repo to: b58f2b0b31b26a571afe7623c24788ecf775cf9f
server.py[248029]: INFO - Running post action!
server.py[248029]: INFO - Running command: /usr/bin/sudo systemctl restart waku-simulator-compose
sudo[248149]:  wakusim : PWD=/home/wakusim/webhook ; USER=root ; COMMAND=/bin/systemctl restart waku-simulator-compose
sudo[248149]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=1500)
server.py[248029]: INFO - 127.0.0.1 - - [27/Sep/2023 18:35:05] "GET /health HTTP/1.1" 200 -
sudo[248149]: pam_unix(sudo:session): session closed for user root
server.py[248029]: INFO - Command success:
server.py[248029]: b''
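
The flow visible in this log (notice a new commit, update the repo, run the post action) can be sketched in shell. This is not the code of the github-webhook role, whose server is a Python script; the function name and its arguments are made up for illustration, and only the log wording is borrowed from the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a post-update command only when the commit changed.
# Usage: run_post_action_if_updated <old-commit> <new-commit> <command> [args...]
run_post_action_if_updated() {
  local old_commit="$1" new_commit="$2"
  shift 2
  if [ "$old_commit" = "$new_commit" ]; then
    echo "INFO - No new commit, skipping post action"
    return 0
  fi
  echo "INFO - Updated repo to: ${new_commit}"
  echo "INFO - Running command: $*"
  # Run the remaining arguments as the post action and return its exit status.
  "$@"
}
```

With the command from the log, a call would look like `run_post_action_if_updated "$old" "$new" sudo systemctl restart waku-simulator-compose`, where `old` and `new` are assumed to hold the commit hashes before and after the update.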

jakubgs commented 1 year ago

I consider this done. If there's anything I missed, please reopen.

jakubgs commented 1 year ago

Forgot to update docs:

jakubgs commented 1 year ago

Also had to fix the Compose project name, which ended up as repo because of the checkout folder name:

I was hoping I could use the name parameter:

But it appears to be too new for the Docker Compose version we have:

The Compose file './docker-compose.yml' is invalid because:
'name' does not match any of the regexes: '^x-'
You might be seeing this error because you're using the wrong Compose file version. Either specify a supported version (e.g "2.2" or "3.3") and place your service definitions under the `services` key, or omit the `version` key and place your service definitions at the root of the file to use version 1.
For more on the Compose file format versions, see https://docs.docker.com/compose/compose-file/
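
For context on the naming issue: Compose takes its default project name from the directory that holds the Compose file, and the `COMPOSE_PROJECT_NAME` environment variable (or the `-p` flag) overrides it on versions that reject a top-level `name` key. The sketch below mimics only that precedence. It ignores the name normalization Compose also applies, the function name and directory path are hypothetical, and the thread does not show which override was actually used.

```shell
#!/usr/bin/env bash
# Simplified illustration of how the Compose project name is chosen:
# an explicit COMPOSE_PROJECT_NAME wins, otherwise the directory name is used.
effective_project_name() {
  local project_dir="$1"
  if [ -n "${COMPOSE_PROJECT_NAME:-}" ]; then
    printf '%s\n' "$COMPOSE_PROJECT_NAME"
  else
    basename "$project_dir"
  fi
}

# Hypothetical checkout path; without an override the name is just "repo".
unset COMPOSE_PROJECT_NAME
effective_project_name /home/wakusim/repo

# With the override set, the folder name no longer matters.
COMPOSE_PROJECT_NAME=waku-simulator effective_project_name /home/wakusim/repo
```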