status-im / infra-status-legacy

Infrastructure for old Status fleet
https://github.com/status-im/nim-waku

Add debug nodes on status.prod #35

Closed: vitvly closed this 11 months ago

vitvly commented 11 months ago

It seems that recent fleet upgrades removed a debug node that was used to log the hashes of all messages passing through the fleet. Is this something we can deploy again?

cc @jm-clius @jakubgs

jm-clius commented 11 months ago

@jakubgs, this was something we set up for a single node in the fleet based on a Discord conversation between you, me and @cammellos, so it's not traceable on GitHub. I'll paste the important part of the conversation here:

Requires an image built with the following added to NIMFLAGS:

 -d:chronicles_enabled_topics:"waku\ node":TRACE

This image is deployed to a single node in the fleet. That node will log all messages, which means a high logging rate, but it is very useful for debugging.

I also suggested a similar "trace logging" node for the new status.shards* fleets: https://github.com/status-im/infra-shards/issues/2, but I'm not sure whether it was deployed like this (@yakimant).

jakubgs commented 11 months ago

We can certainly build an image with -d:chronicles_enabled_topics:"waku\ node":TRACE added to NIMFLAGS, and use that on one host, but that image would be static, and would have to be updated manually. Is that fine?

jm-clius commented 11 months ago

that image would be static, and would have to be updated manually

For now, I think so, as long as we make this part of our (manual) upgrade deployment steps. Would it be possible to get a dedicated Jenkins job for deploying the debug node image? Then fleet operators could simply trigger it manually whenever the debug image needs an upgrade.

jakubgs commented 11 months ago

Yeah, we could have a job. I wonder how this ties into:

jakubgs commented 11 months ago

It looks like someone already made this change in the deploy-status-prod job.

Why do people make changes like this without asking?

jakubgs commented 11 months ago

I undid that change in the deploy-status-prod job and ran a build for v0.21.1 without that flag: https://ci.infra.status.im/job/nim-waku/job/deploy-status-prod/35/

I also created this job to push deploy-status-prod-trace images: https://ci.infra.status.im/job/nim-waku/job/deploy-status-prod-trace/

jm-clius commented 11 months ago

Thanks @jakubgs!

I missed this:

I wonder how this ties into:

AFAICS, the fix (now merged) will work for this build as well. One thing we'd have to remember, though, is that the runtime config for this node needs to set the log level to trace (i.e. --log-level=trace in the docker-compose file).
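Since the trace build only helps if the runtime log level is raised as well, a quick check on the host could confirm that the running container was actually started with the trace level. A minimal Python sketch, assuming the container is named `nim-waku` (as on node-02 in this thread) and that the last `--log-level` occurrence wins; the helper names `args_have_trace_level` and `container_args` are hypothetical, not part of the infra-status tooling:

```python
import json
import subprocess


def args_have_trace_level(args: list[str]) -> bool:
    """Return True if the argument list sets the log level to trace.

    Accepts both '--log-level=trace' and '--log-level trace'. If the flag
    appears more than once, the last value is used (an assumption about
    the CLI parser, common but not verified for nwaku).
    """
    level = None
    for i, arg in enumerate(args):
        if arg.startswith("--log-level="):
            level = arg.split("=", 1)[1]
        elif arg == "--log-level" and i + 1 < len(args):
            level = args[i + 1]
    return level is not None and level.lower() == "trace"


def container_args(name: str = "nim-waku") -> list[str]:
    """Read the container's command-line arguments via `docker inspect`."""
    out = subprocess.run(
        ["docker", "inspect", "--format", "{{json .Args}}", name],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    # .Args is JSON null when the container was started without arguments.
    return json.loads(out) or []
```

On the host, `args_have_trace_level(container_args())` would be expected to return `True` once the docker-compose command includes `--log-level=trace`.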

jakubgs commented 11 months ago

I've made the node-02.do-ams3.status.prod host use the trace image:

https://github.com/status-im/infra-status/blob/ae91342f44ac981e9130d2c468c5dcba18cd8886/ansible/host_vars/node-02.do-ams3.status.prod.yml#L2-L4

admin@node-02.do-ams3.status.prod:~ % docker ps
CONTAINER ID   NAMES      IMAGE                                    CREATED         STATUS
dc47fb8b0990   nim-waku   wakuorg/nwaku:deploy-status-prod-trace   4 minutes ago   Up 4 minutes (healthy)
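To run the same check from a script rather than by eye, a minimal Python sketch could ask Docker for only the name, image and status columns (via `docker ps --format` with Go-template fields) and confirm that `nim-waku` runs the `deploy-status-prod-trace` tag and reports healthy. The function names and the tab-separated layout are choices made for this sketch, not something the fleet tooling provides:

```python
import subprocess

TRACE_IMAGE = "wakuorg/nwaku:deploy-status-prod-trace"


def parse_ps(output: str) -> dict[str, tuple[str, str]]:
    """Map container name -> (image, status) from tab-separated lines."""
    containers = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        name, image, status = line.split("\t", 2)
        containers[name] = (image, status)
    return containers


def runs_trace_image(containers: dict[str, tuple[str, str]], name: str = "nim-waku") -> bool:
    """True if the named container uses the trace image and is healthy."""
    if name not in containers:
        return False
    image, status = containers[name]
    # "(unhealthy)" does not contain the substring "(healthy)".
    return image == TRACE_IMAGE and "(healthy)" in status


def docker_ps() -> str:
    """Return `docker ps` output as name<TAB>image<TAB>status lines."""
    return subprocess.run(
        ["docker", "ps", "--format", "{{.Names}}\t{{.Image}}\t{{.Status}}"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
```

Given the output above, `runs_trace_image(parse_ps(docker_ps()))` on node-02 would be expected to return `True`.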

I consider this done.

jakubgs commented 11 months ago

Added the missing log level setting for the host: