sigp / lighthouse

Ethereum consensus client in Rust
https://lighthouse.sigmaprime.io/
Apache License 2.0

Accessible API for monitoring of validator client #1976

Open KaiRo-at opened 3 years ago

KaiRo-at commented 3 years ago

Description

We are running Lighthouse as both beacon node and validator client, in two separate docker containers. Using the beacon API, we could create a Python script, running in a third container, that does decent monitoring of the beacon state and of the on-chain knowledge of our validator(s). What is missing is a way to monitor the state of the validator client itself.

Version

The sigp/lighthouse:v1.0.0 image available from Docker Hub.

Present Behaviour

Currently, two problems exist. First, the validator API is not reachable from the Python container, as it is exposed only within its own container, where nothing else can run. Second, the information that is nicely presented in the validator's log output is, as far as I can see, not available through the API.

Expected Behaviour

I'd like to see an endpoint (available to a different docker container, at least with the right command-line option) that gives me, in machine-readable form, what e.g. the most recent log line "Nov 25 19:18:13.001 INFO All validators active slot: 52590, epoch: 1643, total_validators: 1, active_validators: 1, proposers: 0, service: notifier" tells me. Ideally, it would also include the pubkeys or validator indexes of all validators, so I can look up more details via the beacon node. The last attested slot/epoch for every validator would also be nice.
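For the beacon-node half of this, a minimal sketch of looking up validators by pubkey or index through the standard beacon API endpoint /eth/v1/beacon/states/{state_id}/validators could look like the following. It assumes the beacon node's HTTP API is reachable at http://localhost:5052 (Lighthouse's default port; adjust for your docker network), and validators_url, summarize and fetch_statuses are hypothetical helper names, not part of Lighthouse.

```python
import json
import urllib.parse
import urllib.request

# Assumption: beacon node HTTP API address as seen from the monitoring container.
BEACON_URL = "http://localhost:5052"


def validators_url(base: str, ids: list[str], state_id: str = "head") -> str:
    """Build a standard beacon API query for a set of pubkeys and/or indices."""
    query = urllib.parse.urlencode({"id": ",".join(ids)})
    return f"{base}/eth/v1/beacon/states/{state_id}/validators?{query}"


def summarize(response: dict) -> dict[str, str]:
    """Map each validator index to its status string, e.g. 'active_ongoing'."""
    return {v["index"]: v["status"] for v in response["data"]}


def fetch_statuses(ids: list[str]) -> dict[str, str]:
    """Query the beacon node and return {index: status} for the given ids."""
    with urllib.request.urlopen(validators_url(BEACON_URL, ids), timeout=10) as resp:
        return summarize(json.load(resp))
```

Any status other than active_ongoing (for example active_exiting or active_slashed) is a natural thing to alert on.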

Steps to resolve

Sorry, don't know what to put here right now ;-)

paulhauner commented 3 years ago

the most recent log line

We don't really have a connection from logs -> API at the moment, but it's something we could look into in the medium term. In the meantime, I would suggest obtaining it from syslogd or whatever you're using to collect logs from the VC.
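Once the VC's output is collected, a minimal sketch of turning the notifier line into machine-readable data could be a small parser like this. It assumes the plain, uncoloured format quoted in the issue (timestamp, level, message, then comma-separated "key: value" fields); parse_log_line is a hypothetical name, and other formats (e.g. with ANSI colour codes) would need adjusting.

```python
import re

# "key: value" pairs appended to Lighthouse log lines. Requiring ": " (colon
# followed by a space) keeps the "19:18:13.001" timestamp from matching.
FIELD_RE = re.compile(r"(\w+): ([^,]+)")
# Leading "Nov 25 19:18:13.001 INFO " header.
HEADER_RE = re.compile(r"^(?P<timestamp>\w{3} +\d{1,2} [\d:.]+) (?P<level>[A-Z]+) ")


def parse_log_line(line: str) -> dict:
    """Split a log line into timestamp, level, message and typed fields."""
    header = HEADER_RE.match(line)
    if header is None:
        raise ValueError(f"unrecognised log line: {line!r}")
    record = {"timestamp": header["timestamp"], "level": header["level"]}
    rest = line[header.end():]
    # The free-text message is everything before the first "key: value" pair.
    first = FIELD_RE.search(rest)
    record["message"] = rest[: first.start()].strip() if first else rest.strip()
    for key, value in FIELD_RE.findall(rest):
        value = value.strip()
        # Numeric fields (slot, epoch, counts) become ints; others stay strings.
        record[key] = int(value) if value.isdigit() else value
    return record
```

For the line quoted in the issue this gives slot 52590 and epoch 1643; with 32 slots per epoch, 52590 // 32 == 1643, which doubles as a cheap sanity check.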

Ideally, I also get pubkeys or validator indexes of all validators in there so I can look up more details via the beacon node using that.

This exists here: https://lighthouse-book.sigmaprime.io/api-vc-endpoints.html#get-lighthousevalidators
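A minimal sketch of calling that endpoint from the Python container, assuming the VC API is reachable at http://localhost:5062 (the assumed default port), that the API token is read from the VC's api-token.txt (the path below is a placeholder), and that the response has the documented shape {"data": [{"enabled": ..., "description": ..., "voting_pubkey": ...}]}. The exact Authorization scheme should be checked against the book for the release you run; read_token, fetch_validators and enabled_pubkeys are hypothetical helpers.

```python
import json
import urllib.request

# Assumptions: VC API address and token location; adjust for your setup.
VC_URL = "http://localhost:5062"
TOKEN_PATH = "/path/to/validators/api-token.txt"  # placeholder path


def read_token(path: str = TOKEN_PATH) -> str:
    """Read the VC API token, dropping the trailing newline."""
    with open(path) as f:
        return f.read().strip()


def fetch_validators(token: str) -> dict:
    """GET /lighthouse/validators with the token in the Authorization header."""
    req = urllib.request.Request(
        f"{VC_URL}/lighthouse/validators",
        headers={"Authorization": f"Bearer {token}"},  # scheme: check your version
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


def enabled_pubkeys(response: dict) -> list[str]:
    """Voting pubkeys of the validators the VC currently has enabled."""
    return [v["voting_pubkey"] for v in response["data"] if v.get("enabled")]
```

The resulting pubkeys can then be passed to the beacon node's standard /eth/v1/beacon/states/head/validators endpoint to look up indices, balances and statuses.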

Also nice would be last attested slot/epoch for every validator.

We don't keep this in the VC presently; it aims to be as minimal as possible. It should be simple enough to add, though, so we could add this after genesis.
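If per-validator "last attested epoch" data were available, a monitoring script could flag validators that fall behind. A minimal sketch, where AttestationTracker is hypothetical, how attestations are observed (VC API, beacon node, logs) is left open, and max_lag=2 is an arbitrary threshold chosen to tolerate normal timing around epoch boundaries:

```python
from dataclasses import dataclass, field


@dataclass
class AttestationTracker:
    """Hypothetical record of the latest epoch each validator attested in."""

    last_epoch: dict[str, int] = field(default_factory=dict)

    def record(self, pubkey: str, epoch: int) -> None:
        # Keep the newest epoch; a late or out-of-order report must not move it back.
        self.last_epoch[pubkey] = max(epoch, self.last_epoch.get(pubkey, epoch))

    def stale(self, current_epoch: int, max_lag: int = 2) -> list[str]:
        """Pubkeys whose last attestation is more than max_lag epochs old."""
        return sorted(
            pk for pk, ep in self.last_epoch.items() if current_epoch - ep > max_lag
        )
```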

Also, this PR will be merged and released before genesis: https://github.com/sigp/lighthouse/pull/1954

paulhauner commented 3 years ago

First, the validator API is not reachable from the python container as it's exposed only within its own container where nothing else can run.

Oh right, your issue is that you can't listen on 0.0.0.0? The idea with that is to stop people from exposing it on the internet. It should be fairly straightforward to reach it using SSH tunneling, a proxy, or an iptables rule.
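As one sketch of the "proxy" option, a small TCP forwarder could run in a separate container that shares the VC's network namespace (for example one started with docker's --network container:<vc-container>), so it can reach the VC API on 127.0.0.1 and re-expose it only on an internal docker-network address. LISTEN_HOST, both ports and the helper names are assumptions/hypothetical. If the monitoring container itself can join the VC's network namespace the same way, it can reach 127.0.0.1:5062 directly and no proxy is needed.

```python
import asyncio

# Hypothetical internal docker-network address of this container; never a public one.
LISTEN_HOST, LISTEN_PORT = "172.18.0.2", 15062
# VC API on loopback, reachable because we share the VC's network namespace.
TARGET_HOST, TARGET_PORT = "127.0.0.1", 5062


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(65536):
            writer.write(data)
            await writer.drain()
    except ConnectionError:
        pass  # one side went away; just tear down
    finally:
        writer.close()


async def handle(client_reader, client_writer, target_host, target_port) -> None:
    """Open an upstream connection and shuttle bytes in both directions."""
    try:
        upstream_reader, upstream_writer = await asyncio.open_connection(
            target_host, target_port
        )
    except OSError:
        client_writer.close()  # upstream unreachable; drop the client
        return
    await asyncio.gather(
        pipe(client_reader, upstream_writer),
        pipe(upstream_reader, client_writer),
    )


async def start_proxy(listen_host, listen_port, target_host, target_port):
    """Start listening; each accepted connection is forwarded to the target."""
    return await asyncio.start_server(
        lambda r, w: handle(r, w, target_host, target_port), listen_host, listen_port
    )


async def main() -> None:
    server = await start_proxy(LISTEN_HOST, LISTEN_PORT, TARGET_HOST, TARGET_PORT)
    async with server:
        await server.serve_forever()
```

Run it with asyncio.run(main()). Forwarding raw TCP keeps the VC's token check in place, so callers still need the API token.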

We could consider opening this port up to 0.0.0.0 but I'm very hesitant to allow people to do this. If you can't work around it on your end we can consider it.

paulhauner commented 3 years ago

Alright, I'll break this into separate points. Can you please confirm I've got it right?

KaiRo-at commented 3 years ago

Alright, I'll break this into separate points. Can you please confirm I've got it right?

Those sound right from what I can tell, yes.

Oh right, your issue is that you can't listen on 0.0.0.0? The idea with that is to stop people from exposing it on the internet. It should be fairly straightforward to reach it using SSH tunneling, a proxy, or an iptables rule.

Yes, that seems to be one part of it. From what I can see, that's pretty hard with the docker container (I certainly wouldn't want to expose it to the public, just to another container), as I believe I can't do any of those things within the docker container you ship (which, rightly, is pretty minimal in what it includes beyond Lighthouse itself).

* View logs via VC API.

Well, I'm less interested in the logs themselves than in the data they contain: a summary of whether "everything is alright" (a bonus if it can be determined from other things), slot/epoch, total/active validators, etc.

Main thing is I want to be able to monitor if my validator client is running fine, all validators active and doing their jobs, or if there is anything of concern I need to react to.
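The "everything is alright" summary could be derived on the monitoring side from the notifier fields. A minimal sketch, where health_problems is hypothetical, status is a dict with total_validators and active_validators (e.g. parsed from the notifier line), last_seen is when that line was logged, and the staleness threshold assumes the notifier logs roughly once per 12-second slot (raise max_age_slots if it logs less often):

```python
from datetime import datetime, timedelta

SECONDS_PER_SLOT = 12  # mainnet value


def health_problems(
    status: dict, last_seen: datetime, now: datetime, max_age_slots: int = 3
) -> list[str]:
    """Return human-readable problems; an empty list means "everything is alright"."""
    problems = []
    # Silence from the notifier suggests the VC is down or stuck.
    if now - last_seen > timedelta(seconds=max_age_slots * SECONDS_PER_SLOT):
        problems.append("no recent notifier output; is the VC running?")
    # Every configured validator should be active.
    if status["active_validators"] < status["total_validators"]:
        problems.append(
            f"only {status['active_validators']} of "
            f"{status['total_validators']} validators active"
        )
    return problems
```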