learning-at-home / hivemind

Decentralized deep learning in PyTorch. Built to train models on thousands of volunteer machines across the world.
MIT License

[Feature Request] Network Statistics #520

Open chavinlo opened 1 year ago

chavinlo commented 1 year ago

Is your feature request related to a problem? Please describe. No.

Describe the solution you'd like Basically, display two stats:
1. Network input/output in Mbps
2. Live gradient sync progress (%)

Describe alternatives you've considered At the moment I plan on just using an external tracker, but it would be nice to have this embedded.

Additional context I'm using PyTorch Lightning, as I had an issue with the native version of hivemind (#519).
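For reference, the kind of external tracker mentioned above can be sketched in a few lines. This is a minimal sketch, not part of hivemind: it assumes the Linux `/proc/net/dev` counter layout, and both function names are illustrative.

```python
def mbps(bytes_delta: int, seconds: float) -> float:
    """Convert a byte-counter delta over an interval to megabits per second."""
    return bytes_delta * 8 / (seconds * 1e6)


def read_total_bytes(path: str = "/proc/net/dev") -> tuple:
    """Sum received/transmitted byte counters across all interfaces (Linux only)."""
    rx = tx = 0
    with open(path) as f:
        for line in f.readlines()[2:]:  # skip the two header lines
            _, _, data = line.partition(":")
            fields = data.split()
            rx += int(fields[0])  # bytes received
            tx += int(fields[8])  # bytes transmitted
    return rx, tx
```

Sampling `read_total_bytes()` twice, a few seconds apart, and feeding the deltas to `mbps()` gives the input/output rates requested above.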

borzunov commented 1 year ago

Hi @chavinlo!

Thanks for the feature request! We're busy with another project built on top of hivemind at the moment, but will consider adding this feature when we get back to working on the hivemind core.

Currently, we just use external utilities like nvtop to monitor the traffic (input/output in Mbps). It doesn't display the progress of the gradient sync directly, but one can usually get a sense of it by watching the spikes in traffic (e.g., if previous gradient syncs took 30 sec and the current spike has lasted 15 sec, you know the sync is roughly 50% complete).

I agree though that an explicit way to watch the progress would be much more convenient.
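The timing heuristic above can be sketched as follows. This is illustrative only; `SyncProgressEstimator` is a hypothetical helper, not a hivemind class.

```python
from typing import Optional


class SyncProgressEstimator:
    """Estimate gradient-sync progress from the durations of past syncs,
    following the 'spike timing' heuristic: elapsed time divided by the
    typical (average) duration of previous syncs, capped at 100%."""

    def __init__(self):
        self.past_durations = []  # seconds taken by previous gradient syncs

    def record_sync(self, duration_s: float) -> None:
        self.past_durations.append(duration_s)

    def progress(self, elapsed_s: float) -> Optional[float]:
        """Fraction of the current sync believed complete (None if no history)."""
        if not self.past_durations:
            return None
        typical = sum(self.past_durations) / len(self.past_durations)
        return min(elapsed_s / typical, 1.0)
```

For example, with one recorded 30-second sync, a spike that has lasted 15 seconds maps to 0.5, matching the estimate described above.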

ikmckenz commented 1 year ago

Would the desire be to monitor at the p2pd layer, or at the Python layer?

libp2p has some metrics built in: https://github.com/libp2p/go-libp2p/tree/master/core/metrics

But it might be easier to just do it inside the Python DHT class?
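For the Python-layer option, a minimal thread-safe counter in the spirit of go-libp2p's `metrics.BandwidthCounter` might look like this. It's a sketch under stated assumptions: the class and method names are hypothetical and not part of hivemind's API.

```python
import threading
import time


class BandwidthCounter:
    """Thread-safe in/out byte counter, loosely modeled on go-libp2p's
    metrics.BandwidthCounter (hypothetical Python analogue)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._in = 0
        self._out = 0
        self._start = time.monotonic()

    def log_recv(self, n: int) -> None:
        with self._lock:
            self._in += n

    def log_sent(self, n: int) -> None:
        with self._lock:
            self._out += n

    def totals(self) -> tuple:
        """Total (received, sent) bytes since construction."""
        with self._lock:
            return self._in, self._out

    def rates_mbps(self) -> tuple:
        """Average (in, out) rates in Mbps since construction."""
        elapsed = max(time.monotonic() - self._start, 1e-9)
        with self._lock:
            return (self._in * 8 / (elapsed * 1e6),
                    self._out * 8 / (elapsed * 1e6))
```

Hooking `log_recv`/`log_sent` into whichever layer handles the raw bytes (p2pd wrapper or the Python DHT class) is the open design question in this thread.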

Bandcompute01 commented 10 months ago

We are building this feature into our platform; we can send it to you when it is ready, if you'd like?

mryab commented 10 months ago

Hi @Bandcompute01, sure, that would be awesome! If you are building this on top of hivemind, it would be even better to integrate it into the repository with a pull request: if you're interested, I am happy to assist with that.