neptune-ai / neptune-client

📘 The experiment tracker for foundation model training
https://neptune.ai
Apache License 2.0

Feature Request: GPU power metrics to be logged by default in monitoring namespace. #1853

Closed by harishankar-gopalan 2 months ago

harishankar-gopalan commented 3 months ago

Hi, I am sharing a diff against 1.10.4, the latest version currently available on PyPI, that automatically logs GPU power usage stats to Neptune whenever a new Run is created: https://github.com/neptune-ai/neptune-client/compare/1.10.4...harishankar-gopalan:neptune-client:v1.10.4_gpu_power?expand=1

I have not been able to easily rebase it onto master, as many of the files are either missing there or have been moved out, probably into separate repositories that I am not able to find. Please consider adopting these changes: GPU power usage is one of the most important metrics for understanding whether the GPU is being fully utilized over the course of a training run. Most of your competitors, such as WANDB, support this out of the box.
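To illustrate the kind of metric being requested, here is a minimal sketch (not the code from the linked diff) that reads per-GPU power draw through NVML and maps it to field paths under a monitoring namespace. It assumes the `pynvml` bindings (from the `nvidia-ml-py` package) may or may not be installed and returns an empty list when they or the driver are unavailable. The helper names `read_gpu_power_watts` and `power_fields`, and the field path `gpu_<i>_power_watts`, are hypothetical and chosen only for this example.

```python
from typing import Dict, List


def read_gpu_power_watts() -> List[float]:
    """Return the current power draw in watts for each NVIDIA GPU.

    Returns an empty list if pynvml or the NVIDIA driver is unavailable.
    """
    try:
        import pynvml  # provided by the nvidia-ml-py package
    except ImportError:
        return []
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        return []
    try:
        readings = []
        for index in range(pynvml.nvmlDeviceGetCount()):
            handle = pynvml.nvmlDeviceGetHandleByIndex(index)
            # NVML reports power usage in milliwatts; convert to watts.
            readings.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000.0)
        return readings
    except pynvml.NVMLError:
        # Some devices do not support power readings.
        return []
    finally:
        pynvml.nvmlShutdown()


def power_fields(readings: List[float], namespace: str = "monitoring") -> Dict[str, float]:
    """Map per-GPU readings to field paths (hypothetical naming scheme)."""
    return {
        f"{namespace}/gpu_{index}_power_watts": watts
        for index, watts in enumerate(readings)
    }
```

With a Neptune 1.x `Run` object, each value could then be appended as a series point, for example `run[path].append(watts)` for every `path, watts` pair in `power_fields(read_gpu_power_watts())`, called periodically from a background thread.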

SiddhantSadangi commented 3 months ago

Hey @harishankar-gopalan 👋 Thanks for this feature request and the code diff.

Can you please open a PR with these changes against the dev/1.x branch?

harishankar-gopalan commented 3 months ago

@SiddhantSadangi I have created a PR targeting the dev/1.x branch.

SiddhantSadangi commented 2 months ago

@harishankar-gopalan - This has been released in v1.11.0 🚀