ml-energy / zeus

Deep Learning Energy Measurement and Optimization
https://ml.energy/zeus
Apache License 2.0
180 stars, 24 forks

Abstracting away the GPU #23

Closed (jaywonchung closed this issue 3 months ago)

jaywonchung commented 9 months ago

Right now, the codebase is full of scattered calls to pynvml functions. nvmlInit is called in many places (duplicate initialization is harmless but not ideal), and NVML device handles are either passed around ad hoc or re-created every time they are needed.

It would be nice to abstract the GPUs into a single class that exposes methods wrapping NVML, giving us consistent error handling and logging. For now this is a code quality enhancement, but once #22 lands it will be mandatory.
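
To make the proposal concrete, here is a rough sketch of what such a wrapper might look like. It is not the design Zeus ended up with: the names `GPU`, `GPUs`, `GPUError`, and `power_usage_mw` are hypothetical, and the only NVML calls assumed are `nvmlInit`, `nvmlDeviceGetCount`, `nvmlDeviceGetHandleByIndex`, `nvmlDeviceGetPowerUsage`, and the `NVMLError` exception from pynvml. The NVML module is passed in as a parameter (an assumption made here so the sketch can be exercised without a GPU).

```python
import logging

logger = logging.getLogger(__name__)


class GPUError(Exception):
    """Hypothetical error type that every wrapped NVML failure is converted to."""


class GPU:
    """One device. Owns its NVML handle so callers never touch it directly."""

    def __init__(self, index, nvml):
        self.index = index
        self._nvml = nvml
        # The handle is looked up once and cached for the object's lifetime.
        self._handle = self._call("nvmlDeviceGetHandleByIndex", index)

    def _call(self, name, *args):
        """Invoke an NVML function by name with uniform logging and errors."""
        try:
            return getattr(self._nvml, name)(*args)
        except self._nvml.NVMLError as err:
            logger.error("GPU %d: %s failed: %s", self.index, name, err)
            raise GPUError(f"GPU {self.index}: {name} failed") from err

    def power_usage_mw(self):
        """Current power draw in milliwatts (the unit NVML reports)."""
        return self._call("nvmlDeviceGetPowerUsage", self._handle)


class GPUs:
    """Single entry point: initializes NVML once and builds all GPU objects."""

    def __init__(self, nvml=None):
        if nvml is None:
            # Imported lazily so the sketch can be loaded without pynvml.
            import pynvml as nvml
        nvml.nvmlInit()
        self._gpus = [GPU(i, nvml) for i in range(nvml.nvmlDeviceGetCount())]

    def __len__(self):
        return len(self._gpus)

    def __getitem__(self, index):
        return self._gpus[index]
```

On a machine with pynvml and an NVIDIA driver installed, `GPUs()[0].power_usage_mw()` would then replace a hand-written `nvmlInit` / `nvmlDeviceGetHandleByIndex` / `nvmlDeviceGetPowerUsage` sequence, and any NVML failure would surface as a single exception type with a log line attached.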