Abstracting away the GPU

Right now, calls to pynvml methods are scattered throughout the codebase. nvmlInit is called in many places, which leads to duplicate initializations (harmless, but not ideal). NVML device handles are either passed around ad hoc or re-created on every use.

It would be nice to abstract the GPUs into a single class whose methods wrap the NVML calls, so that error handling and logging are consistent. For now this is a code quality enhancement, but once #22 lands it will become mandatory.
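To make the proposal concrete, here is a rough sketch of what such a class could look like. It is not an agreed design: the names GPUs, GPUError, _call, and power_usage_mw are hypothetical, and the injectable nvml argument is an assumption added so the class can be exercised without a physical GPU. The only real NVML calls used are nvmlInit, nvmlDeviceGetCount, nvmlDeviceGetHandleByIndex, nvmlDeviceGetPowerUsage, and nvmlShutdown.

```python
import logging
from typing import Any, List, Optional

logger = logging.getLogger(__name__)


class GPUError(Exception):
    """Hypothetical exception that wraps any failed NVML call."""


class GPUs:
    """Hypothetical single entry point for all NVML access."""

    def __init__(self, nvml: Optional[Any] = None) -> None:
        if nvml is None:
            # Imported lazily so that a stub backend can be injected in tests.
            import pynvml as nvml
        self._nvml = nvml
        # NVML is initialized exactly once, here, instead of at every call site.
        self._call("nvmlInit")
        count = self._call("nvmlDeviceGetCount")
        # Device handles are created once and reused by every method below.
        self._handles: List[Any] = [
            self._call("nvmlDeviceGetHandleByIndex", i) for i in range(count)
        ]

    def _call(self, name: str, *args: Any) -> Any:
        """Run one NVML function with uniform logging and error wrapping."""
        try:
            return getattr(self._nvml, name)(*args)
        except Exception as err:
            logger.error("%s failed with args %r: %s", name, args, err)
            raise GPUError(f"{name} failed: {err}") from err

    def __len__(self) -> int:
        return len(self._handles)

    def power_usage_mw(self, index: int) -> int:
        """Current power draw of one GPU, in milliwatts."""
        return self._call("nvmlDeviceGetPowerUsage", self._handles[index])

    def shutdown(self) -> None:
        self._call("nvmlShutdown")
```

With something like this in place, call sites would go through one shared instance (for example, gpus.power_usage_mw(0)) rather than calling pynvml directly, so adding a new wrapped method or changing the error policy touches only this class.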

ml-energy / zeus

Abstracting away the GPU #23