Open · Jasha10 opened 10 months ago
I can reliably reproduce this when calling torch.cuda.memory_summary() with a device int or str (e.g. 'cuda:0'). The problem is that in that case _lazy_init() is never called. Some other functions, such as torch.cuda.reset_peak_memory_stats('cuda:0'), are affected by this too.
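As a workaround (my own suggestion, not an official fix), explicitly initializing the CUDA state before querying per-device stats ensures _lazy_init() has run:

```python
# Hedged workaround sketch: force CUDA initialization up front so that
# stat queries with an explicit device argument see a populated stats dict.
try:
    import torch
except ImportError:
    torch = None  # allow the sketch to run where torch is not installed

if torch is None:
    status = "torch not installed"
elif not torch.cuda.is_available():
    status = "CUDA not available"
else:
    torch.cuda.init()  # triggers the lazy initialization explicitly
    print(torch.cuda.memory_summary("cuda:0"))  # no longer raises KeyError
    status = "summary printed"
print(status)
```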
This issue is related to https://github.com/pytorch/pytorch/issues/49952. Unfortunately, PR https://github.com/pytorch/pytorch/pull/51179 did not fix it in the memory_summary function.
It seems https://github.com/pytorch/pytorch/pull/117143 was never merged and was closed as stale.
🐛 Describe the bug

Calling torch.cuda.memory_summary() can raise a KeyError under certain circumstances. The root cause is that torch.cuda.memory_stats() (which is used internally by torch.cuda.memory_summary()) has returned an object that does not have the expected keys.

Versions
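The failure mode can be illustrated without a GPU. Before _lazy_init() runs, memory_stats() effectively returns an empty dict, so any consumer that indexes an expected key raises KeyError. A minimal sketch with hypothetical stand-in functions (fake_memory_stats, fake_memory_summary are not real PyTorch APIs):

```python
from collections import OrderedDict

def fake_memory_stats(initialized: bool) -> OrderedDict:
    # Stand-in for torch.cuda.memory_stats(): before CUDA is lazily
    # initialized, the stats mapping comes back empty.
    if not initialized:
        return OrderedDict()
    return OrderedDict([("allocated_bytes.all.current", 0)])

def fake_memory_summary(initialized: bool) -> str:
    stats = fake_memory_stats(initialized)
    # Stand-in for memory_summary(): it indexes keys it expects to
    # exist, so an empty stats dict raises KeyError here.
    return f"allocated: {stats['allocated_bytes.all.current']}"

try:
    fake_memory_summary(initialized=False)
    err_msg = ""
except KeyError as e:
    err_msg = str(e)

ok_msg = fake_memory_summary(initialized=True)
print(err_msg)  # the missing key
print(ok_msg)
```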
cc @ptrblck