[ ] Do you need to use `python -u` to ensure memory stats are printed at the right time? The first few iterations tend to show odd memory usage for gradients; this may be a GPU/CPU synchronization issue.
[x] Add more dots so that ResNet info can be printed properly
[x] Intermediate weights: bring the table formatting in line with the LaTeX doc's recommended table (rename "activation" to "feature map", rename total weight and total gradients, change the indent on intermediate grads)
[x] Make CSV formatting consistent with the above changes
[x] README updates: use a better table example, and explain why cache size might differ (it monotonically increases in chunks)
[x] Make the profiler global so it can be referenced from all functions
[x] Change the README so the instructions say to initialize the profiler right before the training loop, ensuring that the loss, criterion, etc. are registered with the model
[x] Remove total allocated from diagnostics
[x] Make private functions start with underscore
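On the `python -u` question above: an explicit `flush=True` in the logging call is an in-code equivalent that avoids depending on how the script is launched. A minimal sketch, assuming a hypothetical `log_memory` helper (the name and signature are not from this repo):

```python
def log_memory(step, allocated_bytes):
    """Print a memory stat immediately, even when stdout is block-buffered
    (e.g. piped to a file), without needing to launch via 'python -u'."""
    print(f"step {step}: allocated={allocated_bytes} B", flush=True)

# For the odd early-iteration gradient numbers, calling
# torch.cuda.synchronize() before reading CUDA memory stats is the usual
# way to rule out GPU/CPU synchronization as the cause (not shown here,
# to keep this sketch framework-free).
```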