sergei-mironov opened 4 years ago
Note: printing cdfs before line 603 shows:
[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 11010048.0, 151617536.0], [0.0, 22544384.0, 303235072.0], [0.0, 45613056.0, 370409472.0], [0.0, 68681728.0, 370409472.0], [0.0, 80740352.0, 420773888.0], [0.0, 81002496.0, 420773888.0], [0.0, 92536832.0, 420773888.0], [0.0, 104071168.0, 420773888.0], [0.0, 116129792.0, 454361088.0], [0.0, 116391936.0, 454361088.0], [0.0, 127926272.0, 454361088.0], [0.0, 139460608.0, 454361088.0], [0.0, 150994944.0, 454361088.0], [0.0, 163053568.0, 487948288.0], [0.0, 163315712.0, 487948288.0], [0.0, 174850048.0, 487948288.0], [0.0, 186384384.0, 487948288.0], [0.0, 209401856.0, 529928192.0], [0.0, 220411904.0, 529928192.0], [0.0, 220674048.0, 529928192.0], [0.0, 220936192.0, 529928192.0], [0.0, 231946240.0, 529928192.0], [0.0, 242956288.0, 529928192.0], [0.0, 253966336.0, 529928192.0], [0.0, 265500672.0, 580292608.0], [0.0, 265762816.0, 580292608.0], [0.0, 276772864.0, 580292608.0], [0.0, 287782912.0, 580292608.0], [0.0, 298792960.0, 580292608.0], [0.0, 310327296.0, 630657024.0], [0.0, 310589440.0, 630657024.0], [0.0, 321599488.0, 630657024.0], [0.0, 332609536.0, 630657024.0], [0.0, 343619584.0, 630657024.0], [0.0, 354629632.0, 630657024.0], [0.0, 366163968.0, 681021440.0], [0.0, 366426112.0, 681021440.0], [0.0, 377436160.0, 681021440.0], [0.0, 388446208.0, 681021440.0], [0.0, 786442240.0, 832787040.0]]
To work around this particular error, you can comment out the render_bar_graphs_and_cdfs() method call. However, I am worried that the root cause is something else: the times for each operator seem to be all zeros.
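One quick way to check the printed cdfs for this is to look for columns that are zero in every row. The sketch below is a hypothetical helper, not part of the project. It assumes each inner list is one CDF point with one value per series; the thread does not say which series each column represents. Only a few rows from the printout are copied in.

```python
# A few rows copied from the cdfs printout (the full list is longer).
cdfs = [
    [0.0, 0.0, 0.0],
    [0.0, 11010048.0, 151617536.0],
    [0.0, 22544384.0, 303235072.0],
    [0.0, 786442240.0, 832787040.0],
]


def all_zero_columns(rows):
    """Return the indices of columns whose every value is zero (hypothetical helper)."""
    num_cols = len(rows[0])
    return [c for c in range(num_cols) if all(row[c] == 0.0 for row in rows)]


print(all_zero_columns(cdfs))  # [0]
```

In the full printout the first column is 0.0 in every row, while the other two grow monotonically, so the zeros are confined to a single series.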
I did some debugging, and my guess is that the assignments at line 324 are bypassed by the 'continue' branches above them. summary_elem could already contain _time fields because of zero-initialization at the beginning of training. Could you please check?
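The suspected pattern can be illustrated with a small, hypothetical sketch. It is not the project's code at line 324. The names init_summary, update_summary, forward_time and backward_time are invented for illustration. An early continue skips the assignments, so a zero-initialized summary_elem silently keeps its zeros. Initializing the fields to None instead, shown here only as one possible way to make skipped assignments visible, lets them be detected.

```python
# Hypothetical sketch of the suspected bug pattern; all names are illustrative.


def init_summary(op_names, default):
    """Create one summary element per operator with pre-filled *_time fields."""
    return {name: {"forward_time": default, "backward_time": default}
            for name in op_names}


def update_summary(summary, measurements):
    """Copy measured (forward, backward) times into the summary."""
    for name, times in measurements.items():
        summary_elem = summary[name]
        if times is None:
            # Early 'continue': the assignments below are bypassed, so
            # summary_elem keeps whatever it was initialized with.
            continue
        summary_elem["forward_time"] = times[0]
        summary_elem["backward_time"] = times[1]
    return summary


def ops_never_assigned(summary):
    """Operators with at least one *_time field still set to None."""
    return [name for name, elem in summary.items()
            if any(v is None for v in elem.values())]


ops = ["conv1", "relu1"]
measurements = {"conv1": None, "relu1": (1.5, 3.0)}

# Zero-initialization: the skipped operator looks like a zero-time operator.
zeroed = update_summary(init_summary(ops, 0.0), measurements)
print(zeroed["conv1"])  # {'forward_time': 0.0, 'backward_time': 0.0}

# None-initialization: the skipped operator can be detected.
sentinel = update_summary(init_summary(ops, None), measurements)
print(ops_never_assigned(sentinel))  # ['conv1']
```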
I have the same problem: the times for each operator are all zeros. Did you figure it out?
I'm experiencing the error below, which looks critical. I'm using revision f50827f with the Docker base image nvcr.io/nvidia/pytorch:19.05-py3.