Open tsirif opened 5 years ago
I see that fetching is certainly done properly at the end of the epoch with the `_lib.utils.convert_to_numpy` function. Then, using `.detach()` instead of `.item()` in the model plugin implementation should be sufficient, I think.
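For illustration, a guess at the shape of such an epoch-end conversion helper. I have not read `_lib.utils.convert_to_numpy`, so this is only a sketch of the idea, not the project's actual code; the duck-typed `cpu()` check stands in for a torch-tensor test so the sketch runs without torch installed:

```python
def convert_to_numpy(value):
    # Hypothetical sketch: pull detached tensors back to host memory as
    # NumPy values once, at the end of the epoch, instead of per step.
    if hasattr(value, "cpu"):  # duck-typed check for a torch tensor
        return value.cpu().numpy()
    if isinstance(value, dict):
        return {k: convert_to_numpy(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(convert_to_numpy(v) for v in value)
    return value  # plain floats etc. pass through unchanged
```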
Interesting, I wasn't aware of this distinction. Is there a backend solution that can manage this, or is it up to the user when they design routines?
I have implemented a solution with a `nested_detach` function, applied to the per-routine isolated results. I will make a PR soon. The user may now provide either a plain float, a NumPy ndarray, or a torch tensor (detached or not).
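A minimal sketch of what such a `nested_detach` helper could look like (the actual implementation in the PR may differ; the duck-typed `detach` check is my assumption, chosen so the sketch also passes plain floats and NumPy ndarrays through unchanged):

```python
def nested_detach(value):
    # Recursively detach tensors found in nested containers.
    # Anything exposing a .detach() method (e.g. a torch tensor) is
    # detached; dicts, lists, and tuples are traversed recursively;
    # floats and ndarrays are returned as-is.
    if hasattr(value, "detach"):
        return value.detach()
    if isinstance(value, dict):
        return {k: nested_detach(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return type(value)(nested_detach(v) for v in value)
    return value
```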
Using `.item()` to store results in a routine call forces the GPU to synchronize so that a lazily evaluated Python number becomes accessible. This is suboptimal: kernel scheduling (CPU load) and kernel execution (GPU load) should run as pipelines that are as parallel as possible, and forcing synchronization introduces delays.

On the other hand, we need `_all_epoch_results` at the end of an epoch for visualization purposes. As @obilaniu has noted elsewhere, it's better to use `.detach()` to store results within a training step, and then process the results (and losses-as-results) internally to get the Python/NumPy values at the moment they are actually needed, which is the end of the epoch.
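To illustrate the pattern, a small self-contained sketch (the `train_step` below is a stand-in, not the real plugin code): store detached tensors per step, and synchronize only once at epoch end.

```python
import torch

def train_step(x):
    # Stand-in for a real model step that returns a scalar loss tensor.
    return (x * 2).sum()

results = []
for _ in range(3):
    loss = train_step(torch.ones(4, requires_grad=True))
    # loss.item() here would block until the device finished computing
    # `loss` on every step; .detach() is cheap and defers the transfer.
    results.append(loss.detach())

# Convert once, at the end of the epoch, when the values are needed.
epoch_losses = [r.item() for r in results]
```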