rajveerb / lotus

Lotus: Characterization of Machine Learning Preprocessing Pipelines via Framework and Hardware Profiling

Async copy of tensor from GPU to CPU #23

Closed: rajveerb closed this issue 11 months ago

rajveerb commented 11 months ago

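Currently, the loss is computed and displayed synchronously, because loss.item() implicitly copies the loss tensor from GPU to CPU. The host then blocks until that copy, and all work queued on the stream before it, has finished. This is inefficient, since the CPU cannot enqueue further GPU work while it waits.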

The solution is to copy the loss tensor asynchronously, without blocking the host, by calling tensor.to with copy=True and non_blocking=True.
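Below is a minimal PyTorch sketch of this idea. It assumes a single CUDA stream, and the names AsyncLossLogger, record, drain and train_one_epoch are hypothetical: they are not part of lotus or PyTorch. One caveat the issue does not spell out is that non_blocking=True returns before the data has landed in host memory. The CPU copy must therefore not be read until the transfer is complete. To handle this, the sketch records a torch.cuda.Event after each copy. It calls .item() on the CPU tensor only once event.query() reports completion, or after event.synchronize() at the end of the epoch. On machines without a GPU, it falls back to a plain synchronous copy so it still runs.

```python
import collections

import torch


class AsyncLossLogger:
    """Hypothetical helper: reads loss values only after their GPU->CPU copy has finished."""

    def __init__(self):
        # Each entry: (step, host tensor, CUDA event or None), in enqueue order.
        self._pending = collections.deque()

    def record(self, step, loss):
        # Detach so the pending copy does not keep the autograd graph alive.
        loss = loss.detach()
        if loss.is_cuda:
            # Enqueue the device-to-host copy; the call returns without waiting for it.
            host_loss = loss.to("cpu", copy=True, non_blocking=True)
            event = torch.cuda.Event()
            event.record()  # completes once prior work on the current stream, incl. the copy, is done
        else:
            # CPU fallback so the sketch also runs without a GPU.
            host_loss = loss.to("cpu", copy=True)
            event = None
        self._pending.append((step, host_loss, event))

    def drain(self, wait=False):
        """Return (step, value) pairs whose copies are done; block only if wait=True."""
        ready = []
        while self._pending:
            step, host_loss, event = self._pending[0]
            if event is not None and not event.query():
                if not wait:
                    # Assumes one stream: copies finish in order, so stop at the first unfinished one.
                    break
                event.synchronize()
            # .item() on a CPU tensor does not synchronize with the GPU.
            ready.append((step, host_loss.item()))
            self._pending.popleft()
        return ready


def train_one_epoch(model, batches, loss_fn, optimizer, device, log_every=1):
    """Hypothetical training loop that logs losses without a per-step loss.item() sync."""
    logger = AsyncLossLogger()
    logged = []
    model.train()
    for step, (x, y) in enumerate(batches):
        x, y = x.to(device), y.to(device)
        optimizer.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
        if step % log_every == 0:
            logger.record(step, loss)
        # Report whatever has already reached the host; never block inside the loop.
        for s, value in logger.drain():
            print(f"step {s}: loss {value:.4f}")
            logged.append((s, value))
    # End of epoch: wait for any copies still in flight.
    for s, value in logger.drain(wait=True):
        print(f"step {s}: loss {value:.4f}")
        logged.append((s, value))
    return logged
```

Calling loss.item() every step makes the host wait on each iteration. This loop blocks only once, at the end of the epoch. The trade-off is that a printed loss may lag a few steps behind the step currently being trained.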