Closed: tohtana closed this pull request 3 months ago.
`instrument_w_nvtx` causes a graph break because `range_push` and `range_pop` return a non-tensor `int`. This PR disables the decorator to avoid the graph break.
This has a measurable performance impact. In my environment, the training iteration time for Llama-3-8B on 4 GPUs with ZeRO-1 improved from 3.02 s to 2.54 s (about 16% shorter).