geoffxy closed this issue 4 years ago
I've looked through `apex/pyprof` and decided that it is not well suited for our use case. There are a couple of reasons why:

- It relies on `nvprof` (not future-proof for NVIDIA Nsight (i.e. Turing architecture GPUs))
- We would need to modify `pyprof` to extract the information we care about (the console-based output is not suitable for our needs)
- `pyprof`'s "monkey patching" is destructive: there's no way to remove the hooks it adds, which is currently problematic for how we run our analysis
- `pyprof` doesn't provide run times at the operation level

Completed as of commit 111661a90a7da909ada66d63a57c11b34d477216.
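To illustrate the point about destructive monkey patching, here is a minimal sketch of what removable hooks could look like. This is not `pyprof`'s API or the solution from the commit above: `timed_patch`, `ops`, and `add` are hypothetical names used only for illustration. It measures CPU wall-clock time only; CUDA kernels launch asynchronously, so timing real GPU operations would need explicit synchronization.

```python
import contextlib
import time
import types


@contextlib.contextmanager
def timed_patch(owner, name, records):
    """Temporarily wrap owner.<name> so each call appends (name, seconds) to records."""
    original = getattr(owner, name)

    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return original(*args, **kwargs)
        finally:
            # Record the elapsed wall-clock time even if the call raises.
            records.append((name, time.perf_counter() - start))

    setattr(owner, name, wrapper)
    try:
        yield
    finally:
        # Restore the original attribute so the hook does not outlive the block.
        setattr(owner, name, original)


def add(a, b):
    """Hypothetical stand-in for a framework operation."""
    return a + b


# Hypothetical stand-in for a module whose functions we want to time.
ops = types.SimpleNamespace(add=add)
```

Used as `with timed_patch(ops, "add", records): ...`, the hook is active only inside the block, and `ops.add` is the original function again afterwards.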
We need to add the per-operation run time breakdown back in. Previously we used a custom profiling solution. This time we should explore the possibility of using other profiling tools (e.g., `apex/pyprof`, the PyTorch profiler that ships with NVIDIA's apex).