kakaobrain / torchgpipe

A GPipe implementation in PyTorch
https://torchgpipe.readthedocs.io/
BSD 3-Clause "New" or "Revised" License

About the debugger for running torchgpipe #26

Closed Real-ZeminJiang closed 3 years ago

Real-ZeminJiang commented 3 years ago

Hey authors @sublee @ummae @kimdwkimdw @huntrax11 @ildoonet :

I am wondering how to debug the asynchronous PyTorch code in pipeline.py. Is there a proper debugger that can trace all of the code, on both the CPU and the GPU? Thank you very much for the help!

Real-ZeminJiang commented 3 years ago

And when I try to step into

            output = model(input)

inside main.py,

the program just hangs there and does not respond for a long time. Can anyone give me some advice? Thank you very much!

sublee commented 3 years ago

This question does not seem to be related to torchgpipe. That said, when I was developing torchgpipe, NVIDIA Nsight Systems, a GPU profiler, was very useful to me.
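For anyone who wants to try the Nsight Systems route, here is a minimal sketch (not part of the original reply) of marking regions with NVTX ranges so they are labeled on the profiler timeline. `nvtx_range` is a hypothetical helper name chosen for illustration; it wraps PyTorch's `torch.cuda.nvtx.range_push` and `torch.cuda.nvtx.range_pop`, and it assumes that falling back to a no-op is acceptable when PyTorch or CUDA is not available.

```python
import contextlib

try:
    import torch
    # NVTX markers are only emitted when a CUDA device is usable.
    _HAVE_CUDA = torch.cuda.is_available()
except ImportError:
    torch = None
    _HAVE_CUDA = False


@contextlib.contextmanager
def nvtx_range(name):
    """Mark the enclosed code as a named NVTX range (hypothetical helper).

    Does nothing when PyTorch or CUDA is unavailable, so the surrounding
    code still runs on a CPU-only machine.
    """
    if _HAVE_CUDA:
        torch.cuda.nvtx.range_push(name)
    try:
        yield
    finally:
        if _HAVE_CUDA:
            torch.cuda.nvtx.range_pop()


# Hypothetical usage around the call from the question:
#
#     with nvtx_range("forward"):
#         output = model(input)
```

The script would then be run under the profiler, for example with `nsys profile python main.py`. Because CUDA kernels are launched asynchronously, a range like this brackets the CPU-side calls; the kernels themselves may finish later on the GPU timeline.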

To give advice about the hang, I need more context. Which main.py did you run? Which GPUs are you using? Was any GPU being utilized?
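One generic way to collect that kind of context, sketched here as an illustration rather than as something from the original reply: Python's standard `faulthandler` module can dump the stack of every thread when a program appears to hang. The helper names below are hypothetical, and the sketch assumes the hang is visible in Python-level frames (for example, a thread waiting on a queue or a lock); it does not show what a GPU kernel is doing.

```python
import faulthandler
import sys


def dump_all_thread_stacks(file=None):
    """Write the current Python stack of every thread to `file`.

    `file` must be backed by a real file descriptor; defaults to stderr.
    """
    if file is None:
        file = sys.stderr
    faulthandler.dump_traceback(file=file, all_threads=True)


def arm_hang_watchdog(timeout_s, file=None):
    """Dump all thread stacks if this is not cancelled within `timeout_s` seconds.

    The process keeps running after the dump (exit=False), and `file`
    must stay open until the watchdog fires or is cancelled.
    """
    if file is None:
        file = sys.stderr
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=file, exit=False)


def disarm_hang_watchdog():
    """Cancel a pending watchdog, e.g. after the suspect call returned."""
    faulthandler.cancel_dump_traceback_later()


# Hypothetical usage around the call from the question:
#
#     arm_hang_watchdog(60)      # dump stacks if still running after 60 s
#     output = model(input)
#     disarm_hang_watchdog()
```

If the dump shows the main thread and the worker threads all blocked in waits, that output, together with the GPU model and utilization, is the kind of detail worth posting in a report.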

sublee commented 3 years ago

I'm closing this issue because it's not related to torchgpipe and there has been no response for a long while.