vibhatha closed this issue 4 years ago
I don't understand what you need. Could you explain it in more detail?
@sublee What I mean is: there are scripts to benchmark torchgpipe. I ran them and got results similar to what you report. I was wondering whether I can check them against your original GPipe results.
I couldn't find a script to replicate, or at least approximate, the Huang et al. results and compare them with the torchgpipe results.
Is this clear?
Thanks for explaining. The original implementation can be found in Lingvo, but the benchmark scripts seem never to have been published. We didn't create our own benchmark scripts to reproduce the original results; instead, we just copied the numbers from the paper.
So, did you use the same hardware they used?
We have tested torchgpipe on NVIDIA Tesla P40 and V100 GPUs, while P100s were used in the original experiments. As for TPUs, torchgpipe does not support them.
I understand.
One more thing to clarify, about the backward pass in the torchgpipe implementation:
Do you do the same thing as in the Google paper? Are micro-batches used in the backward pass as well, or is it done over the whole mini-batch?
Micro-batches are used in both the forward and backward passes. If you profile with NVIDIA Nsight Systems, you will see the typical pipeline-parallelism timeline in both directions.
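For intuition on why running the backward pass per micro-batch is safe, here is a minimal sketch (plain Python, not torchgpipe internals): for a sum-reduced loss, gradients computed per micro-batch and accumulated equal the gradient computed over the whole mini-batch. The linear model and data below are illustrative assumptions, not from the thread.

```python
# Sketch: for a sum-of-squared-errors loss on a linear model y = w * x,
# the mini-batch gradient equals the sum of the micro-batch gradients.

def grad_sse(w, xs, ys):
    """d/dw of sum((w*x - y)^2) over the given samples."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.5

# Gradient over the whole mini-batch.
full = grad_sse(w, xs, ys)

# Split into two micro-batches and accumulate the gradients,
# as a pipelined backward pass effectively does.
micro = grad_sse(w, xs[:2], ys[:2]) + grad_sse(w, xs[2:], ys[2:])

assert abs(full - micro) < 1e-9
print(full, micro)
```

This linearity of the gradient in the samples is what lets a pipeline schedule interleave per-micro-batch backward work across devices without changing the resulting update.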
Did you try to replicate the timeline? I see a unet-timeline benchmark folder; is that where it was tried?
I understand you're asking whether the timeline benchmark can be found in the original paper. Did I understand correctly? If so: we created the timeline benchmark on U-Net to show our efficiency on a model with skip connections. It was not derived from the original paper.
That is nice. I have added some performance micro-benchmarks. I can create a PR once I tidy up the code. I would like to contribute it to the code base, if that is possible and the code is suitable.
Thanks a lot for your support.
Contributions are welcome. Please see our contributing guide to refine your work so that it is suitable for torchgpipe. We can discuss new benchmarks on the PR. I think it's time to close this issue.
I want to compare the GPipe benchmarks to the torchgpipe benchmarks. I ran some micro-benchmarks, and I want to measure the same overheads in GPipe. Is that script open source?