Open JamesLYan opened 4 months ago
Indeed, we are migrating pippy into pytorch, see: https://github.com/pytorch/pytorch/tree/main/torch/distributed/pipelining
Does the script work for pp > 2 but without cpu-init?
Unfortunately, with the Llama2-7b-hf model, if I set pp > 2 without cpu-init, it goes CUDA OOM on all devices. I tried downgrading torch to stable 2.3.0 and the same problem occurs. The example script I am running is /examples/llama/pippy_llama.py. Since this could be a problem with PiPPy v0.2.0, I will try a different PiPPy version later.
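For context on the OOM: a back-of-envelope estimate (the parameter count and byte sizes below are rough assumptions for illustration, not measurements) suggests a 24 GB L4 cannot even hold the full fp32 weights of a ~7B-parameter model. If, without cpu-init, each rank first materializes the whole model on its own GPU before the pipeline split, every rank would OOM, which matches the symptom:

```python
# Back-of-envelope memory estimate for why Llama-2-7B may OOM on a 24 GB
# NVIDIA L4 when every rank materializes the full model on its GPU before
# the pipeline split (i.e. without cpu-init). The parameter count (~7B)
# and bytes-per-parameter are illustrative assumptions.

def weight_gib(n_params: float, bytes_per_param: int) -> float:
    """Approximate weight memory in GiB (weights only, no activations)."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 7e9       # Llama-2-7B, roughly
L4_MEM_GIB = 24      # NVIDIA L4 capacity

fp32 = weight_gib(N_PARAMS, 4)  # ~26 GiB: over budget before any activations
fp16 = weight_gib(N_PARAMS, 2)  # ~13 GiB: fits, but leaves little headroom
                                # for activations, KV cache, and CUDA context

print(f"fp32 weights: {fp32:.1f} GiB (fits in {L4_MEM_GIB} GiB? {fp32 < L4_MEM_GIB})")
print(f"fp16 weights: {fp16:.1f} GiB (fits in {L4_MEM_GIB} GiB? {fp16 < L4_MEM_GIB})")
```

This is one reason cpu-init helps: the full model is built once in host RAM, and each rank only moves its own stage's shard onto its GPU.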
Hi, I am trying to run the example script provided for the Llama model, for inference only. Since the repository is going through a migration and a lot of changes, I went back and installed the stable v0.2.0 version. Everything works fine until I start running the example script with CPU initialization on more than 2 pipeline stages. I am currently running on a server with 8 NVIDIA L4 GPUs. For pp = 2 it works perfectly, but as soon as I run the same script with pp > 2, after the model is initialized, all the other GPUs show 0% utilization according to nvidia-smi, the GPU ranked 1 sits at 100% utilization, and the entire inference process freezes. Has anyone seen similar issues? Or is there perhaps a quick fix I can try?

NVCC and CUDA version: 12.1. torch version: 2.4.0.dev20240521+cu118.
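The stall pattern described above (one GPU pinned at 100% while the rest sit idle) is what a blocked point-to-point send/recv chain in a pipeline typically looks like: the first stage that cannot hand off its activations spins, and every stage behind it never receives work. Below is a minimal, library-free sketch of that failure mode, using threads and bounded queues in place of ranks and NCCL channels; it illustrates the symptom only and is not the actual PiPPy internals:

```python
# Sketch of a stalled pipeline: each stage must recv from the previous
# stage and send to the next. If a downstream stage never joins (here we
# simply don't start stage 2), the upstream stage blocks on its bounded
# send queue while every later stage records zero work -- analogous to
# rank 1 busy at 100% and ranks 2+ idle at 0% in nvidia-smi.

import queue
import threading

NUM_STAGES = 4
links = [queue.Queue(maxsize=1) for _ in range(NUM_STAGES - 1)]  # p2p channels
work_done = [0] * NUM_STAGES

def stage(rank: int, microbatches: int) -> None:
    for mb in range(microbatches):
        if rank > 0:
            links[rank - 1].get()      # recv activation from previous stage
        work_done[rank] += 1           # "forward compute" for this microbatch
        if rank < NUM_STAGES - 1:
            links[rank].put(mb)        # send to next stage; blocks when full

# Start stages 0 and 1 only: stage 2 "never joins", as if its process hung.
threads = [threading.Thread(target=stage, args=(r, 4), daemon=True) for r in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join(timeout=1.0)                # both time out: the pipeline is stuck

print("work done per stage:", work_done)
```

Stage 1 completes only as many microbatches as it can hand off before its send queue fills, and stages 2 and 3 never do any work at all. In the real run, `py-spy dump` on each rank's process (or attaching gdb) can show whether the ranks are stuck in a send/recv, which would help narrow down the hang.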