taichi-dev / taichi

Productive, portable, and performant GPU programming in Python.
https://taichi-lang.org
Apache License 2.0

torch.distributed.launch #8383

Open lidc54 opened 1 year ago

lidc54 commented 1 year ago

I am using the code from blog_code as a layer in PyTorch, and then ran it on multiple GPUs to check the result with `python -m torch.distributed.launch --nproc_per_node=2 main.py`, but an error occurs. As the attached screenshot shows, the GPU memory used on gpu-6 is twice that on gpu-7. When the images get bigger or the batch size increases, this becomes a big problem. I suspect `ti.init` may be the source of the problem, but I cannot find a solution in the related issues. Does anyone know about it?
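For context, here is a minimal sketch of the setup being described. The file name `main.py`, the layer, and the trivial kernel are placeholders standing in for the blog_code kernel; only the general pattern (a Taichi kernel wrapped as a PyTorch layer, launched with two processes) is meant to match the report:

```python
# main.py -- sketch: a Taichi kernel used as a PyTorch "layer", launched with
#   python -m torch.distributed.launch --nproc_per_node=2 main.py
import os
import torch
import torch.distributed as dist
import taichi as ti

# Taichi picks its own CUDA device here (the first one it sees), independently of
# the device torch.distributed assigns to this process -- which is consistent with
# one GPU ending up holding the Taichi allocations of every rank.
ti.init(arch=ti.cuda)

@ti.kernel
def scale(src: ti.types.ndarray(), dst: ti.types.ndarray(), alpha: float):
    for i in range(src.shape[0]):  # outermost range-for is parallelized by Taichi
        dst[i] = alpha * src[i]

class ScaleLayer(torch.nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = torch.empty_like(x)
        scale(x, y, 2.0)  # torch tensors are passed to the kernel as ndarrays
        return y

if __name__ == "__main__":
    # torchrun / recent torch.distributed.launch set LOCAL_RANK in the environment;
    # older launchers pass --local_rank as an argument instead.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    dist.init_process_group(backend="nccl")
    x = torch.randn(1024, device="cuda")
    print(f"rank {dist.get_rank()}:", ScaleLayer()(x).sum().item())
```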

bobcao3 commented 1 year ago

Taichi can't use multiple GPUs at the moment. To use multiple GPUs you need to run Taichi in different processes, so it doesn't play well with torch's multi-GPU solution.
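To illustrate the one-process-per-GPU pattern described here, a possible sketch is below. It pins each worker to its own GPU by restricting CUDA visibility before Taichi is imported and initialized; the kernel and process count are placeholders, and whether this is sufficient for a real workload is not confirmed in this thread:

```python
# Sketch: one Taichi runtime per process, one GPU per process.
# CUDA_VISIBLE_DEVICES must be set before any CUDA context is created,
# so taichi is imported and initialized inside the worker function.
import multiprocessing as mp
import os

def worker(gpu_index: int):
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_index)  # this process sees only one GPU
    import numpy as np
    import taichi as ti
    ti.init(arch=ti.cuda)

    @ti.kernel
    def fill(x: ti.types.ndarray(), value: float):
        for i in range(x.shape[0]):
            x[i] = value

    buf = np.zeros(8, dtype=np.float32)
    fill(buf, float(gpu_index))  # numpy data is copied to/from the device by Taichi
    print(f"GPU {gpu_index}:", buf)

if __name__ == "__main__":
    mp.set_start_method("spawn")  # fresh interpreter per worker, no inherited CUDA state
    procs = [mp.Process(target=worker, args=(i,)) for i in range(2)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```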

KazukiYoshiyama-sony commented 8 months ago

@turbo0628

It would be really appreciated if Taichi supported multiple GPUs.

I sometimes write custom kernels natively, which involves makefiles, CMake, and/or PyTorch extensions. Taichi could do away with the cumbersome binding code, which would let us focus on the algorithm and keep debugging time to a minimum.

Recently, I wrote my first Taichi kernel called from multiple processes spawned by PyTorch Lightning. However, I could not fix the illegal memory access error, which seems to be caused by `ti.init` in a multi-process environment, so I moved back to the classical old approach, which takes 2-3x more time than using Taichi.

keunhong commented 5 months ago

I have been running into the same issue. Is there any way around this? In theory Taichi should be able to bind to the correct GPU and run on it, but there seems to be some hardcoded logic that makes it bind to the first GPU, which results in an illegal memory access error. With torch's DistributedDataParallel this would be fine as long as Taichi's context could be bound to the GPU given by the local rank. Since that does not work currently, it precludes the use of Taichi in the implementation of any large model that requires multi-GPU training.

keunhong commented 5 months ago

@turbo0628

Are there any workarounds that would allow us to force Taichi onto a specific GPU index for each process? For example, something that would let us set the GPU index in `ti.init(arch=ti.cuda)`.
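One commonly suggested workaround for CUDA applications in general (not confirmed in this thread, and not an official Taichi API) is to hide all but the local rank's GPU before anything touches CUDA, so that both Taichi and torch only ever see "device 0" of a one-GPU world. A sketch, assuming the launcher sets `LOCAL_RANK` and all GPUs are visible by default:

```python
# Sketch of a possible workaround: restrict CUDA visibility per process before
# importing torch/taichi, so ti.init(arch=ti.cuda) can only pick the GPU
# assigned to this rank. Untested here; if CUDA_VISIBLE_DEVICES is already set
# by the cluster, the mapping below would need to be adjusted.
import os

local_rank = int(os.environ.get("LOCAL_RANK", 0))  # set by torchrun / torch.distributed.launch
os.environ["CUDA_VISIBLE_DEVICES"] = str(local_rank)

import torch
import torch.distributed as dist
import taichi as ti

ti.init(arch=ti.cuda)      # only one GPU is visible to this process now
torch.cuda.set_device(0)   # "cuda:0" is the single visible device in this process
dist.init_process_group(backend="nccl")
# ... build the model; DistributedDataParallel would use device_ids=[0] here.
```

The trade-off is that every library in the process, not just Taichi, is limited to that single GPU, and the environment variable must be set before any CUDA context is created.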