Please describe your problem in detail
I'm trying to start PyTorch training using Volcano and the pytorch plugin. I have 2 nodes, each with 8 GPUs. I found that Volcano sets WORLD_SIZE=2 and RANK=0 (first pod) / RANK=1 (second pod), but I couldn't find LOCAL_RANK among the environment variables, so I can't target each GPU. My question is: is it possible to use multiple GPUs in each pod, or is it only one GPU per pod? If it is possible, what am I missing in my configuration?
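For context, this is roughly how I expected to initialize one process per GPU inside each pod. The LOCAL_RANK variable and the global-rank arithmetic (pod rank × GPUs per pod + local rank) are my own assumptions about how it should work, since the plugin only gives me WORLD_SIZE and RANK (and, as I understand it, MASTER_ADDR/MASTER_PORT):

```python
import os

import torch
import torch.distributed as dist


def init_distributed():
    # Injected by the Volcano pytorch plugin: one entry per pod.
    num_pods = int(os.environ["WORLD_SIZE"])   # 2 in my setup
    pod_rank = int(os.environ["RANK"])         # 0 or 1

    # Assumption: LOCAL_RANK would identify the process/GPU within the pod,
    # e.g. if I launched 8 processes per pod myself. This is the variable
    # I can't find in the pod's environment.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    gpus_per_pod = torch.cuda.device_count()   # 8 on my nodes

    # Recompute the effective world size and global rank across all GPUs.
    world_size = num_pods * gpus_per_pod
    global_rank = pod_rank * gpus_per_pod + local_rank

    torch.cuda.set_device(local_rank)
    # Assumes MASTER_ADDR / MASTER_PORT are set in the pod's environment.
    dist.init_process_group(
        backend="nccl",
        rank=global_rank,
        world_size=world_size,
    )
    return global_rank, world_size


if __name__ == "__main__":
    rank, world = init_distributed()
    print(f"global rank {rank} / world size {world}")
```

Part of my confusion is whether I'm supposed to launch the 8 per-pod processes myself (and override RANK/WORLD_SIZE as above), or whether the plugin is meant to handle that.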
This is the tasks section of my manifest:
Any other relevant information
No response