Closed by YoushaaMurhij 1 year ago
Hi @YoushaaMurhij,
Thanks for your interest, but sorry, I am not an expert in Slurm. Let me also involve @zhijian-liu in the discussion to see whether he has any ideas.
Best, Haotian
I could not solve this problem so I used torch.distributed.launch instead of torchpack!
Hi @YoushaaMurhij, would you please share the detailed steps for changing from torchpack to torch.distributed.launch? Thanks!
Hi @YoushaaMurhij, I ran into the same problem. Would you please share the detailed steps for changing from torchpack to torch.distributed.launch? Thanks!
Hi! Did you guys manage to change from torchpack to torch.distributed.launch?
Hi, I am trying to train the BEVFusion detection model on a Slurm-based cluster and got this error when choosing np > 1:
Do you have any idea what could cause such a problem? I tried to use torchrun and torch.distributed.launch without luck! PS: I am sure that there are 6 allocated GPUs for that task. Thanks!
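For anyone asking about the steps: the thread never spells them out, so the following is only a rough sketch of one common approach, not what was actually done here. It assumes the job is started with srun using one task per GPU, and it copies Slurm's per-task variables (SLURM_PROCID, SLURM_NTASKS, SLURM_LOCALID) into the RANK, WORLD_SIZE and LOCAL_RANK variables that torch.distributed reads with init_method="env://". The helper name slurm_to_torch_env and the default address and port are made up for illustration.

```python
import os


def slurm_to_torch_env(env, master_addr="127.0.0.1", master_port=29500):
    """Map Slurm per-task variables to the names torch.distributed's env:// init reads.

    Hypothetical helper: not part of BEVFusion, torchpack, or PyTorch.
    """
    return {
        "RANK": str(int(env["SLURM_PROCID"])),         # global rank of this task
        "WORLD_SIZE": str(int(env["SLURM_NTASKS"])),   # total number of tasks
        "LOCAL_RANK": str(int(env["SLURM_LOCALID"])),  # task index on this node
        # Assumed single-node default; for multi-node jobs pass the first node's hostname.
        "MASTER_ADDR": env.get("MASTER_ADDR", master_addr),
        "MASTER_PORT": str(env.get("MASTER_PORT", master_port)),
    }


if __name__ == "__main__":
    if "SLURM_PROCID" in os.environ:
        # Inside an srun task: export the variables before any distributed init.
        os.environ.update(slurm_to_torch_env(os.environ))
        print({k: os.environ[k] for k in ("RANK", "WORLD_SIZE", "LOCAL_RANK")})
    else:
        print("Not inside an srun task; nothing to do.")
```

With these variables set, a training script could call torch.distributed.init_process_group(backend="nccl", init_method="env://") and pick its GPU from LOCAL_RANK. Whether BEVFusion's training script needs further changes to drop torchpack is not covered in this thread.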