Open mop134679852 opened 7 months ago
Hello author, every time I start training, it hangs at the following output:
Training with Self Cross Attention
Number of params: 41653318
Adding DanceTrack/train
Found 40 videos, 41596 frames
sampler_steps=None lengths=[5]
Found 19370 images
After a while, the following error appears:
    self.process_group, tensors, buffer_size, authoritative_rank
RuntimeError: Socket Timeout
    self.process_group, tensors, buffer_size, authoritative_rank
RuntimeError: Socket Timeout
At the same time, I always see GPU utilization (GPU-Util) on GPU 0 at 100% while this is running. Can you help me solve this?