megvii-research / MOTRv2

[CVPR2023] MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors

Regarding the issue of socket timeout. #56

Open mop134679852 opened 7 months ago

mop134679852 commented 7 months ago

Hello author, every time I start training, the run hangs at the following output:

Training with Self Cross Attention
Number of params: 41653318
Adding DanceTrack/train
Found 40 videos, 41596 frames
sampler_steps=None lengths=[5]
Found 19370 images

After a while, the following error will appear:

self.process_group, tensors, buffer_size, authoritative_rank
RuntimeError: Socket Timeout
RuntimeError: Socket Timeout
self.process_group, tensors, buffer_size, authoritative_rank

While the run is stuck, GPU-Util for GPU 0 always shows 100%. Could you help me solve this?