jkli1998 / T-CAR

Code for paper 'Zero-Shot Scene Graph Generation via Triplet Calibration and Reduction' (TOMM 2023)
MIT License

ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -6) local_rank: 1 (pid: 2960901) #5

Open Lxy811 opened 8 months ago

Lxy811 commented 8 months ago

Can you check this error for me? I trained with two GPUs. Also, are the results obtained by evaluating after training finishes the final results of the method?

jkli1998 commented 8 months ago

It seems that this bug does not affect the final result of our method and can be ignored.

Lxy811 commented 7 months ago

Hello, when I run the SGDet training command, it always reports an error after a certain number of iterations and training cannot continue. Do you know how to solve it? (Screenshot: Snipaste_2024-04-09_15-43-02)

jkli1998 commented 7 months ago

It looks like there is an error on one of the GPUs, causing the whole task to fail. I didn't try multi-GPU training on this repo. If you have multiple GPUs, you could run a different task on each GPU instead.
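A minimal sketch of the "one task per GPU" idea, not taken from this repo: it starts one independent single-GPU process per task and pins each to a device through the `CUDA_VISIBLE_DEVICES` environment variable. The helper names `gpu_env` and `launch_jobs` are hypothetical, and the commands are placeholders to be replaced with whatever single-GPU training command you actually use (e.g. for SGDet).

```python
import os
import subprocess
import sys


def gpu_env(gpu_id):
    """Return a copy of the current environment that exposes only one GPU."""
    env = os.environ.copy()
    # CUDA_VISIBLE_DEVICES restricts which devices the CUDA runtime can see,
    # so the child process sees the chosen GPU as its device 0.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def launch_jobs(jobs):
    """Start one independent process per (gpu_id, command) pair and wait for all.

    Returns a dict mapping gpu_id to the exit code of its process.
    """
    procs = {
        gpu_id: subprocess.Popen(cmd, env=gpu_env(gpu_id))
        for gpu_id, cmd in jobs
    }
    return {gpu_id: proc.wait() for gpu_id, proc in procs.items()}


if __name__ == "__main__":
    # Placeholder command (an assumption for illustration): it only prints the
    # GPU index the child process was given. Replace it with your real
    # single-GPU training command for each task.
    placeholder = [
        sys.executable,
        "-c",
        "import os; print(os.environ['CUDA_VISIBLE_DEVICES'])",
    ]
    exit_codes = launch_jobs([(0, placeholder), (1, placeholder)])
    print(exit_codes)
```

Because the processes are independent, a crash on one GPU shows up as a nonzero exit code for that job only and does not take down the job on the other GPU.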

Lxy811 commented 7 months ago

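It looks like there is an error on one of the GPUs, causing the whole task to fail. I have not tried multi-GPU training on this repository. If you have multiple GPUs, you can try a different task on each GPU.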

Okay, thank you