Closed tao123322 closed 2 years ago
Sorry, I have not run into this problem before. My guess is that one of the processes in the distributed training crashed or hit some other problem, so the remaining processes keep waiting for the crashed process's predictions and the program gets stuck here.
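To make that guess concrete, here is a toy illustration of the failure mode using plain Python threads and a queue. It is not the VisualDS or PyTorch code path. The names `worker`, `gather`, and `crash_rank` are made up for this sketch. The collector waits for one result per rank; if a rank dies silently, a wait without a timeout would block forever, which is roughly what a stuck gather looks like.

```python
import queue
import threading


def worker(rank, results, crash_rank):
    # Simulate a rank that dies before it reports its predictions.
    if rank == crash_rank:
        return
    results.put((rank, f"predictions from rank {rank}"))


def gather(world_size, crash_rank=None, timeout=0.5):
    """Collect one result per rank and report which ranks never answered."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(rank, results, crash_rank))
        for rank in range(world_size)
    ]
    for t in threads:
        t.start()

    received = {}
    try:
        while len(received) < world_size:
            # Without the timeout this call would wait forever for the dead rank.
            rank, payload = results.get(timeout=timeout)
            received[rank] = payload
    except queue.Empty:
        pass  # at least one rank never reported

    for t in threads:
        t.join()
    missing = sorted(set(range(world_size)) - set(received))
    return received, missing


if __name__ == "__main__":
    _, missing = gather(world_size=4, crash_rank=2)
    print("ranks that never reported:", missing)
```

If something like this is happening, checking the log output of every process (not only rank 0) for an earlier exception is usually the quickest way to find the rank that died.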
Thank you very much for your reply. After training and testing, what do the three files in the inference folder (eval_results.pytorch, result_dict.pytorch, and visual_info.json) contain? How can I open them to see the results?
------------------ Original Message ------------------ From: "thunlp/VisualDS"; Sent: Tuesday, October 19, 2021, 11:53 AM; Subject: Re: [thunlp/VisualDS] A problem when running sh cmds/20/motif/predcls/sup/train.sh (Issue #2)
The generated data is the same as in the original repo, so you can look there for details. Generally speaking: visual_info.json holds the ground-truth (gt) and predicted bounding boxes. result_dict.pytorch holds the values of the different evaluation metrics for each image. eval_results.pytorch holds the gts and predictions saved as BoxList objects. You can call boxlist.fields() to list the available attributes, such as pred_labels for a predicted boxlist, and boxlist.get_field("pred_labels") to get a specific attribute.
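A minimal sketch of how one might open the three files and look around. The helper names `summarize` and `inspect_inference_dir` and the folder path in the comment are made up for this example. The exact layout inside each file is not assumed, so the code only prints types and keys. It also assumes the repo's own code is importable in the environment, so that the pickled BoxList objects can be restored.

```python
import json
from pathlib import Path


def summarize(obj):
    """Return a one-line description of a loaded object without assuming its layout."""
    if isinstance(obj, dict):
        return f"dict with keys {sorted(map(str, obj))}"
    if isinstance(obj, (list, tuple)):
        return f"{type(obj).__name__} of length {len(obj)}"
    return type(obj).__name__


def inspect_inference_dir(folder):
    """Print a short summary of each output file in an inference folder."""
    folder = Path(folder)

    # visual_info.json is plain JSON, so the standard library is enough.
    with open(folder / "visual_info.json", encoding="utf-8") as f:
        print("visual_info.json:", summarize(json.load(f)))

    # Imported here so the JSON part works even without PyTorch installed.
    import torch

    for name in ("result_dict.pytorch", "eval_results.pytorch"):
        # Recent PyTorch releases may additionally need weights_only=False
        # to unpickle custom objects such as BoxList.
        obj = torch.load(str(folder / name), map_location="cpu")
        print(f"{name}:", summarize(obj))


# Hypothetical usage; replace the path with your own inference folder:
# inspect_inference_dir("path/to/inference/folder")
```

Once you have a single BoxList in hand, `boxlist.fields()` and `boxlist.get_field("pred_labels")` work as described above.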
The program gets stuck while running. The following message is printed, and the GPU stays stuck at 100%:
Setting OMP_NUM_THREADS environment variable for each process to be 1 in default, to avoid your system being overloaded, please further tune the variable for optimal performance in your application as needed.