microsoft / scene_graph_benchmark

image scene graph generation benchmark
MIT License
388 stars 86 forks

Runtime error occurred in Image Ids: CUDA error: device-side assert triggered #83

Open albertmundu opened 2 years ago

albertmundu commented 2 years ago

There is a runtime error when I test the models with a batch size other than 1. The same error repeats, as shown below:

Runtime error occurred in Image Ids: 451,456,461,467,473,477,478,479
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Runtime error occurred in Image Ids: 458,459,460,462,463,464,465,466
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
  2%|____      | 59/3306 [00:08<04:41, 11.53it/s]
Runtime error occurred in Image Ids: 468,469,470,471,472,474,475,476
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Runtime error occurred in Image Ids: 480,482,487,489,490,491,492,494
CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.


Is there a fix for this?

DavidHuji commented 2 years ago

I had the same issue and solved it by adding CUDA_VISIBLE_DEVICES=0,1,2,3 at the beginning of the command line (or, in PyCharm, to the environment variables of the run configuration).
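For anyone who prefers to set this from inside a script rather than on the command line, here is a minimal sketch. The helper name `configure_cuda_env` is hypothetical and not part of this repository. It also sets CUDA_LAUNCH_BLOCKING=1, which the error message itself suggests for debugging. The variables have to be in place before CUDA is initialised, so setting them before importing torch is the safe choice.

```python
import os


def configure_cuda_env(devices="0,1,2,3", blocking=True):
    """Set CUDA-related environment variables for the current process.

    Hypothetical helper for illustration only. Call it before CUDA is
    initialised (safest: before `import torch`), since the CUDA runtime
    reads these variables when it starts up.
    """
    # Restrict which GPUs the process can see.
    os.environ["CUDA_VISIBLE_DEVICES"] = devices
    if blocking:
        # Run kernels synchronously so the reported stack trace
        # points at the call that actually failed.
        os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    keys = ("CUDA_VISIBLE_DEVICES", "CUDA_LAUNCH_BLOCKING")
    return {k: os.environ[k] for k in keys if k in os.environ}


configure_cuda_env()
# import torch  # import only after the variables are set
```

Synchronous launches slow things down noticeably, so it is probably best to drop CUDA_LAUNCH_BLOCKING again once the failing call has been located.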

albertmundu commented 2 years ago

@DavidHuji I tried testing the model with your solution, but the error still persists. Are there any modifications to be made to the code?

zhuang-li commented 2 years ago

Hi, I found a workaround: if you set the test batch size to 1, the problem goes away. I am still investigating the code logic to see why this is necessary. I have found a lot of errors in this code.
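A rough sketch of how the batch-size override could be passed, assuming the repository follows the maskrcnn_benchmark convention of appending yacs-style `KEY VALUE` pairs to the command and that the test batch size key is `TEST.IMS_PER_BATCH`. Both are assumptions to check against the repository's config defaults. The helper `with_batch_size_override`, the script name, and the config path below are placeholders, not taken from this thread.

```python
def with_batch_size_override(cmd, batch_size=1, key="TEST.IMS_PER_BATCH"):
    """Return a copy of `cmd` with a `KEY VALUE` override appended.

    Hypothetical helper: `key` is assumed to be the config entry that
    controls the test batch size; verify it in the repo's defaults.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # yacs-style overrides are passed as alternating key and value strings.
    return list(cmd) + [key, str(batch_size)]


# Placeholder command; substitute the real test script and config file.
base_cmd = ["python", "tools/test_sg_net.py", "--config-file", "path/to/config.yaml"]
print(" ".join(with_batch_size_override(base_cmd, batch_size=1)))
```

This only works around the failure; it does not explain why larger batches trigger the device-side assert.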

zhuyibing commented 2 years ago

Have you found the reason?

zhuang-li commented 2 years ago

I solved the problem a long time ago with the same solution. But there were still many errors, so I finally gave up on reproducing their results. I am now using another code base.


zhuyibing commented 2 years ago


Which code base? Can you give me a link?

zhuang-li commented 2 years ago

https://github.com/KaihuaTang/Scene-Graph-Benchmark.pytorch
