xiuqhou / Salience-DETR

[CVPR 2024] Official implementation of the paper "Salience DETR: Enhancing Detection Transformer with Hierarchical Salience Filtering Refinement"
https://arxiv.org/abs/2403.16131
Apache License 2.0
105 stars 7 forks

Program hangs: no error, no training #1

Open siyu-chen-cloud opened 4 months ago

siyu-chen-cloud commented 4 months ago

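The program hangs after running a few batches. GPU memory stays allocated, but no computation takes place. The environment log is as follows: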


sys.platform                         linux
Python                               3.8.18 packaged by conda-forge (default, Dec 23 2023, 17:21:28) [GCC 12.3.0]
numpy                                1.24.4
PyTorch                              1.12.1+cu113 @/home/ubuntu22/anaconda3/envs/sl/lib/python3.8/site-packages/torch
PyTorch debug build                  False
torch._C._GLIBCXX_USE_CXX11_ABI      False
GPU available                        Yes
GPU 0                                NVIDIA GeForce RTX 3090 (arch=8.6)
Driver version                       546.17
CUDA_HOME                            /usr/local/cuda-11.3
Pillow                               10.3.0
torchvision                          0.13.1+cu113 @/home/ubuntu22/anaconda3/envs/sl/lib/python3.8/site-packages/torchvision
torchvision arch flags               3.5, 5.0, 6.0, 7.0, 7.5, 8.0, 8.6
fvcore                               0.1.5.post20221221
iopath                               0.1.9
cv2                                  4.9.0

PyTorch built with:

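Many people say this kind of hang is caused by a deadlock or by running out of memory, but it still occurs with num_works=1 and batch_size=2. Is there any way to fix this?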

xiuqhou commented 4 months ago

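Hello, are you training the model on the COCO dataset or on a custom dataset? Could you provide the complete training log, along with the error message printed after you forcibly interrupt the program, so that I can locate the problem? Thanks.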
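
One possible way to collect the traceback requested above, without relying on Ctrl+C alone, is Python's standard `faulthandler` module. The snippet below is a minimal sketch and not part of the Salience-DETR code base: the helper names `enable_hang_diagnostics`, `arm_watchdog`, and `disarm_watchdog` are hypothetical, and it assumes the hang happens in the main training process on Linux.

```python
import faulthandler
import signal
import sys


def enable_hang_diagnostics(stream=sys.stderr):
    """Hypothetical helper: call once near the top of the training script."""
    # Dump Python tracebacks of all threads on fatal signals such as SIGSEGV.
    faulthandler.enable(file=stream, all_threads=True)
    # Where SIGUSR1 exists (e.g. Linux), `kill -USR1 <pid>` dumps the
    # current tracebacks without terminating the process.
    if hasattr(faulthandler, "register") and hasattr(signal, "SIGUSR1"):
        faulthandler.register(signal.SIGUSR1, file=stream, all_threads=True)


def arm_watchdog(timeout_s, stream=sys.stderr):
    """Hypothetical helper: dump all tracebacks if not re-armed within timeout_s.

    Calling it again replaces the previous timer, so calling it at the start
    of every iteration only produces output when one iteration is stuck.
    """
    faulthandler.dump_traceback_later(timeout_s, repeat=False, file=stream)


def disarm_watchdog():
    """Hypothetical helper: cancel the pending dump, e.g. after training ends."""
    faulthandler.cancel_dump_traceback_later()
```

In a training loop this could be used by calling `enable_hang_diagnostics()` once at startup, `arm_watchdog(300)` at the beginning of each iteration, and `disarm_watchdog()` after the loop; the 300-second timeout is an arbitrary assumption. The dump shows Python frames of the calling process only, so a hang inside a data-loading worker process would have to be inspected separately.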