"There appear to be 6 leaked semaphore objects to clean up at shutdown"

Git-oNmE commented 2 years ago

I ran sh scripts/train_kitti_det.sh and got an error message like this:

I searched for it on Google, and the answers I found said it happens when batch_size or num_workers is too large.

I set "batch_size" to 1, set "gpu" to "0,1,2" (each one is an RTX 3090), and set num_workers of train_loader in train_feats.py to 1, but I still got the same message.

Here is my train_kitti_det.sh, for reference:

python train_feats.py --batch_size 1 --epochs 100 --lr 0.001 --seed 1 --gpu 0,1,2 \
--npoints 16384 --dataset kitti --voxel_size 0.3 --ckpt_dir /media/data3/hlf_data/HRegNet0/HRegNet/ckpt \
--use_fps --use_weights --data_list ./data/kitti_list --runname "train_kitti_det0" --augment 0.5 \
--root /media/data3/hlf_data/HRegNet0/HRegNet/data/kitti_list --wandb_dir /media/data3/hlf_data/HRegNet0/HRegNet/wandb_env --use_wandb

Is there any way to solve this? :)
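train_feats.py itself is not shown in this thread. Purely as a hypothetical sketch, many PyTorch training scripts turn a --gpu string like the one above into the CUDA_VISIBLE_DEVICES environment variable before CUDA is initialized; whether HRegNet does exactly this is an assumption. parse_train_args is an illustrative name, and only a few of the flags from the script are modeled.

```python
import argparse
import os


def parse_train_args(argv=None):
    """Hypothetical parser covering a few flags used in train_kitti_det.sh."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--batch_size", type=int, default=1)
    parser.add_argument("--epochs", type=int, default=100)
    parser.add_argument("--gpu", type=str, default="0")  # e.g. "0,1,2" or "2"
    parser.add_argument("--dataset", type=str, default="kitti")
    # The remaining flags (--npoints, --use_fps, ...) are ignored in this sketch.
    args, _unknown = parser.parse_known_args(argv)
    # Assumed pattern: this must run before torch initializes CUDA. The listed
    # GPUs are then renumbered from 0, so "--gpu 2" appears as cuda:0 in-process.
    os.environ["CUDA_VISIBLE_DEVICES"] = args.gpu
    return args


if __name__ == "__main__":
    args = parse_train_args(["--batch_size", "1", "--gpu", "0,1,2", "--npoints", "16384"])
    print(args.batch_size, os.environ["CUDA_VISIBLE_DEVICES"])
```

Using parse_known_args keeps the sketch from failing on flags it does not model; a real script would normally declare every flag and use parse_args.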

FanLu97 commented 2 years ago

We do not support multi-GPU training in the current version; a single NVIDIA 3090 is enough for training. Also, if you do use multiple GPUs, the batch size should be an integer multiple of the number of GPUs.
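The batch-size rule above can be expressed as a small check. This is a sketch only: per_gpu_batch_size is a hypothetical helper, not part of HRegNet, and it assumes --gpu is a comma-separated list of device ids as in the scripts in this thread.

```python
def per_gpu_batch_size(batch_size: int, gpu_arg: str) -> int:
    """Return the per-GPU batch size, or raise if batch_size cannot be
    split evenly across the GPUs listed in gpu_arg (e.g. "0,1,2")."""
    gpu_ids = [g.strip() for g in gpu_arg.split(",") if g.strip()]
    if not gpu_ids:
        raise ValueError("--gpu must list at least one device id")
    n_gpus = len(gpu_ids)
    if batch_size % n_gpus != 0:
        raise ValueError(
            f"batch_size={batch_size} is not an integer multiple of {n_gpus} GPUs"
        )
    return batch_size // n_gpus
```

With this check, the first configuration in the thread (batch_size 1 with --gpu 0,1,2) is rejected, while batch_size 3 on the same three GPUs, or batch_size 1 on a single GPU, passes.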

Git-oNmE commented 2 years ago

I changed my train_kitti_det.sh to use a single GPU, like this:
python train_feats.py --batch_size 1 --epochs 100 --lr 0.001 --seed 1 --gpu 2 \
--npoints 16384 --dataset kitti --voxel_size 0.3 --ckpt_dir /media/data3/hlf_data/HRegNet0/HRegNet/ckpt \
--use_fps --use_weights --data_list ./data/kitti_list --runname "train_kitti_det0" --augment 0.5 \
--root /media/data3/hlf_data/HRegNet0/HRegNet/data/kitti_list --wandb_dir /media/data3/hlf_data/HRegNet0/HRegNet/wandb_env --use_wandb
GPU 2 was not in use by anything else, but I still got the same error message.

I then added a lot of debug prints and found that the program stops when it reaches this code: line 26.

However, when I copied the same code to my own computer (which has only a GTX 960M), it ran successfully.

That's as far as I've gotten. How can I deal with this problem?
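One generic way to see exactly where a run stops (not necessarily what the blog post linked below does) is Python's built-in faulthandler module: it prints a traceback if the process dies from a fatal signal such as a segmentation fault, and dump_traceback_later prints every thread's stack if a step hangs past a timeout. run_with_watchdog and the 600-second default are illustrative assumptions; in train_feats.py you would wrap whichever call never returns.

```python
import faulthandler
import sys


def run_with_watchdog(step, timeout_s=600.0):
    """Run step(); if it is still running after timeout_s seconds, dump the
    stacks of all threads to stderr so the blocking line becomes visible."""
    # Also report fatal signals (e.g. SIGSEGV) with a Python traceback.
    faulthandler.enable(file=sys.stderr, all_threads=True)
    faulthandler.dump_traceback_later(timeout_s, exit=False, file=sys.stderr)
    try:
        return step()
    finally:
        # Disarm the timer once the step finishes (or raises).
        faulthandler.cancel_dump_traceback_later()
```

Running the script as python -X faulthandler train_feats.py ... enables the crash-traceback part without any code changes; the timed dump is only needed when the process hangs instead of dying.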

FanLu97 commented 2 years ago

I'm sorry, I have no idea what causes this problem and can't provide any help.

Git-oNmE commented 2 years ago

I have solved this problem. The details are in my blog post: https://blog.csdn.net/weixin_40286308/article/details/124870766

ispc-lab / HRegNet, issue #10