Open HongChow opened 5 years ago
If I set CUDA_VISIBLE_DEVICES=0, a CUDA memory error occurs, and I have only one GPU ...
If I set --workers 1, I get a different error: RuntimeError: DataLoader worker (pid 11009) is killed by signal: Killed. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
There is no further error information, which leaves me confused.
I am trying to fine-tune the pretrained ResNet model on my own dataset (zm), but it returned the following error output:
Starting Epoch: 0
Total Epoches: 50
0%| | 0/2 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
./train_zm.sh: line 1: 7069 Killed CUDA_VISIBLE_DEVICES=1 python train.py --backbone resnet --lr 0.007 --workers 0 --epochs 50 --batch-size 16 --gpu-ids 1 --checkname deeplab-resnet --eval-interval 1 --dataset zm
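
A bare "Killed" with no Python traceback means the process received SIGKILL. On Linux this is often the kernel's out-of-memory killer reacting to host RAM running out, which is separate from GPU memory. To check whether that is what happens here, a minimal sketch like the following could be dropped into the training script. It assumes a Linux machine (it reads /proc/meminfo and treats ru_maxrss as kilobytes); the helper names parse_meminfo, peak_rss_mib and report are hypothetical and not part of train.py.

```python
import resource
import sys


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (kB for most keys)."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def peak_rss_mib():
    """Peak resident set size of this process in MiB (assumes Linux, where ru_maxrss is in kB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0


def report(tag):
    """Print this process's peak memory use and the host's available memory."""
    try:
        with open("/proc/meminfo") as f:
            info = parse_meminfo(f.read())
    except OSError:
        info = {}  # not on Linux, or /proc is unavailable
    avail = info.get("MemAvailable")
    avail_txt = "unknown" if avail is None else "%.0f MiB" % (avail / 1024.0)
    print("[%s] peak RSS: %.0f MiB, host MemAvailable: %s"
          % (tag, peak_rss_mib(), avail_txt))
    sys.stdout.flush()  # make sure the line is written before a possible SIGKILL
```

Calling report("after dataset init") and report("before first batch") at those points in the script (hypothetical placements) would show whether host memory is nearly exhausted just before the kill. The kernel log (for example via dmesg) typically also records an out-of-memory kill, if that is the cause.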