jfzhang95 / pytorch-deeplab-xception

DeepLab v3+ model in PyTorch. Support different backbones.

DeepLab seems to have crashed? #118

Open HongChow opened 5 years ago

HongChow commented 5 years ago

I am trying to finetune the pretrained ResNet model on my own dataset (zm), but it returned the following error:

Starting Epoch: 0
Total Epoches: 50
0%| | 0/2 [00:00<?, ?it/s]
=>Epoches 0, learning rate = 0.0070, previous best = 0.0000
./train_zm.sh: line 1: 7069 Killed    CUDA_VISIBLE_DEVICES=1 python train.py --backbone resnet --lr 0.007 --workers 0 --epochs 50 --batch-size 16 --gpu-ids 1 --checkname deeplab-resnet --eval-interval 1 --dataset zm
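For what it's worth, a bare "Killed" with no Python traceback usually means the Linux out-of-memory killer stopped the process because host RAM (not GPU memory) ran out; with a batch size of 16 the data pipeline plus the model can exhaust system memory. A minimal sketch for watching host RAM during training, assuming the psutil package is available (log_host_memory is a hypothetical helper, not part of this repo):

```python
import psutil

def log_host_memory(tag=""):
    """Print available host RAM so an impending OOM kill shows up in the log."""
    mem = psutil.virtual_memory()
    print(f"[{tag}] host RAM: {mem.available / 1024**3:.2f} GiB free "
          f"of {mem.total / 1024**3:.2f} GiB ({mem.percent}% used)")

# e.g. call log_host_memory(f"epoch {epoch}") at the top of each training epoch
```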

HongChow commented 5 years ago

If I set CUDA_VISIBLE_DEVICES=0, a CUDA out-of-memory error happens, and I only have one GPU...
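A CUDA out-of-memory error on a single GPU at --batch-size 16 is not surprising for DeepLab v3+ with a ResNet backbone. One way to check the card's headroom before lowering the batch size, assuming a CUDA-enabled PyTorch build, is a quick query of the allocator:

```python
import torch

# Check how much memory the single GPU has and how much this process
# has already allocated through PyTorch's caching allocator.
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    props = torch.cuda.get_device_properties(device)
    allocated = torch.cuda.memory_allocated(device)
    print(f"{props.name}: {props.total_memory / 1024**3:.2f} GiB total, "
          f"{allocated / 1024**3:.2f} GiB allocated by this process")
```

If the card has 8 GiB or less, dropping --batch-size to 4 or 2 (and scaling --lr down accordingly) is typically the first thing to try.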

HongChow commented 5 years ago

And if I set --workers 1, I get another error: RuntimeError: DataLoader worker (pid 11009) is killed by signal: Killed. Details are lost due to multiprocessing. Rerunning with num_workers=0 may give better error trace.
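That message is PyTorch reporting that a DataLoader worker subprocess died (very often because the same out-of-memory condition hit the worker), and it already names the workaround: with num_workers=0 data loading runs in the main process, so the real traceback is shown instead of being lost. A minimal sketch of the idea, using a hypothetical stand-in dataset rather than the repo's zm loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset purely for illustration; the real code would build the
# zm dataset the same way train.py does.
dataset = TensorDataset(torch.randn(8, 3, 513, 513),
                        torch.zeros(8, dtype=torch.long))

# num_workers=0 loads data in the main process, so the underlying exception
# is raised directly instead of the opaque "killed by signal" message that
# comes from a dead worker subprocess.
loader = DataLoader(dataset, batch_size=2, shuffle=True, num_workers=0)

for images, labels in loader:
    pass  # a training step would go here
```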

HongChow commented 5 years ago

There are no further error messages, which leaves me confused.