knaw-huc / loghi

MIT License

Training script tries to run GPU even if none available #18

Closed: fattynoparents closed this issue 8 months ago

fattynoparents commented 8 months ago

The na-pipeline-train.sh script tries to run on the GPU even when the GPU parameter is set to -1, which leads to the following error:

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

I had to remove the --gpus all parameter from the docker run command to make it run on the CPU.

docker run --gpus all --rm -u $(id -u ${USER}):$(id -g ${USER}) -m 32000m --shm-size 10240m -ti \
    $BASEMODELDIR \
    -v $tmpdir:$tmpdir \
    -v $listdir:$listdir \
    -v $datadir:$datadir \
    loghi/docker.htr:$VERSION python3 /src/loghi-htr/src/main.py \
    --do_train \
    --train_list $trainlist \
    --do_validate \
    --validation_list $validationlist \
    --learning_rate $learning_rate \
    --channels $channels \
    --batch_size $batch_size \
    --epochs $epochs \
    --gpu $GPU \
    --height $height \
    --use_mask \
    --seed 1 \
    --beam_width 1 \
    --model "$HTRNEWMODEL" \
    --multiply $multiply \
    --output $listdir \
    --model_name $model_name \
    --output_charlist $tmpdir/output_charlist.charlist \
    --output $tmpdir/output $BASEMODEL
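
A possible way to avoid hand-editing the command is to build the GPU option from the GPU variable, so that --gpus all is only passed when a GPU is actually requested. The snippet below is only a sketch, not the fix that was applied in the repository: the helper docker_gpu_params and the variable DOCKERGPUPARAMS are hypothetical names, and it assumes that -1 is the only value meaning "run on CPU".

```shell
#!/bin/sh
# Hypothetical helper (not part of the Loghi scripts): prints the docker GPU
# option for a given GPU setting. Assumes -1 means "run on CPU".
docker_gpu_params() {
    if [ "$1" = "-1" ]; then
        # CPU only: print nothing so docker does not request a GPU driver.
        echo ""
    else
        echo "--gpus all"
    fi
}

GPU=-1
DOCKERGPUPARAMS=$(docker_gpu_params "$GPU")

# The variable is left unquoted on purpose: an empty value then expands to
# no argument at all, and "--gpus all" splits into two separate arguments.
# echo is used here only to preview the start of the resulting command.
echo docker run $DOCKERGPUPARAMS --rm -ti loghi/docker.htr:$VERSION
```

In the training script, $DOCKERGPUPARAMS would take the place of the hard-coded --gpus all in the docker run line, with the rest of the command left unchanged.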