Open gingerhead22 opened 7 years ago
I'm having the same problem. Have you found a solution?
You can try changing the 'gpus' number in the sh file for the dataset you're training on. This file is in the 'shells' folder.
Thank you for your suggestion. Yep, I set gpus to 0 and the problem was solved. However, more problems then came up with the pickle() command.
@gingerhead22 Hi, I want to ask: if I have just one GPU, a GTX 1070, should the 'gpus' number be '0' or '1'? I don't know what my GPU number is, and I couldn't find an explanation of the 'gpus' number.
@ShaoNeilz Hey, you should set it to 0, since the ordinal starts from 0.
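To illustrate the zero-based numbering, here is a minimal sketch of the check that fails when the ordinal is too large. The helper name `parse_gpu_ids` and the comma-separated format of the 'gpus' string are assumptions for illustration, not code from this repository; the device count is passed in by hand (running `nvidia-smi -L` lists the installed GPUs with their indices).

```python
def parse_gpu_ids(gpus, device_count):
    """Parse a 'gpus' string such as "0" or "0,1" and check each ordinal.

    Hypothetical helper for illustration; not part of the repository.
    Assumes the string is a comma-separated list of integers.
    """
    ids = [int(tok) for tok in gpus.split(",") if tok.strip()]
    for gpu_id in ids:
        # Ordinals are zero-based: with N GPUs the valid values are 0..N-1.
        if not 0 <= gpu_id < device_count:
            raise ValueError(
                "invalid device ordinal %d: valid ordinals are 0..%d"
                % (gpu_id, device_count - 1)
            )
    return ids


# One GPU installed (e.g. a single GTX 1070): only ordinal 0 is valid.
print(parse_gpu_ids("0", device_count=1))
```

With a single GPU, passing "1" to this sketch raises a ValueError, which mirrors the "invalid device ordinal" error reported in this issue.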
@gingerhead22 thanks for your answer!
May I ask another question?
When I run 'train_flic.sh', I don't get any output about the training, and the system halts.
Here is my terminal's output. This really confuses me.
OS: ubuntu 14.04 amd_64;python 2.7.x;
shaolizhi@standwisdom:/machinelearning/deeppose-master$ bash shells/train_flic.sh
2017-03-07 12:49:50,150 [INFO] sys.version_info(major=2, minor=7, micro=12, releaselevel='final', serial=0)
2017-03-07 12:49:50,151 [INFO] chainer version: 1.21.0
2017-03-07 12:49:50,151 [INFO] cuda: True, cudnn: False
2017-03-07 12:49:50,151 [INFO] Namespace(adam_alpha=0.001, adam_beta1=0.9, adam_beta2=0.999, adam_eps=1e-08, base_zoom=1.5, batchsize=128, channel=3, coord_normalize=True, epoch=101, fliplr=True, fname_index=0, gcn=True, gpus='1', ignore_label=-1, im_size=220, img_dir='data/FLIC-full/images', joint_index=1, lr=0.01, lr_decay_freq=10, lr_decay_ratio=0.1, min_dim=0, model='models/AlexNet.py', n_joints=7, opt='Adam', resume_model=None, resume_opt=None, resume_param=None, rotate=True, rotate_range=10, seed=1701, show_log_iter=10, snapshot=10, symmetric_joints='[[2, 4], [1, 5], [0, 6]]', test_csv_fn='data/FLIC-full/test_joints.csv', test_freq=10, train_csv_fn='data/FLIC-full/train_joints.csv', translate=True, translate_range=5, valid_freq=5, weight_decay=0.0005, zoom=True, zoom_range=0.2)
shells/train_flic.sh: line 31: 23280 Killed CHAINER_TYPE_CHECK=0 python scripts/train.py --model models/AlexNet.py --gpus 1 --epoch 100 --batchsize 128 --snapshot 10 --valid_freq 5 --train_csv_fn data/FLIC-full/train_joints.csv --test_csv_fn data/FLIC-full/test_joints.csv --img_dir data/FLIC-full/images --test_freq 10 --seed 1701 --im_size 220 --fliplr --rotate --rotate_range 10 --zoom --zoom_range 0.2 --translate --translate_range 5 --coord_normalize --gcn --n_joints 7 --fname_index 0 --joint_index 1 --symmetric_joints "[[2, 4], [1, 5], [0, 6]]" --opt Adam
@ShaoNeilz did you solve it? I have the same trouble.
An error occurred when I tried to run the train_lsp.sh file. It said: cuda error or invalid device: invalid device ordinal
I use Cygwin on Windows 10. The GPU is a GTX 1070, and both CUDA and cuDNN work well.
Do I need 2 GPUs to run AlexNet?
Thank you very much.
@mitmul