Closed SeekPoint closed 8 years ago
I'm guessing you need to set the "gpuid" flag to something valid on your machine.
On Tue, Nov 8, 2016 at 1:54 AM, yk_data notifications@github.com wrote:
I have two GTX 1080s in my workstation, but no matter how I set CUDA_VISIBLES_DEVICES, it always runs on GPU 3, which doesn't exist at all.
rzai@rzai00:~/prj/cvpr2016$ CUDA_VISIBLES_DEVICES=0 th train_sje_hybrid.lua -data_dir /media/rzai/ai_data/_reedscot/de_cub_txt.tar.gz/cub_txt -image_dir /media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/images -ids_file /media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/trainvalids.txt -learning_rate 0.0007 -symmetric 1 -max_epochs 200 -savefile sje_cub_c10_hybrid -num_caption 10 -gpuid 3 -print_every 10
{
  image_dir : "/media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/images"
  seed : 123
  batch_size : 40
  num_caption : 10
  gpuid : 3
  symmetric : 1
  emb_dim : 1024
  image_noop : 1
  checkpoint_dir : "cv"
  bidirectional : 0
  randomize_pair : 0
  max_epochs : 200
  savefile : "sje_cub_c10_hybrid"
  print_every : 10
  data_dir : "/media/rzai/ai_data/_reedscot/de_cub_txt.tar.gz/cub_txt"
  image_dim : 1024
  init_from : ""
  doc_length : 201
  learning_rate_decay_after : 1
  grad_clip : 5
  avg : 0
  eval_val_every : 1000
  ids_file : "/media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/trainvalids.txt"
  nclass : 200
  cnn_dim : 256
  dropout : 0
  learning_rate : 0.0007
  learning_rate_decay : 0.98
  flip : 0
}
using CUDA on GPU 3...
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c line=719 error=10 : invalid device ordinal
/home/rzai/torch/install/bin/luajit: train_sje_hybrid.lua:69: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c:719
stack traceback:
  [C]: in function 'setDevice'
  train_sje_hybrid.lua:69: in main chunk
  [C]: in function 'dofile'
  ...rzai/torch/install
Oh, sorry, I forgot about the last parameter...
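For anyone hitting the same "invalid device ordinal" error: the command in this thread passes -gpuid 3 on a machine with two GPUs, which matches the reply's guess. Below is a minimal shell sketch of a sanity check to run before launching training. It assumes -gpuid is a zero-based index (not confirmed in this thread), and gpuid_is_valid is a hypothetical helper, not part of the repository. Also note that the CUDA environment variable is spelled CUDA_VISIBLE_DEVICES; a misspelled name is simply ignored by the CUDA runtime.

```shell
#!/bin/sh
# gpuid_is_valid ID COUNT
# Hypothetical helper: succeeds only when ID is a valid zero-based index
# into COUNT GPUs. Assumption: -gpuid in train_sje_hybrid.lua is zero-based.
gpuid_is_valid() {
  id="$1"
  count="$2"
  [ "$id" -ge 0 ] && [ "$id" -lt "$count" ]
}

# Possible usage on a real machine (left as comments, not executed here):
#   count=$(nvidia-smi -L | wc -l)   # nvidia-smi -L prints one line per physical GPU
#   if gpuid_is_valid 0 "$count"; then
#     # "..." stands for the remaining options from the original command
#     CUDA_VISIBLE_DEVICES=0 th train_sje_hybrid.lua ... -gpuid 0
#   fi
```

With CUDA_VISIBLE_DEVICES=0 only one device is visible to the process and it is renumbered from 0, so under the zero-based assumption -gpuid 0 would be the only valid choice in that case.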