reedscot / cvpr2016

Learning Deep Representations of Fine-grained Visual Descriptions
http://arxiv.org/abs/1605.05395
MIT License
334 stars, 97 forks

why it runs on GPU 3 which doesn't exist at all? #3

Closed SeekPoint closed 8 years ago

SeekPoint commented 8 years ago

I have two GTX 1080s in my workstation. No matter how I set CUDA_VISIBLES_DEVICES, it always runs on GPU 3, which doesn't exist at all.

rzai@rzai00:~/prj/cvpr2016$ CUDA_VISIBLES_DEVICES=0 th train_sje_hybrid.lua -data_dir /media/rzai/ai_data/_reedscot/de_cub_txt.tar.gz/cub_txt -image_dir /media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/images -ids_file /media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/trainvalids.txt -learning_rate 0.0007 -symmetric 1 -max_epochs 200 -savefile sje_cub_c10_hybrid -num_caption 10 -gpuid 3 -print_every 10
{
  image_dir : "/media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/images"
  seed : 123
  batch_size : 40
  num_caption : 10
  gpuid : 3
  symmetric : 1
  emb_dim : 1024
  image_noop : 1
  checkpoint_dir : "cv"
  bidirectional : 0
  randomize_pair : 0
  max_epochs : 200
  savefile : "sje_cub_c10_hybrid"
  print_every : 10
  data_dir : "/media/rzai/ai_data/_reedscot/de_cub_txt.tar.gz/cub_txt"
  image_dim : 1024
  init_from : ""
  doc_length : 201
  learning_rate_decay_after : 1
  grad_clip : 5
  avg : 0
  eval_val_every : 1000
  ids_file : "/media/rzai/ai_data/_reedscot/de_cvpr2016_cub.tar.gz/trainvalids.txt"
  nclass : 200
  cnn_dim : 256
  dropout : 0
  learning_rate : 0.0007
  learning_rate_decay : 0.98
  flip : 0
}
using CUDA on GPU 3...
THCudaCheck FAIL file=/tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c line=719 error=10 : invalid device ordinal
/home/rzai/torch/install/bin/luajit: train_sje_hybrid.lua:69: cuda runtime error (10) : invalid device ordinal at /tmp/luarocks_cutorch-scm-1-6130/cutorch/init.c:719
stack traceback:
        [C]: in function 'setDevice'
        train_sje_hybrid.lua:69: in main chunk
        [C]: in function 'dofile'
        ...rzai/torch/install

reedscot commented 8 years ago

I'm guessing you need to set the "gpuid" flag to something valid on your machine.
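
A quick pre-flight check along the lines of this suggestion might look like the sketch below. `valid_gpuid` is a hypothetical helper, not part of this repository, and it assumes `-gpuid` is a 0-based index, which this thread does not confirm; adjust the comparison if the script counts from 1.

```shell
#!/bin/sh
# Hypothetical helper (not part of the cvpr2016 repository).
# Succeeds only if the requested GPU index is valid for a machine with
# the given number of GPUs. ASSUMPTION: indices are 0-based.
valid_gpuid() {
    gpuid=$1
    num_gpus=$2
    case $gpuid in
        ''|*[!0-9]*) return 1 ;;   # reject empty or non-numeric input
    esac
    [ "$gpuid" -lt "$num_gpus" ]   # valid indices are 0 .. num_gpus-1
}

# Example use (needs an NVIDIA driver; `nvidia-smi -L` prints one line per GPU):
#   num_gpus=$(nvidia-smi -L | wc -l)
#   if valid_gpuid 3 "$num_gpus"; then
#       th train_sje_hybrid.lua -gpuid 3 ...
#   else
#       echo "gpuid 3 is not valid on a machine with $num_gpus GPU(s)" >&2
#   fi
```

Under that assumption, a two-GPU machine accepts only 0 or 1 here, so the `-gpuid 3` in the command above would be rejected before Torch ever calls `setDevice`. Separately, the CUDA runtime reads the variable `CUDA_VISIBLE_DEVICES`; the spelling used in the command above (`CUDA_VISIBLES_DEVICES`) is not one it recognizes, so it has no effect on which devices are visible.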

On Tue, Nov 8, 2016 at 1:54 AM, yk_data notifications@github.com wrote:

I have two GTX 1080s in my workstation. No matter how I set CUDA_VISIBLES_DEVICES, it always runs on GPU 3, which doesn't exist at all.


SeekPoint commented 8 years ago

Oh, sorry, I forgot the last parameter...