Open hoebd opened 6 years ago
Hi, a pretrained googlenet model needs to be downloaded first. This requires the cURL library to be installed.
On Thu, Jan 25, 2018 at 7:49 AM hoebd notifications@github.com wrote:
Hi, I tried to retrain GoogleNet and tested it with the default images in res/images. When I execute "./bin/train --model googlenet --folder res/images --layer pool5/7x7_s1" I get the following error:
## CNN Training Example ##
E0125 16:45:49.228271 18294 common_gpu.cc:70] Found an unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. I will set the available devices to be zero.
optimizer: adam
device: cudnn
using cuda: true
dump-model: false
model: googlenet
layer: pool5/7x7_s1
image-dir: res/images
db-type: leveldb
size: 224
iters: 1000
test-runs: 50
batch: 64
lr: 0.0001
display: false
reshape: false
matrix: false
3 labels found:
0: stapler #49
1: cat #42
2: dog #32
123 files found
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at keeper.h:123] . model download not supported, install cURL
*** Aborted at 1516895149 (unix time) try "date -d @1516895149" if you are using GNU date ***
PC: @ 0x7f4a2b2a8428 gsignal
*** SIGABRT (@0x3e800004776) received by PID 18294 (TID 0x7f4a3b2e5fc0) from PID 18294; stack trace: ***
@ 0x7f4a34af0390 (unknown)
@ 0x7f4a2b2a8428 gsignal
@ 0x7f4a2b2aa02a abort
@ 0x7f4a2bbeb84d __gnu_cxx::__verbose_terminate_handler()
@ 0x7f4a2bbe96b6 (unknown)
@ 0x7f4a2bbe9701 std::terminate()
@ 0x7f4a2bbe9919 __cxa_throw
@ 0x5722fc caffe2::Keeper::download()
@ 0x5723c3 caffe2::Keeper::ensureFile()
@ 0x5724e4 caffe2::Keeper::ensureModel()
@ 0x57254a caffe2::Keeper::addTrainedModel()
@ 0x572f22 caffe2::Keeper::AddModel()
@ 0x557815 caffe2::run()
@ 0x559f66 main
@ 0x7f4a2b293830 __libc_start_main
@ 0x551229 _start
@ 0x0 (unknown)
Aborted (core dumped)
Could someone help me with this error please?
Hi, thank you very much for your fast reply. I have already downloaded the googlenet model and cURL is installed. Sorry for the confusion caused by my first error message about the missing model.
This second issue seems to be related to the warning displayed at the top, the one ending in "I will set the available devices to be zero". When the model runs, it hits an error from within the CUDA runtime (see common_gpu.cc). According to the docs, error 30 indicates that "an unknown internal error has occurred", which is arguably not very helpful. I can't say what the underlying problem is, but I'm fairly sure it's a general issue with your setup, not related to this repo. Did you get any Caffe2 or CUDA demos to run?
Yes, you are right, it seems to have been an NVIDIA driver issue. I reinstalled the NVIDIA driver and successfully rebuilt Caffe2, but now I get another error.
## CNN Training Example ##
optimizer: adam
device: cudnn
using cuda: true
dump-model: false
model: googlenet
layer: pool5/7x7_s1
image-dir: res/images
db-type: leveldb
size: 224
iters: 1000
test-runs: 50
batch: 64
lr: 0.0001
display: false
reshape: false
matrix: false
2 labels found:
0: cat #2
1: dog #2
4 files found
split model.. (at pool5/7x7_s1)
4 images cached
training..
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at tensor.h:671] i < dims_.size(). 0 vs 0. Exceeding ndim limit Error from operator:
output: "loss3/classifier_w" type: "XavierFill" device_option { device_type: 1 }
** while accessing output: loss3/classifier_w
*** Aborted at 1516970976 (unix time) try "date -d @1516970976" if you are using GNU date ***
PC: @ 0x7f05ca754428 gsignal
*** SIGABRT (@0x3e800005ca9) received by PID 23721 (TID 0x7f05da7ecec0) from PID 23721; stack trace: ***
@ 0x7f05d3f9c390 (unknown)
@ 0x7f05ca754428 gsignal
@ 0x7f05ca75602a abort
@ 0x7f05cb09784d __gnu_cxx::__verbose_terminate_handler()
@ 0x7f05cb0956b6 (unknown)
@ 0x7f05cb095701 std::terminate()
@ 0x7f05cb095969 __cxa_rethrow
@ 0x5fb54f caffe2::Operator<>::Run()
@ 0x7f05d9d17378 caffe2::SimpleNet::Run()
@ 0x7f05d9c9c75a caffe2::Workspace::RunNetOnce()
@ 0x551755 caffe2::run_trainer()
@ 0x558cc1 caffe2::run()
@ 0x559f66 main
@ 0x7f05ca73f830 __libc_start_main
@ 0x551229 _start
@ 0x0 (unknown)
Thanks for persisting here. This is indeed a bug. I'll take a look today.
I pushed a fix in commit 94882795. Let me know if that works.
Hi, I tried to retrain GoogleNet and tested it with the default images in res/images. When I execute "./bin/train --model googlenet --folder res/images --layer pool5/7x7_s1" I get the following error:
## CNN Training Example ##
E0125 17:12:00.842572 18837 common_gpu.cc:70] Found an unknown error - this may be due to an incorrectly set up environment, e.g. changing env variable CUDA_VISIBLE_DEVICES after program start. I will set the available devices to be zero.
optimizer: adam
device: cudnn
using cuda: true
dump-model: false
model: googlenet
layer: pool5/7x7_s1
image-dir: res/images
db-type: leveldb
size: 224
iters: 1000
test-runs: 50
batch: 64
lr: 0.0001
display: false
reshape: false
matrix: false
2 labels found:
0: cat #2
1: dog #2
4 files found
split model.. (at pool5/7x7_s1)
terminate called after throwing an instance of 'caffe2::EnforceNotMet'
what(): [enforce fail at common_gpu.cc:132] error == cudaSuccess. 30 vs 0. Error at: /home/daniel/caffe2/caffe2/core/common_gpu.cc:132: unknown error
*** Aborted at 1516896720 (unix time) try "date -d @1516896720" if you are using GNU date ***
PC: @ 0x7ff6526ad428 gsignal
*** SIGABRT (@0x3e800004995) received by PID 18837 (TID 0x7ff6626eafc0) from PID 18837; stack trace: ***
@ 0x7ff65bef5390 (unknown)
@ 0x7ff6526ad428 gsignal
@ 0x7ff6526af02a abort
@ 0x7ff652ff084d __gnu_cxx::__verbose_terminate_handler()
@ 0x7ff652fee6b6 (unknown)
@ 0x7ff652fee701 std::terminate()
@ 0x7ff652fee969 __cxa_rethrow
@ 0x7ff661c0a835 caffe2::CreateOperator()
@ 0x7ff661c5f080 caffe2::SimpleNet::SimpleNet()
@ 0x7ff661c434d6 caffe2::CreateNet()
@ 0x7ff661c43c9d caffe2::CreateNet()
@ 0x7ff661be11e2 caffe2::Workspace::RunNetOnce()
@ 0x5550bf caffe2::preprocess()
@ 0x557d8f caffe2::run()
@ 0x559f66 main
@ 0x7ff652698830 __libc_start_main
@ 0x551229 _start
@ 0x0 (unknown)
Aborted (core dumped)
Could someone help me with this error please?