weiliu89 / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

What's invalid argument? #71

Open leejiajun opened 8 years ago

leejiajun commented 8 years ago

I got this error when I executed python examples/ssd/score_ssd_pascal.py. What happened?

F0729 11:31:27.563374 18123 syncedmem.cpp:56] Check failed: error == cudaSuccess (11 vs. 0) invalid argument
*** Check failure stack trace: ***
    @ 0xb674e060 (unknown)
    @ 0xb674df5c (unknown)
    @ 0xb674db78 (unknown)
    @ 0xb674ff98 (unknown)
    @ 0xb6a43c2a caffe::SyncedMemory::mutable_gpu_data()
    @ 0xb694ed7c caffe::Blob<>::mutable_gpu_data()
    @ 0xb6a80ff2 caffe::ConvolutionLayer<>::Forward_gpu()
    @ 0xb6a1ed8e caffe::Net<>::ForwardFromTo()
    @ 0xb6a1efa4 caffe::Net<>::Forward()
    @ 0xb6a32a34 caffe::Solver<>::TestDetection()
    @ 0xb6a332c0 caffe::Solver<>::TestAll()
    @ 0xb6a33940 caffe::Solver<>::Solve()
    @ 0xef1e train()
    @ 0xd65e main
    @ 0xb657c632 (unknown)
Aborted

weiliu89 commented 8 years ago

Could you run make clean and then rebuild everything? This sometimes happens to me as well.

leejiajun commented 8 years ago

I also noticed this test failure when running the tests:

[ RUN      ] MultiBoxLossLayerTest/0.TestLocGradient
./include/caffe/test/test_gradient_check_util.hpp:175: Failure
The difference between computed_gradient and estimated_gradient is 1219.418701171875, which exceeds threshold_ * scale, where
computed_gradient evaluates to 0,
estimated_gradient evaluates to -1219.418701171875, and
threshold_ * scale evaluates to 12.194187164306641.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0,163; feat = -2.7519083023071289; objective+ = 73.165115356445312; objective- = 97.553489685058594
./include/caffe/test/test_gradient_check_util.hpp:175: Failure
The difference between computed_gradient and estimated_gradient is 1219.418701171875, which exceeds threshold_ * scale, where
computed_gradient evaluates to 0,
estimated_gradient evaluates to -1219.418701171875, and
threshold_ * scale evaluates to 12.194187164306641.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0,179; feat = -2.7519083023071289; objective+ = 73.165115356445312; objective- = 97.553489685058594
./include/caffe/test/test_gradient_check_util.hpp:175: Failure
The difference between computed_gradient and estimated_gradient is 1219.4195556640625, which exceeds threshold_ * scale, where
computed_gradient evaluates to 0,
estimated_gradient evaluates to -1219.4195556640625, and
threshold_ * scale evaluates to 12.194195747375488.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0,163; feat = -2.7519083023071289; objective+ = 73.1651611328125; objective- = 97.553550720214844
./include/caffe/test/test_gradient_check_util.hpp:175: Failure
The difference between computed_gradient and estimated_gradient is 1219.4195556640625, which exceeds threshold_ * scale, where
computed_gradient evaluates to 0,
estimated_gradient evaluates to -1219.4195556640625, and
threshold_ * scale evaluates to 12.194195747375488.
debug: (top_id, top_data_id, blob_id, feat_id)=0,0,0,179; feat = -2.7519083023071289; objective+ = 73.1651611328125; objective- = 97.553550720214844
[  FAILED  ] MultiBoxLossLayerTest/0.TestLocGradient, where TypeParam = caffe::CPUDevice<float> (15510 ms)

Does this failed test have any effect?
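
For readers trying to interpret the failure message, the logged numbers are consistent with a central-difference gradient check. The sketch below recomputes them from the values in the first debug line. The step size and threshold of 1e-2 are assumptions inferred from the logged figures, not values read from the test source, and the rule scale = max(|computed|, |estimated|, 1) is likewise an assumption that happens to match the output.

```python
# Values copied from the first "debug:" line of the failing test output.
objective_plus = 73.165115356445312
objective_minus = 97.553489685058594
computed_gradient = 0.0  # "computed_gradient evaluates to 0"

# Assumed checker settings, inferred from the logged numbers (hypothetical).
stepsize = 1e-2
threshold = 1e-2


def central_difference(obj_plus, obj_minus, step):
    """Estimate d(objective)/d(feature) from two perturbed objective values."""
    return (obj_plus - obj_minus) / (2.0 * step)


estimated_gradient = central_difference(objective_plus, objective_minus, stepsize)

# Assumed tolerance rule: threshold times the larger gradient magnitude,
# with the scale never dropping below 1.
scale = max(abs(computed_gradient), abs(estimated_gradient), 1.0)
tolerance = threshold * scale

difference = abs(computed_gradient - estimated_gradient)
passed = difference <= tolerance

print("estimated_gradient:", estimated_gradient)
print("tolerance:", tolerance)
print("difference:", difference)
print("check passed:", passed)
```

Under these assumptions the estimated gradient comes out near -1219.4187 and the tolerance near 12.1942, matching the log up to float rounding. The check fails because the analytic gradient is reported as exactly 0 while the objective clearly changes when that feature is perturbed.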

weiliu89 commented 8 years ago

It shouldn't produce such an error. Did you pull the latest code? What is your environment and configuration?

leejiajun commented 8 years ago

I am running on a TK1.

zimenglan-sysu-512 commented 8 years ago

@weiliu89 I ran make runtest -j8 and got the following failure:

[----------] Global test environment tear-down
[==========] 2262 tests from 298 test cases ran. (399567 ms total)
[  PASSED  ] 2261 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MultiBoxLossLayerTest/2.TestConfGradient, where TypeParam = caffe::GPUDevice<float>

 1 FAILED TEST
make: *** [runtest] Error 1

Does this failed test have any effect?

Thanks.

weiliu89 commented 8 years ago

@zimenglan-sysu-512 That is probably a floating-point precision error?

aurotripathy commented 8 years ago

I have this issue too. I have not yet done a make clean and retried.

[----------] Global test environment tear-down
[==========] 2274 tests from 298 test cases ran. (482302 ms total)
[  PASSED  ] 2273 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] MultiBoxLossLayerTest/1.TestConfGradient, where TypeParam = caffe::CPUDevice<double>

 1 FAILED TEST
make: *** [runtest] Error 1
(tensorflow)tempuser@tempuser-All-Series:caffe$