mathmanu / caffe-jacinto-models

This repository has moved. The new link can be obtained from https://github.com/TexasInstruments/jacinto-ai-devkit

terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >' what(): boost: mutex lock failed in pthread_mutex_lock: Invalid argument #15

Open dzl94 opened 4 years ago

dzl94 commented 4 years ago

Hi, when I try to train my models, a problem occurs at the end of every stage ('initial', 'l1reg', 'sparse', and so on). The main work of each stage seems to complete, but no result charts are saved, unlike the examples stored in the './trained' folder. The problem appears to be related to multi-threading. The run log is as follows.

```
I0908 09:01:46.720330  7901 caffe.cpp:268] Solver performance on device 0: 1.667 * 32 = 53.33 img/sec (6 itr in 2.4 sec)
I0908 09:01:46.720353  7901 caffe.cpp:271] Optimization Done in 16s
terminate called after throwing an instance of 'boost::exception_detail::clone_impl<boost::exception_detail::error_info_injector<boost::lock_error> >'
  what():  boost: mutex lock failed in pthread_mutex_lock: Invalid argument
*** Aborted at 1567904507 (unix time) try "date -d @1567904507" if you are using GNU date ***
PC: @     0x7fe0f0f8ae97 gsignal
*** SIGABRT (@0x3e800001edd) received by PID 7901 (TID 0x7fe079fff700) from PID 7901; stack trace: ***
    @     0x7fe0f0f8af20 (unknown)
    @     0x7fe0f0f8ae97 gsignal
    @     0x7fe0f0f8c801 abort
    @     0x7fe0f1d9e957 (unknown)
    @     0x7fe0f1da4ab6 (unknown)
    @     0x7fe0f1da4af1 std::terminate()
    @     0x7fe0f1da4d24 __cxa_throw
    @     0x7fe0f3389734 boost::throw_exception<>()
    @     0x7fe0f33898c7 boost::unique_lock<>::lock()
    @     0x7fe0f37c51af caffe::BlockingQueue<>::push()
    @     0x7fe0f3484e76 caffe::AnnotatedDataLayer<>::load_batch()
    @     0x7fe0f3748656 caffe::BasePrefetchingDataLayer<>::InternalThreadEntryN()
    @     0x7fe0f33688c6 caffe::InternalThread::entry()
    @     0x7fe0f336b03b boost::detail::thread_data<>::run()
    @     0x7fe0e6f92bcd (unknown)
    @     0x7fe0d0ade6db start_thread
    @     0x7fe0f106d88f clone
```

mathmanu commented 4 years ago

Can you try the following:

  1. Temporarily switch to the git branch caffe-0.16 for both caffe-jacinto and caffe-jacinto-models.
  2. Then do a clean build of caffe-jacinto.
  3. See if your training completes.
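The steps above can be sketched as a shell session. This is a minimal sketch, assuming the two repositories are cloned side by side and that caffe-jacinto uses the standard Caffe Makefile build; adjust paths and build commands to your setup.

```shell
# Assumption: caffe-jacinto and caffe-jacinto-models sit in the same
# parent directory; paths below are placeholders.

# 1. Switch both repositories to the caffe-0.16 branch.
cd caffe-jacinto
git fetch origin
git checkout caffe-0.16
cd ../caffe-jacinto-models
git checkout caffe-0.16

# 2. Clean build of caffe-jacinto (Makefile-style Caffe build;
#    a CMake build would use its own build directory instead).
cd ../caffe-jacinto
make clean
make -j"$(nproc)"
make pycaffe   # only needed if you use the Python bindings

# 3. Re-run your training script and check whether it now completes
#    without the boost mutex abort.
```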
WeiChihChern commented 4 years ago

@mathmanu same issue here