robotology / yarp

YARP - Yet Another Robot Platform
http://www.yarp.it
Other
511 stars 192 forks source link

OS::PortTest failing in CI travis? #621

Open lornat75 opened 8 years ago

lornat75 commented 8 years ago

I noticed that in some PR travis CI fail with the follow error:

29/63 Test #29: OS::PortTest .......................***Exception: SegFault 3.77 sec test 30 Start 30: OS::PortReaderBufferTest

This problem seems to be unrelated with the code changed in PR and is sporadic, maybe there is a race condition.

@drdanz

traversaro commented 8 years ago

Another related failed test: https://travis-ci.org/robotology/yarp/jobs/113625737 .

traversaro commented 8 years ago

Probably related to https://github.com/robotology/WB-Toolbox/issues/11 .

traversaro commented 8 years ago

All occurrences of this bug (Travis, https://github.com/robotology/WB-Toolbox/issues/11) appear to be on OS X .

traversaro commented 8 years ago

Never mind: apparently I was able to replicate the issue on Ubuntu 16.04 , with the following command: ctest -R PortTest::Valgrind::MemCheck --repeat-until-fail 10000 --output-on-failure , with following final output:

yarp(4069cc0): Thread being deleted
yarp(4069cc0): Thread being deleted
yarp: 0 | PortTest: checking interrupt for a port with pending reply...
yarp(4069cc0): register contact /out
yarp(4069cc0): opening for address tcp://localhost:10003
yarp(b63e700): Thread starting up
yarp(4069cc0): Child thread initializing
yarp(4069cc0): Child thread initialized ok
yarp: Port /out active at tcp://localhost:10003
yarp(4069cc0): register contact /in
yarp(4069cc0): opening for address tcp://localhost:10002
yarp(4069cc0): Child thread initializing
yarp(a23b700): Thread starting up
yarp(4069cc0): Child thread initialized ok
yarp: Port /in active at tcp://localhost:10002
yarp(4069cc0): working on connection /out to /in (connect)
yarp(4069cc0): query name /out
yarp(4069cc0): Configuration file: /home/pegua/.config/yarp/yarp_namespace.conf
yarp(4069cc0): query name /in
yarp(4069cc0): Configuration file: /home/pegua/.config/yarp/yarp_namespace.conf
yarp(4069cc0): ** asking tcp://out: [add] "tcp://in"
yarp(4069cc0): scanning network interface 127.0.0.1
yarp(4069cc0): scanning network interface 192.168.1.9
yarp(4069cc0): scanning network interface 192.168.122.1
yarp(4069cc0): scanning network interface ::1
yarp(4069cc0): scanning network interface fe80::7218:8bff:fedd:bfa7%3
yarp(b63e700): /out: PortCore received something
yarp(b63e700): new input connection to /out starting
yarp(b63e700): Child thread initializing
yarp(aa3c700): Thread starting up
yarp(b63e700): Child thread initialized ok
yarp(b63e700): new input connection to /out started ok
yarp(b63e700): /out: PortCore spun off a connection
yarp(b63e700): /out: / routine check of connections to this port begins
yarp(b63e700): /out: | checking connection ->-> 
yarp(b63e700): /out: \ routine check of connections to this port ends
yarp(aa3c700): TextCarrier::expectSenderSpecifier
yarp(aa3c700): Receiving input from admin to /out using text_ack
yarp(aa3c700): /out: asked to add output to tcp://in
yarp(aa3c700): query name /in
yarp(aa3c700): Configuration file: /home/pegua/.config/yarp/yarp_namespace.conf
yarp(aa3c700): /out: asked to remove connection /out->*->/in
yarp(aa3c700): scanning network interface 127.0.0.1
yarp(aa3c700): scanning network interface 192.168.1.9
yarp(aa3c700): scanning network interface 192.168.122.1
yarp(aa3c700): scanning network interface ::1
yarp(aa3c700): scanning network interface fe80::7218:8bff:fedd:bfa7%3
yarp(a23b700): /in: PortCore received something
yarp(a23b700): new input connection to /in starting
yarp(a23b700): Child thread initializing
yarp(be3f700): Thread starting up
yarp(a23b700): Child thread initialized ok
yarp(a23b700): new input connection to /in started ok
yarp(a23b700): /in: PortCore spun off a connection
yarp(a23b700): /in: / routine check of connections to this port begins
yarp(a23b700): /in: | checking connection ->-> 
yarp(a23b700): /in: \ routine check of connections to this port ends
yarp(be3f700): Port /in starting up, flushing routes /out->*->/in
yarp(be3f700): /in: asked to remove connection /out->*->/in
yarp: Receiving input from /out to /in using tcp
yarp(be3f700): expecting an index
yarp(be3f700): ConnectionState::expectIndex for /out->tcp->/in
yarp: Sending output from /out to /in using tcp
yarp(aa3c700): /out: / routine check of connections to this port begins
yarp(aa3c700): /out: | checking connection admin->text_ack->/out 
yarp(aa3c700): /out: | checking connection /out->tcp->/in 
yarp(aa3c700): /out: \ routine check of connections to this port ends
yarp(aa3c700): bad socket read
yarp(4069cc0): Child thread initializing
yarp(aa3c700): PortCoreInputUnit closing ip
yarp(aa3c700): PortCoreInputUnit closed ip
yarp(aa3c700): PortCoreInputUnit (unrooted) shutting down
yarp(aa3c700): Thread shutting down
yarp(923a700): Thread starting up
yarp(4069cc0): Child thread initialized ok
yarp(be3f700): index coming in happily...
yarp(be3f700): secondary header came in happily...
yarp(be3f700): Total message length: 40
yarp: 0 | PortTest:   [cmd received ok] passed ok
yarp(be3f700): sending an acknowledgment
yarp(4069cc0): /in: now preparing to shut down port
yarp(4069cc0): ^^^^^^^^^^^ interrupting socket
yarp(4069cc0): ^^^^^^^^^^^ interrupting socket reader
yarp(4069cc0): ^^^^^^^^^^^ interrupting socket writer
yarp(4069cc0): ^^^^^^^^^^^ interrupting socket fully
yarp(4069cc0): /in: requesting removal of connection from /out
yarp(4069cc0): working on connection /out to /in (disconnect)
yarp(4069cc0): query name /out
yarp(4069cc0): Configuration file: /home/pegua/.config/yarp/yarp_namespace.conf
yarp(4069cc0): query name /in
yarp(923a700): bad socket read
yarp(923a700): did not get acknowledgement header
yarp(923a700): did not get acknowledgement header
yarp: Removing output from /out to /in
==16265== Thread 6:
==16265== Invalid read of size 8
==16265==    at 0x5E785EF: yarp::os::impl::PortCoreOutputUnit::sendHelper() (PortCoreOutputUnit.cpp:333)
==16265==    by 0x5E789B3: yarp::os::impl::PortCoreOutputUnit::send(yarp::os::PortWriter&, yarp::os::PortReader*, yarp::os::PortWriter*, void*, yarp::os::ConstString const&, bool, bool, bool*) (PortCoreOutputUnit.cpp:377)
==16265==    by 0x5E671E4: yarp::os::impl::PortCore::sendHelper(yarp::os::PortWriter&, int, yarp::os::PortReader*, yarp::os::PortWriter*) (PortCore.cpp:1333)
==16265==    by 0x5E66DD5: yarp::os::impl::PortCore::send(yarp::os::PortWriter&, yarp::os::PortReader*, yarp::os::PortWriter*) (PortCore.cpp:1255)
==16265==    by 0x5E7C590: yarp::os::Port::write(yarp::os::PortWriter&, yarp::os::PortReader&, yarp::os::PortWriter*) const (Port.cpp:787)
==16265==    by 0x4639D3: ServiceUser::run() (PortTest.cpp:200)
==16265==    by 0x5ECB66A: ThreadCallbackAdapter::run() (Thread.cpp:33)
==16265==    by 0x5ECB976: theExecutiveBranch(void*) (ThreadImpl.cpp:94)
==16265==    by 0x4F37F55: ACE_OS_Thread_Adapter::invoke() (in /usr/lib/libACE-6.3.3.so)
==16265==    by 0x51DB6F9: start_thread (pthread_create.c:333)
==16265==    by 0x67F2B5C: clone (clone.S:109)
==16265==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==16265== 
==16265== 
==16265== Process terminating with default action of signal 11 (SIGSEGV)
==16265==  Access not within mapped region at address 0x0
==16265==    at 0x5E785EF: yarp::os::impl::PortCoreOutputUnit::sendHelper() (PortCoreOutputUnit.cpp:333)
==16265==    by 0x5E789B3: yarp::os::impl::PortCoreOutputUnit::send(yarp::os::PortWriter&, yarp::os::PortReader*, yarp::os::PortWriter*, void*, yarp::os::ConstString const&, bool, bool, bool*) (PortCoreOutputUnit.cpp:377)
==16265==    by 0x5E671E4: yarp::os::impl::PortCore::sendHelper(yarp::os::PortWriter&, int, yarp::os::PortReader*, yarp::os::PortWriter*) (PortCore.cpp:1333)
==16265==    by 0x5E66DD5: yarp::os::impl::PortCore::send(yarp::os::PortWriter&, yarp::os::PortReader*, yarp::os::PortWriter*) (PortCore.cpp:1255)
==16265==    by 0x5E7C590: yarp::os::Port::write(yarp::os::PortWriter&, yarp::os::PortReader&, yarp::os::PortWriter*) const (Port.cpp:787)
==16265==    by 0x4639D3: ServiceUser::run() (PortTest.cpp:200)
==16265==    by 0x5ECB66A: ThreadCallbackAdapter::run() (Thread.cpp:33)
==16265==    by 0x5ECB976: theExecutiveBranch(void*) (ThreadImpl.cpp:94)
==16265==    by 0x4F37F55: ACE_OS_Thread_Adapter::invoke() (in /usr/lib/libACE-6.3.3.so)
==16265==    by 0x51DB6F9: start_thread (pthread_create.c:333)
==16265==    by 0x67F2B5C: clone (clone.S:109)
==16265==  If you believe this happened as a result of a stack
==16265==  overflow in your program's main thread (unlikely but
==16265==  possible), you can try to increase the size of the
==16265==  main thread stack using the --main-stacksize= flag.
==16265==  The main thread stack size used in this run was 8388608.
Waiting...
Writing bottle 1: hello world
Writing bottle 2: 2 3 5 7 11
Bottle 1 is: hello world
Bottle 2 is: 2 3 5 7 11
Writing bottle 1: hello world
Writing bottle 2: 2 3 5 7 11
Bottle 2 is: 2 3 5 7 11
==16265== 
==16265== HEAP SUMMARY:
==16265==     in use at exit: 2,105,043 bytes in 10,201 blocks
==16265==   total heap usage: 818,045 allocs, 807,844 frees, 202,921,227 bytes allocated
==16265== 
==16265== Thread 1:
==16265== 288 bytes in 1 blocks are possibly lost in loss record 513 of 604
==16265==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x40136D4: allocate_dtv (dl-tls.c:322)
==16265==    by 0x40136D4: _dl_allocate_tls (dl-tls.c:539)
==16265==    by 0x51DC2AE: allocate_stack (allocatestack.c:588)
==16265==    by 0x51DC2AE: pthread_create@@GLIBC_2.2.5 (pthread_create.c:539)
==16265==    by 0x4F36964: ACE_OS::thr_create(void* (*)(void*), void*, long, unsigned long*, unsigned long*, long, void*, unsigned long, ACE_Base_Thread_Adapter*, char const**) (in /usr/lib/libACE-6.3.3.so)
==16265==    by 0x5ECC81B: ACE_Thread::spawn(void* (*)(void*), void*, long, unsigned long*, unsigned long*, long, void*, unsigned long, ACE_Thread_Adapter*, char const**) (Thread.inl:96)
==16265==    by 0x5ECC07B: yarp::os::impl::ThreadImpl::start() (ThreadImpl.cpp:253)
==16265==    by 0x5ECB40E: yarp::os::Thread::start() (Thread.cpp:84)
==16265==    by 0x46E641: PortTest::testInterruptReply() (PortTest.cpp:1245)
==16265==    by 0x4707A8: PortTest::runTests() (PortTest.cpp:1453)
==16265==    by 0x5ECE48C: yarp::os::impl::UnitTest::run(int, char**) (UnitTest.cpp:149)
==16265==    by 0x5ECE348: yarp::os::impl::UnitTest::runSubTests(int, char**) (UnitTest.cpp:123)
==16265==    by 0x5ECE4BD: yarp::os::impl::UnitTest::run(int, char**) (UnitTest.cpp:153)
==16265== 
==16265== 1,032 bytes in 1 blocks are possibly lost in loss record 553 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x583A6F2: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5873A02: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587D1A6: sqlite3_step (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587E209: sqlite3_exec (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x4CABE8: TripleSourceCreator::open(char const*, bool, bool) (TripleSourceCreator.cpp:76)
==16265==    by 0x4C8178: NameServerContainer::open(yarp::os::Searchable&) (yarpserver3.cpp:116)
==16265==    by 0x4C4230: yarpserver3_create(yarp::os::Searchable&) (yarpserver3.cpp:178)
==16265==    by 0x4C0D6F: main (harness.cpp:46)
==16265== 
==16265== 1,032 bytes in 1 blocks are possibly lost in loss record 554 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x583A6F2: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5873A02: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587D1A6: sqlite3_step (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587E209: sqlite3_exec (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x4D50BF: SubscriberOnSql::open(yarp::os::ConstString const&, bool) (SubscriberOnSql.cpp:67)
==16265==    by 0x4C821C: NameServerContainer::open(yarp::os::Searchable&) (yarpserver3.cpp:125)
==16265==    by 0x4C4230: yarpserver3_create(yarp::os::Searchable&) (yarpserver3.cpp:178)
==16265==    by 0x4C0D6F: main (harness.cpp:46)
==16265== 
==16265== 1,152 bytes in 4 blocks are possibly lost in loss record 569 of 604
==16265==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x40136D4: allocate_dtv (dl-tls.c:322)
==16265==    by 0x40136D4: _dl_allocate_tls (dl-tls.c:539)
==16265==    by 0x51DC2AE: allocate_stack (allocatestack.c:588)
==16265==    by 0x51DC2AE: pthread_create@@GLIBC_2.2.5 (pthread_create.c:539)
==16265==    by 0x4F36964: ACE_OS::thr_create(void* (*)(void*), void*, long, unsigned long*, unsigned long*, long, void*, unsigned long, ACE_Base_Thread_Adapter*, char const**) (in /usr/lib/libACE-6.3.3.so)
==16265==    by 0x5ECC81B: ACE_Thread::spawn(void* (*)(void*), void*, long, unsigned long*, unsigned long*, long, void*, unsigned long, ACE_Thread_Adapter*, char const**) (Thread.inl:96)
==16265==    by 0x5ECC07B: yarp::os::impl::ThreadImpl::start() (ThreadImpl.cpp:253)
==16265==    by 0x5E72AE2: yarp::os::impl::PortCoreInputUnit::start() (PortCoreInputUnit.cpp:45)
==16265==    by 0x5E6244E: yarp::os::impl::PortCore::addInput(yarp::os::InputProtocol*) (PortCore.cpp:674)
==16265==    by 0x5E5F946: yarp::os::impl::PortCore::run() (PortCore.cpp:210)
==16265==    by 0x5ECB976: theExecutiveBranch(void*) (ThreadImpl.cpp:94)
==16265==    by 0x4F37F55: ACE_OS_Thread_Adapter::invoke() (in /usr/lib/libACE-6.3.3.so)
==16265==    by 0x51DB6F9: start_thread (pthread_create.c:333)
==16265== 
==16265== 1,304 bytes in 1 blocks are possibly lost in loss record 571 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581B97B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58519C4: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5851E3A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5852928: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5854D8F: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5858253: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x585AC34: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587672A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265== 
==16265== 1,304 bytes in 1 blocks are possibly lost in loss record 572 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581B97B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58519C4: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5851E3A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5852928: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5854D8F: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5858253: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x585AC34: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5875DD9: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265== 
==16265== 1,304 bytes in 1 blocks are possibly lost in loss record 573 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581B97B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58519C4: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5851E3A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5852928: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5854D8F: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5856919: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58583D3: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x585AC34: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265== 
==16265== 2,608 bytes in 2 blocks are possibly lost in loss record 583 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581B97B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58519C4: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5851E3A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x585421B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587EAC1: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587EC1B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587ECDF: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587EE24: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265== 
==16265== 2,608 bytes in 2 blocks are possibly lost in loss record 584 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581B97B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58519C4: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5851E3A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5852928: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5854D8F: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5858527: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x585AC34: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587672A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265== 
==16265== 10,432 bytes in 8 blocks are possibly lost in loss record 591 of 604
==16265==    at 0x4C2DB8F: malloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==16265==    by 0x583C136: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581795B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581AEA6: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x581B97B: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58519C4: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5851E3A: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5852928: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5854D8F: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x58591E1: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x5874FF9: ??? (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265==    by 0x587D1A6: sqlite3_step (in /usr/lib/x86_64-linux-gnu/libsqlite3.so.0.8.6)
==16265== 
==16265== LEAK SUMMARY:
==16265==    definitely lost: 0 bytes in 0 blocks
==16265==    indirectly lost: 0 bytes in 0 blocks
==16265==      possibly lost: 23,064 bytes in 22 blocks
==16265==    still reachable: 2,081,979 bytes in 10,179 blocks
==16265==                       of which reachable via heuristic:
==16265==                         length64           : 26,776 bytes in 266 blocks
==16265==         suppressed: 0 bytes in 0 blocks
==16265== Reachable blocks (those to which a pointer was found) are not shown.
==16265== To see them, rerun with: --leak-check=full --show-leak-kinds=all
==16265== 
==16265== For counts of detected and suppressed errors, rerun with: -v
==16265== ERROR SUMMARY: 11 errors from 11 contexts (suppressed: 1 from 1)
Tobias-Fischer commented 7 years ago

I managed to get a dump, this might help for debugging. My guess is that sending in PortCoreOutputUnit should be a mutex (and probably running as well). Furthermore, the sending = true in line 375 should be moved to line 370. Maybe there is a race condition on cachedReader. Maybe @drdanz wants to fix another test ;-).

Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007fae6ba0f751 in yarp::os::impl::PortCoreOutputUnit::sendHelper (this=this@entry=0x7fae60004370) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCoreOutputUnit.cpp:333
333     if(replied && op->getSender().modifiesReply()) {
[Current thread is 1 (Thread 0x7fae69405700 (LWP 7801))]

Thread 5 (Thread 0x7fae5bfff700 (LWP 7797)):
#0  0x00007fae6c5f97ad in accept () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fae6c926511 in ACE_SOCK_Acceptor::accept(ACE_SOCK_Stream&, ACE_Addr*, ACE_Time_Value*, bool, bool) const () from /usr/lib/libACE-6.3.3.so
#2  0x00007fae6ba68742 in yarp::os::impl::SocketTwoWayStream::open (this=this@entry=0x7fae54001fb0, acceptor=...) at /home/icub/robot/yarp/src/libYARP_OS/src/SocketTwoWayStream.cpp:64
#3  0x00007fae6ba6dbd5 in yarp::os::impl::TcpFace::read (this=0x234ac40) at /home/icub/robot/yarp/src/libYARP_OS/src/TcpFace.cpp:87
#4  0x00007fae6b9f8f4d in yarp::os::impl::PortCore::run (this=0x2347ff0) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCore.cpp:184
#5  0x00007fae6ba707b5 in theExecutiveBranch (args=0x2347ff0) at /home/icub/robot/yarp/src/libYARP_OS/src/ThreadImpl.cpp:96
#6  0x00007fae6c903f56 in ACE_OS_Thread_Adapter::invoke() () from /usr/lib/libACE-6.3.3.so
#7  0x00007fae6c5f06fa in start_thread (arg=0x7fae5bfff700) at pthread_create.c:333
#8  0x00007fae6b092b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 4 (Thread 0x7fae69c06700 (LWP 7796)):
#0  0x00007fae6c5f97ad in accept () at ../sysdeps/unix/syscall-template.S:84
#1  0x00007fae6c926511 in ACE_SOCK_Acceptor::accept(ACE_SOCK_Stream&, ACE_Addr*, ACE_Time_Value*, bool, bool) const () from /usr/lib/libACE-6.3.3.so
#2  0x00007fae6ba68742 in yarp::os::impl::SocketTwoWayStream::open (this=this@entry=0x7fae5c002950, acceptor=...) at /home/icub/robot/yarp/src/libYARP_OS/src/SocketTwoWayStream.cpp:64
#3  0x00007fae6ba6dbd5 in yarp::os::impl::TcpFace::read (this=0x22c8aa0) at /home/icub/robot/yarp/src/libYARP_OS/src/TcpFace.cpp:87
#4  0x00007fae6b9f8f4d in yarp::os::impl::PortCore::run (this=0x22c7f00) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCore.cpp:184
#5  0x00007fae6ba707b5 in theExecutiveBranch (args=0x22c7f00) at /home/icub/robot/yarp/src/libYARP_OS/src/ThreadImpl.cpp:96
#6  0x00007fae6c903f56 in ACE_OS_Thread_Adapter::invoke() () from /usr/lib/libACE-6.3.3.so
#7  0x00007fae6c5f06fa in start_thread (arg=0x7fae69c06700) at pthread_create.c:333
#8  0x00007fae6b092b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 3 (LWP 7800):
#0  0x00007fae6c5f07be in __exit_thread () at ../sysdeps/unix/sysv/linux/exit-thread.h:36
#1  start_thread (arg=0x7fae68c04700) at pthread_create.c:478
#2  0x00007fae6b092b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109

Thread 2 (Thread 0x7fae6cd6f740 (LWP 7350)):
#0  yy_reduce (yyruleno=<optimised out>, yypParser=<optimised out>) at sqlite3.c:133175
#1  sqlite3Parser (yyp=0x234b818, yymajor=55, yyminor=..., pParse=0x234b568) at sqlite3.c:3186
#2  0x00007fae6bf87aa4 in sqlite3RunParser (pParse=0x234b568, zSql=0x22c9918 "SELECT id, ns, name, value FROM tags WHERE rid = 188 AND ns IS NULL AND name = 'host'", pzErrMsg=0x7ffe416253b0)
    at sqlite3.c:135186
#3  0x00007fae6bf89c26 in sqlite3Prepare (db=0x221ef98, zSql=0x22c9918 "SELECT id, ns, name, value FROM tags WHERE rid = 188 AND ns IS NULL AND name = 'host'", nBytes=-1, saveSqlFlag=1, 
    pReprepare=<optimised out>, ppStmt=0x7ffe416254a8, pzTail=0x0) at sqlite3.c:112596
#4  0x00007fae6bf89d3c in sqlite3LockAndPrepare (db=0x221ef98, zSql=0x22c9918 "SELECT id, ns, name, value FROM tags WHERE rid = 188 AND ns IS NULL AND name = 'host'", nBytes=-1, saveSqlFlag=1, pOld=0x0, 
    ppStmt=0x7ffe416254a8, pzTail=0x0) at sqlite3.c:112688
#5  0x00007fae6bf8a008 in sqlite3_prepare_v2 (db=<optimised out>, zSql=<optimised out>, zSql@entry=0x22c9918 "SELECT id, ns, name, value FROM tags WHERE rid = 188 AND ns IS NULL AND name = 'host'", 
    nBytes=<optimised out>, nBytes@entry=-1, ppStmt=<optimised out>, ppStmt@entry=0x7ffe416254a8, pzTail=<optimised out>, pzTail@entry=0x0) at sqlite3.c:112764
#6  0x00000000004d78e2 in SqliteTripleSource::query[abi:cxx11](Triple&, TripleContext*) (this=0x2232590, ti=..., context=<optimised out>)
    at /home/icub/robot/yarp/src/libYARP_serversql/src/SqliteTripleSource.h:141
#7  0x00000000004d911c in NameServiceOnTriples::query (this=this@entry=0x221eac0, portName=..., act=..., prefix=..., nested=nested@entry=false)
    at /home/icub/robot/yarp/src/libYARP_serversql/src/NameServiceOnTriples.cpp:50
#8  0x00000000004da2b9 in NameServiceOnTriples::cmdQuery (this=this@entry=0x221eac0, act=..., nested=nested@entry=false) at /home/icub/robot/yarp/src/libYARP_serversql/src/NameServiceOnTriples.cpp:133
#9  0x00000000004ddf87 in NameServiceOnTriples::apply (this=0x221eac0, cmd=..., reply=..., event=..., remote=...) at /home/icub/robot/yarp/src/libYARP_serversql/src/NameServiceOnTriples.cpp:627
#10 0x00000000004d3573 in yarp::name::NameServerConnectionHandler::apply (this=0x7ffe41626330, reader=..., writer=0x2349ac0, lock=<optimised out>)
    at /home/icub/robot/yarp/src/libYARP_name/include/yarp/name/NameServerConnectionHandler.h:58
#11 0x00007fae6c1e3a19 in yarp::name::NameService::process (this=<optimised out>, in=..., out=..., remote=...) at /home/icub/robot/yarp/src/libYARP_name/src/NameService.cpp:23
#12 0x00007fae6b9d769b in yarp::os::NetworkBase::writeToNameServer (cmd=..., reply=..., style=...) at /home/icub/robot/yarp/src/libYARP_OS/src/Network.cpp:1448
#13 0x00007fae6b9bd4b5 in yarp::os::impl::NameClient::send (this=this@entry=0x22c66c0, cmd=..., multi=multi@entry=true) at /home/icub/robot/yarp/src/libYARP_OS/src/NameClient.cpp:141
#14 0x00007fae6b9beb29 in yarp::os::impl::NameClient::probe (cmd=..., this=0x22c66c0) at /home/icub/robot/yarp/src/libYARP_OS/include/yarp/os/impl/NameClient.h:132
#15 yarp::os::impl::NameClient::queryName (this=0x22c66c0, name=...) at /home/icub/robot/yarp/src/libYARP_OS/src/NameClient.cpp:279
#16 0x00007fae6ba763b1 in yarp::os::YarpNameSpace::queryName (this=<optimised out>, name=...) at /home/icub/robot/yarp/src/libYARP_OS/src/YarpNameSpace.cpp:36
#17 0x00007fae6b9b667a in MultiNameSpaceHelper::queryName (name=..., this=0x224cb80) at /home/icub/robot/yarp/src/libYARP_OS/src/MultiNameSpace.cpp:173
#18 yarp::os::MultiNameSpace::queryName (this=<optimised out>, name=...) at /home/icub/robot/yarp/src/libYARP_OS/src/MultiNameSpace.cpp:253
#19 0x00007fae6b9d6f97 in yarp::os::NetworkBase::queryName (name=...) at /home/icub/robot/yarp/src/libYARP_OS/src/Network.cpp:683
#20 0x00007fae6b9dac98 in metaConnect (src=..., dest=..., style=..., mode=mode@entry=2) at /home/icub/robot/yarp/src/libYARP_OS/src/Network.cpp:296
#21 0x00007fae6b9dc140 in yarp::os::NetworkBase::disconnect (src=..., dest=..., style=...) at /home/icub/robot/yarp/src/libYARP_OS/src/Network.cpp:542
#22 0x00007fae6b9dc1e1 in yarp::os::NetworkBase::disconnect (src=..., dest=..., quiet=quiet@entry=true) at /home/icub/robot/yarp/src/libYARP_OS/src/Network.cpp:536
#23 0x00007fae6b981013 in yarp::os::impl::Companion::disconnect (src=0x7ffe416283a0 "/out", dest=<optimised out>, silent=silent@entry=true) at /home/icub/robot/yarp/src/libYARP_OS/src/Companion.cpp:1851
#24 0x00007fae6b9f7eee in yarp::os::impl::PortCore::closeMain (this=this@entry=0x2347ff0) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCore.cpp:400
#25 0x00007fae6b9f8839 in yarp::os::impl::PortCore::close (this=0x2347ff0) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCore.cpp:251
#26 0x00007fae6ba11f63 in yarp::os::Port::close (this=this@entry=0x7ffe41628620) at /home/icub/robot/yarp/src/libYARP_OS/src/Port.cpp:345
#27 0x000000000048e0e8 in PortTest::testInterruptReply (this=this@entry=0x71dea0 <thePortTest>) at /home/icub/robot/yarp/tests/libYARP_OS/PortTest.cpp:1252
#28 0x0000000000490c08 in PortTest::runTests (this=0x71dea0 <thePortTest>) at /home/icub/robot/yarp/tests/libYARP_OS/PortTest.cpp:1453
#29 0x00007fae6ba72ca0 in yarp::os::impl::UnitTest::run (this=0x71dea0 <thePortTest>, argc=1, argv=0x7ffe41628eb8) at /home/icub/robot/yarp/src/libYARP_OS/src/UnitTest.cpp:149
#30 0x00007fae6ba72354 in yarp::os::impl::UnitTest::runSubTests (this=0x21a1520, argc=1, argv=0x7ffe41628eb8) at /home/icub/robot/yarp/src/libYARP_OS/src/UnitTest.cpp:123
#31 0x00007fae6ba72c51 in yarp::os::impl::UnitTest::run (this=0x21a1520, argc=1, argv=0x7ffe41628eb8) at /home/icub/robot/yarp/src/libYARP_OS/src/UnitTest.cpp:153
#32 0x00000000004391bf in main (argc=3, argv=0x7ffe41628ea8) at /home/icub/robot/yarp/tests/libYARP_OS/harness.cpp:65

Thread 1 (Thread 0x7fae69405700 (LWP 7801)):
#0  0x00007fae6ba0f751 in yarp::os::impl::PortCoreOutputUnit::sendHelper (this=this@entry=0x7fae60004370) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCoreOutputUnit.cpp:333
#1  0x00007fae6ba110e4 in yarp::os::impl::PortCoreOutputUnit::send (this=0x7fae60004370, writer=..., reader=0x7fae69404e40, callback=0x7fae69404df8, tracker=0x7fae60003e60, envelopeString=..., 
    waitAfter=true, waitBefore=true, gotReply=0x7fae69404ce0) at /home/icub/robot/yarp/src/libYARP_OS/src/PortCoreOutputUnit.cpp:377
#2  0x00007fae6b9fae74 in yarp::os::impl::PortCore::sendHelper (this=this@entry=0x22c7f00, writer=..., mode=mode@entry=1, reader=reader@entry=0x7fae69404e40, callback=callback@entry=0x0)
    at /home/icub/robot/yarp/src/libYARP_OS/src/PortCore.cpp:1340
#3  0x00007fae6b9fb30f in yarp::os::impl::PortCore::send (this=this@entry=0x22c7f00, writer=..., reader=reader@entry=0x7fae69404e40, callback=callback@entry=0x0)
    at /home/icub/robot/yarp/src/libYARP_OS/src/PortCore.cpp:1262
#4  0x00007fae6ba11944 in yarp::os::Port::write (this=this@entry=0x7ffe41628658, writer=..., reader=..., callback=callback@entry=0x0) at /home/icub/robot/yarp/src/libYARP_OS/src/Port.cpp:439
#5  0x000000000048a0ee in ServiceUser::run (this=0x7ffe41628640) at /home/icub/robot/yarp/tests/libYARP_OS/PortTest.cpp:200
#6  0x00007fae6ba707b5 in theExecutiveBranch (args=0x23465a0) at /home/icub/robot/yarp/src/libYARP_OS/src/ThreadImpl.cpp:96
#7  0x00007fae6c903f56 in ACE_OS_Thread_Adapter::invoke() () from /usr/lib/libACE-6.3.3.so
#8  0x00007fae6c5f06fa in start_thread (arg=0x7fae69405700) at pthread_create.c:333
#9  0x00007fae6b092b5d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:109
drdanz commented 7 years ago

@Tobias-Fischer Thanks for the dump, that's quite useful! I'll have a look at it as soon as I have some time. How did you reproduce the bug? I haven't been able to reproduce it locally...

Tobias-Fischer commented 7 years ago

Hi, I used the following snippet: while /home/icub/robot/yarp/build/bin/harness_os "regression" "PortTest"; do :; done. It might take a while though before it crashes.

Tobias-Fischer commented 7 years ago

Hi, I think it's much easier than that. The protection if (op!=NULL) does not go all the way to the line where the segfault happens (if(replied && op->getSender().modifiesReply())), thus I think in the segfault case op==NULL and thus op->getSender() leads to a segfault. Thus, moving the closing } from line 325 down just before the return statement might fix the problem.

I'm running the test now, if it doesn't fail by the end of the day, I'll create a PR.

drdanz commented 7 years ago

@Tobias-Fischer Awesome! I'm looking forward for your patch :+1:

Tobias-Fischer commented 7 years ago

This still happens :( https://travis-ci.org/robotology/yarp/jobs/152354037