robotology / yarp

YARP - Yet Another Robot Platform
http://www.yarp.it
Other
529 stars 195 forks source link

Fix use of shmem and unix_stream carriers in remote_controlboard and multipleanalogsensorclient network wrapper clients #3147

Closed traversaro closed 2 weeks ago

traversaro commented 1 month ago

In the context of walking, inside the ergocub-torso we are interested in establishing connection that continue to run even if the network interface over which they are established temporary goes down. The rationale for this is that at the moment in most demos we run the yarpserver in the operator laptop and the walking and the robot robotinterface inside the ergocub-torso board embedded in the robot, and while walking the robot could loose the connection with the operator laptop where the yarpserver is running.

To do so, we experimented with using shmem and unix_stream carriers, experiencing the following problems:

This PR fixes all these problems, in particular:

~Furthermore, in this PR we also use the GENERATE functionality of catch to ensure that the multipleanalogsensorsclient and remotecontrolboardremapper tests are running on all the following carriers:tcp,fast_tcp,shmem,unix_stream~

The testing part was dropped as it was also exposed some existing issues that are hard to debug and for which the fix had to touch the internal behaviour of YARP's Port, with an high chance of regressions. For reference, those attempts are documented in the comment in this PRs (that are not hidden as outdated) and the related code is https://github.com/traversaro/yarp/tree/archiveattemptofixvalgrind .

update-docs[bot] commented 1 month ago

Thanks for opening this pull request! The maintainers of this repository would appreciate it if you would update the release notes by adding a file in doc/release/<target_branch>, based on your changes.

traversaro commented 1 month ago

fyi @S-Dafarra @GiulioRomualdi

traversaro commented 1 month ago

There are some unexpected failures that are not present in other PRs (such as https://github.com/robotology/yarp/pull/3146), in particular in macos-13 and Linux + Valgrind. I will reproduce those locally and report this.

traversaro commented 1 month ago

The macos failures are quite interesting:

2024-10-25T11:35:51.3988180Z [INFO] |yarp.os.impl.PortCoreInputUnit|/testRemapperRobot/a/rpc:i| Receiving input from /test/remoteControlBoardRemapper/testRemapperRobot/a/rpc:o to /testRemapperRobot/a/rpc:i using tcp
2024-10-25T11:35:51.3989860Z [INFO] |yarp.os.impl.PortCoreOutputUnit|/test/remoteControlBoardRemapper/testRemapperRobot/a/rpc:o| Sending output from /test/remoteControlBoardRemapper/testRemapperRobot/a/rpc:o to /testRemapperRobot/a/rpc:i using tcp
2024-10-25T11:35:51.3990030Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3990210Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3990380Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3990540Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3990720Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3990880Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3992370Z [INFO] |yarp.os.impl.PortCoreInputUnit|/testRemapperRobot/a/command:i| Receiving input from /test/remoteControlBoardRemapper/testRemapperRobot/a/command:o to /testRemapperRobot/a/command:i using shmem
2024-10-25T11:35:51.3992830Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3993010Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3994810Z [INFO] |yarp.os.impl.PortCoreOutputUnit|/test/remoteControlBoardRemapper/testRemapperRobot/a/command:o| Sending output from /test/remoteControlBoardRemapper/testRemapperRobot/a/command:o to /testRemapperRobot/a/command:i using shmem
2024-10-25T11:35:51.3994980Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3995150Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3995320Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3995490Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3995660Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3995830Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3997630Z [INFO] |yarp.os.impl.PortCoreInputUnit|/test/remoteControlBoardRemapper/testRemapperRobot/a/stateExt:i| Receiving input from /testRemapperRobot/a/stateExt:o to /test/remoteControlBoardRemapper/testRemapperRobot/a/stateExt:i using shmem
2024-10-25T11:35:51.3997810Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3997970Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.3999470Z [INFO] |yarp.os.impl.PortCoreOutputUnit|/testRemapperRobot/a/stateExt:o| Sending output from /testRemapperRobot/a/stateExt:o to /test/remoteControlBoardRemapper/testRemapperRobot/a/stateExt:i using shmem
2024-10-25T11:35:51.4000360Z [INFO] |yarp.dev.PolyDriver|remote_controlboard| Created device <remote_controlboard>. See C++ class RemoteControlBoard for documentation.
2024-10-25T11:35:51.4001940Z [DEBUG] |yarp.dev.PolyDriver|remote_controlboard| Parameters are (carrier shmem) (device remote_controlboard) (local "/test/remoteControlBoardRemapper/testRemapperRobot/b") (remote "/testRemapperRobot/b") (writeStrict on)
2024-10-25T11:35:51.4002580Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'remote' using value: /testRemapperRobot/b
2024-10-25T11:35:51.4003480Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'local' using value: /test/remoteControlBoardRemapper/testRemapperRobot/b
2024-10-25T11:35:51.4003990Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'writeStrict' using value: on
2024-10-25T11:35:51.4004500Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'carrier' using value: shmem
2024-10-25T11:35:51.4005040Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'timeout' using DEFAULT value: 0.5
2024-10-25T11:35:51.4005730Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'local_qos::enable' using DEFAULT value: false
2024-10-25T11:35:51.4006440Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'local_qos::thread_priority' using DEFAULT value: 0
2024-10-25T11:35:51.4007130Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'local_qos::thread_policy' using DEFAULT value: 0
2024-10-25T11:35:51.4007840Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'local_qos::packet_priority' using DEFAULT value: 
2024-10-25T11:35:51.4008520Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'remote_qos::enable' using DEFAULT value: false
2024-10-25T11:35:51.4009240Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'remote_qos::thread_priority' using DEFAULT value: 0
2024-10-25T11:35:51.4009950Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'remote_qos::thread_policy' using DEFAULT value: 0
2024-10-25T11:35:51.4010670Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'remote_qos::packet_priority' using DEFAULT value: 
2024-10-25T11:35:51.4011340Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'ignoreProtocolCheck' using DEFAULT value: false
2024-10-25T11:35:51.4011920Z [INFO] |yarp.device.RemoteControlBoard| Parameter 'diagnostic' using DEFAULT value: false
2024-10-25T11:35:51.4012610Z [INFO] |yarp.device.remote_controlboard| RemoteControlBoard is ENABLING the writeStrict option for all commands
2024-10-25T11:35:51.4014040Z [INFO] |yarp.os.Port|/test/remoteControlBoardRemapper/testRemapperRobot/b/rpc:o| Port /test/remoteControlBoardRemapper/testRemapperRobot/b/rpc:o active at tcp://localhost:20017/
2024-10-25T11:35:51.4015590Z [INFO] |yarp.os.Port|/test/remoteControlBoardRemapper/testRemapperRobot/b/command:o| Port /test/remoteControlBoardRemapper/testRemapperRobot/b/command:o active at tcp://localhost:20018/
2024-10-25T11:35:51.4017000Z [INFO] |yarp.os.Port|/test/remoteControlBoardRemapper/testRemapperRobot/b/stateExt:i| Port /test/remoteControlBoardRemapper/testRemapperRobot/b/stateExt:i active at tcp://localhost:20019/
2024-10-25T11:35:51.4018670Z [INFO] |yarp.os.impl.PortCoreOutputUnit|/test/remoteControlBoardRemapper/testRemapperRobot/b/rpc:o| Sending output from /test/remoteControlBoardRemapper/testRemapperRobot/b/rpc:o to /testRemapperRobot/b/rpc:i using tcp
2024-10-25T11:35:51.4020030Z [INFO] |yarp.os.impl.PortCoreInputUnit|/testRemapperRobot/b/rpc:i| Receiving input from /test/remoteControlBoardRemapper/testRemapperRobot/b/rpc:o to /testRemapperRobot/b/rpc:i using tcp
2024-10-25T11:35:51.4020350Z ACE_SV_Shared_Memory::ACE_SV_Shared_Memory: Too many open files
2024-10-25T11:35:51.4020530Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.4020700Z ACE_Mutex::ACE_Mutex: File name too long
2024-10-25T11:35:51.4021270Z ../../../../../src/devices/controlBoardRemapper/tests/controlBoardRemapper_t1_test.cpp:157: FAILED:
2024-10-25T11:35:51.4021470Z   {Unknown expression after the reported line}
2024-10-25T11:35:51.4021630Z due to a fatal error condition:
2024-10-25T11:35:51.4021860Z   SIGSEGV - Segmentation violation signal
traversaro commented 1 month ago

There are some unexpected failures that are not present in other PRs (such as #3146), in particular in macos-13 and Linux + Valgrind. I will reproduce those locally and report this.

Actually Linux+Valgrind already fail in other PRs? Instead macos-13 is instead a new failure.

traversaro commented 1 month ago

There are some unexpected failures that are not present in other PRs (such as #3146), in particular in macos-13 and Linux + Valgrind. I will reproduce those locally and report this.

Actually Linux+Valgrind already fail in other PRs? Instead macos-13 is instead a new failure.

No, 22 (Valgrind+NO ACE) was failing in other PRs, this PR fixes 22 and breaks 21 (Valgrind+YES ACE).

traversaro commented 1 month ago

Valgrind of mas test failures:

[DEBUG] |yarp.dev.PolyDriver|multipleanalogsensorsclient| Parameters are (carrier shmem) (device multipleanalogsensorsclient) (local "/test/mas/client") (remote "/test/mas/server") (timeout 1.0)
[INFO] |yarp.device.MultipleAnalogSensorsClient| Parameter 'remote' using value: /test/mas/server
[INFO] |yarp.device.MultipleAnalogSensorsClient| Parameter 'local' using value: /test/mas/client
[INFO] |yarp.device.MultipleAnalogSensorsClient| Parameter 'timeout' using value: 1
[INFO] |yarp.device.MultipleAnalogSensorsClient| Parameter 'externalConnection' using DEFAULT value: false
[INFO] |yarp.device.MultipleAnalogSensorsClient| Parameter 'carrier' using value: shmem
[INFO] |yarp.os.Port|/test/mas/client/rpc:i| Port /test/mas/client/rpc:i active at tcp://localhost:10004/
[INFO] |yarp.os.Port|/test/mas/client/measures:i| Port /test/mas/client/measures:i active at tcp://localhost:10005/
[INFO] |yarp.os.impl.PortCoreOutputUnit|/test/mas/client/rpc:i| Sending output from /test/mas/client/rpc:i to /test/mas/server/rpc:o using tcp
[INFO] |yarp.os.impl.PortCoreInputUnit|/test/mas/server/rpc:o| Receiving input from /test/mas/client/rpc:i to /test/mas/server/rpc:o using tcp
==109003== Thread 11:
==109003== Syscall param write(buf) points to uninitialised byte(s)
==109003==    at 0x50695AD: __libc_write (write.c:26)
==109003==    by 0x50695AD: write (write.c:24)
==109003==    by 0x526F3AD: write (OS_NS_unistd.inl:1158)
==109003==    by 0x526F3AD: send_i (ACE.inl:222)
==109003==    by 0x526F3AD: ACE::send_n_i(int, void const*, unsigned long, unsigned long*) (ACE.cpp:1586)
==109003==    by 0x88BD9B0: send_n (ACE.inl:184)
==109003==    by 0x88BD9B0: send_n (SOCK_Stream.inl:115)
==109003==    by 0x88BD9B0: ShmemHybridStream::accept() (ShmemHybridStream.cpp:91)
==109003==    by 0x88BCC22: ShmemCarrier::becomeShmemVersionHybridStream(yarp::os::ConnectionState&, bool) (ShmemCarrier.cpp:79)
==109003==    by 0x4BCC1A7: yarp::os::impl::Protocol::respondToHeader() (Protocol.cpp:510)
==109003==    by 0x4BCC838: yarp::os::impl::Protocol::open(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) (Protocol.cpp:220)
==109003==    by 0x4BC21CB: yarp::os::impl::PortCoreInputUnit::run() (PortCoreInputUnit.cpp:99)
==109003==    by 0x4BDBFE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==109003==    by 0x4D10B64: execute_native_thread_routine (thread.cc:104)
==109003==    by 0x4FE9A93: start_thread (pthread_create.c:447)
==109003==    by 0x5076A33: clone (clone.S:100)
==109003==  Address 0xc10781c is on thread 11's stack
==109003==  in frame #2, created by ShmemHybridStream::accept() (ACE.inl:50)
==109003==
[INFO] |yarp.os.impl.PortCoreInputUnit|/test/mas/client/measures:i| Receiving input from /test/mas/server/measures:o to /test/mas/client/measures:i using shmem
[INFO] |yarp.os.impl.PortCoreOutputUnit|/test/mas/server/measures:o| Sending output from /test/mas/server/measures:o to /test/mas/client/measures:i using shmem
[DEBUG] |yarp.device.multipleanalogsensorsclient| Open complete
[INFO] |yarp.dev.PolyDriver|multipleanalogsensorsclient| Created device <multipleanalogsensorsclient>. See C++ class MultipleAnalogSensorsClient for documentation.
[DEBUG] |yarp.carrier.shmem| INTERRUPT
[INFO] |yarp.os.impl.PortCoreInputUnit|/test/mas/client/measures:i| Removing input from /test/mas/server/measures:o to /test/mas/client/measures:i
[INFO] |yarp.os.impl.PortCoreOutputUnit|/test/mas/server/measures:o| Removing output from /test/mas/server/measures:o to /test/mas/client/measures:i
[INFO] |yarp.os.impl.PortCoreInputUnit|/test/mas/server/rpc:o| Removing input from /test/mas/client/rpc:i to /test/mas/server/rpc:o
[INFO] |yarp.os.impl.PortCoreOutputUnit|/test/mas/client/rpc:i| Removing output from /test/mas/client/rpc:i to /test/mas/server/rpc:o
[DEBUG] |yarp.device.multipleanalogsensorsclient| Close complete
[DEBUG] |yarp.device.multipleanalogsensorsserver| Detach complete
[DEBUG] |yarp.device.multipleanalogsensorsserver| Detach complete
[DEBUG] |yarp.dev.PolyDriver|fakeIMU| Parameters are (constantValue 1) (device fakeIMU)
[INFO] |yarp.device.FakeIMU| Parameter 'period' using DEFAULT value: 10
[INFO] |yarp.device.FakeIMU| Parameter 'constantValue' using value: true
[INFO] |yarp.device.FakeIMU| Parameter 'sensorName' using DEFAULT value: sensorName
[INFO] |yarp.device.FakeIMU| Parameter 'frameName' using DEFAULT value: frameName
[INFO] |yarp.dev.PolyDriver|fakeIMU| Created device <fakeIMU>. See C++ class FakeIMU for documentation.
[DEBUG] |yarp.dev.PolyDriver|multipleanalogsensorsserver| Parameters are (device multipleanalogsensorsserver) (name "/test/mas/server") (period 10)
[INFO] |yarp.device.MultipleAnalogSensorsServer| Parameter 'name' using value: /test/mas/server
[INFO] |yarp.device.MultipleAnalogSensorsServer| Parameter 'period' using value: 10
[INFO] |yarp.dev.PolyDriver|multipleanalogsensorsserver| Created wrapper <multipleanalogsensorsserver>. See C++ class MultipleAnalogSensorsServer for documentation.
[INFO] |yarp.os.Port|/test/mas/server/measures:o| Port /test/mas/server/measures:o active at tcp://localhost:10002/
[INFO] |yarp.os.Port|/test/mas/server/rpc:o| Port /test/mas/server/rpc:o active at tcp://localhost:10003/
[DEBUG] |yarp.device.multipleanalogsensorsserver| Attach complete
traversaro commented 1 month ago

Remapper failure:

===============================================================================
All tests passed (688 assertions in 1 test case)

==113709==
==113709== HEAP SUMMARY:
==113709==     in use at exit: 0 bytes in 0 blocks
==113709==   total heap usage: 5,167,984 allocs, 5,167,984 frees, 562,897,147 bytes allocated
==113709==
==113709== All heap blocks were freed -- no leaks are possible
==113709==
==113709== ERROR SUMMARY: 7 errors from 3 contexts (suppressed: 0 from 0)
==113709==
==113709== 1 errors in context 1 of 3:
==113709== Invalid read of size 4
==113709==    at 0x4B74CDC: get_handle (IPC_SAP.inl:16)
==113709==    by 0x4B74CDC: close_writer (SOCK_Stream.inl:42)
==113709==    by 0x4B74CDC: yarp::os::impl::SocketTwoWayStream::interrupt() (SocketTwoWayStream.h:85)
==113709==    by 0x4B720F2: interrupt (Protocol.cpp:258)
==113709==    by 0x4B720F2: yarp::os::impl::Protocol::interrupt() (Protocol.cpp:248)
==113709==    by 0x4B6E0FE: yarp::os::impl::PortCoreOutputUnit::closeMain() (PortCoreOutputUnit.cpp:223)
==113709==    by 0x4B576F2: yarp::os::impl::PortCore::reapUnits() (PortCore.cpp:576)
==113709==    by 0x4B60E93: yarp::os::impl::PortCore::run() (PortCore.cpp:246)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==  Address 0x576c618 is 24 bytes inside a block of size 120 free'd
==113709==    at 0x484A61D: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==113709==    by 0x4AFBA8F: yarp::os::ShiftStream::Private::close() (ShiftStream.cpp:37)
==113709==    by 0x4B728CB: yarp::os::impl::Protocol::closeHelper() (Protocol.cpp:535)
==113709==    by 0x4B6D248: yarp::os::impl::PortCoreOutputUnit::closeBasic() (PortCoreOutputUnit.cpp:205)
==113709==    by 0x4B6E452: yarp::os::impl::PortCoreOutputUnit::sendHelper() (PortCoreOutputUnit.cpp:333)
==113709==    by 0x4B6EDDD: yarp::os::impl::PortCoreOutputUnit::run() (PortCoreOutputUnit.cpp:95)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==  Block was alloc'd at
==113709==    at 0x4846FA3: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==113709==    by 0x4B800EA: yarp::os::impl::TcpFace::write(yarp::os::Contact const&) (TcpFace.cpp:122)
==113709==    by 0x4A83DB2: yarp::os::Carriers::connect(yarp::os::Contact const&) (Carriers.cpp:297)
==113709==    by 0x4B61F05: yarp::os::impl::PortCore::addOutput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void*, yarp::os::OutputStream*, bool) (PortCore.cpp:952)
==113709==    by 0x4B6290A: operator() (PortCore.cpp:1878)
==113709==    by 0x4B6290A: yarp::os::impl::PortCore::adminBlock(yarp::os::ConnectionReader&, void*) (PortCore.cpp:2584)
==113709==    by 0x4B696F7: yarp::os::impl::PortCoreInputUnit::run() (PortCoreInputUnit.cpp:313)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==
==113709==
==113709== 3 errors in context 2 of 3:
==113709== Thread 24:
==113709== Invalid write of size 1
==113709==    at 0x4B720F3: interrupt (Protocol.cpp:259)
==113709==    by 0x4B720F3: yarp::os::impl::Protocol::interrupt() (Protocol.cpp:248)
==113709==    by 0x4B6E0FE: yarp::os::impl::PortCoreOutputUnit::closeMain() (PortCoreOutputUnit.cpp:223)
==113709==    by 0x4B576F2: yarp::os::impl::PortCore::reapUnits() (PortCore.cpp:576)
==113709==    by 0x4B60E93: yarp::os::impl::PortCore::run() (PortCore.cpp:246)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==  Address 0x6779150 is 48 bytes inside a block of size 352 free'd
==113709==    at 0x484A61D: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==113709==    by 0x4B6D25A: yarp::os::impl::PortCoreOutputUnit::closeBasic() (PortCoreOutputUnit.cpp:206)
==113709==    by 0x4B6E452: yarp::os::impl::PortCoreOutputUnit::sendHelper() (PortCoreOutputUnit.cpp:333)
==113709==    by 0x4B6EDDD: yarp::os::impl::PortCoreOutputUnit::run() (PortCoreOutputUnit.cpp:95)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==  Block was alloc'd at
==113709==    at 0x4846FA3: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==113709==    by 0x4B801D5: yarp::os::impl::TcpFace::write(yarp::os::Contact const&) (TcpFace.cpp:142)
==113709==    by 0x4A83DB2: yarp::os::Carriers::connect(yarp::os::Contact const&) (Carriers.cpp:297)
==113709==    by 0x4B61F05: yarp::os::impl::PortCore::addOutput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void*, yarp::os::OutputStream*, bool) (PortCore.cpp:952)
==113709==    by 0x4B6290A: operator() (PortCore.cpp:1878)
==113709==    by 0x4B6290A: yarp::os::impl::PortCore::adminBlock(yarp::os::ConnectionReader&, void*) (PortCore.cpp:2584)
==113709==    by 0x4B696F7: yarp::os::impl::PortCoreInputUnit::run() (PortCoreInputUnit.cpp:313)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==
==113709==
==113709== 3 errors in context 3 of 3:
==113709== Invalid read of size 4
==113709==    at 0x52DAB34: get_handle (IPC_SAP.inl:16)
==113709==    by 0x52DAB34: ACE_SOCK::close() (SOCK.cpp:75)
==113709==    by 0x4B74D3A: yarp::os::impl::SocketTwoWayStream::interrupt() (SocketTwoWayStream.h:87)
==113709==    by 0x4B720F2: interrupt (Protocol.cpp:258)
==113709==    by 0x4B720F2: yarp::os::impl::Protocol::interrupt() (Protocol.cpp:248)
==113709==    by 0x4B6E0FE: yarp::os::impl::PortCoreOutputUnit::closeMain() (PortCoreOutputUnit.cpp:223)
==113709==    by 0x4B576F2: yarp::os::impl::PortCore::reapUnits() (PortCore.cpp:576)
==113709==    by 0x4B60E93: yarp::os::impl::PortCore::run() (PortCore.cpp:246)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==  Address 0x63fb1c8 is 24 bytes inside a block of size 120 free'd
==113709==    at 0x484A61D: operator delete(void*, unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==113709==    by 0x4AFBA8F: yarp::os::ShiftStream::Private::close() (ShiftStream.cpp:37)
==113709==    by 0x4B728CB: yarp::os::impl::Protocol::closeHelper() (Protocol.cpp:535)
==113709==    by 0x4B6D248: yarp::os::impl::PortCoreOutputUnit::closeBasic() (PortCoreOutputUnit.cpp:205)
==113709==    by 0x4B6E452: yarp::os::impl::PortCoreOutputUnit::sendHelper() (PortCoreOutputUnit.cpp:333)
==113709==    by 0x4B6EDDD: yarp::os::impl::PortCoreOutputUnit::run() (PortCoreOutputUnit.cpp:95)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==  Block was alloc'd at
==113709==    at 0x4846FA3: operator new(unsigned long) (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==113709==    by 0x4B800EA: yarp::os::impl::TcpFace::write(yarp::os::Contact const&) (TcpFace.cpp:122)
==113709==    by 0x4A83DB2: yarp::os::Carriers::connect(yarp::os::Contact const&) (Carriers.cpp:297)
==113709==    by 0x4B61F05: yarp::os::impl::PortCore::addOutput(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, void*, yarp::os::OutputStream*, bool) (PortCore.cpp:952)
==113709==    by 0x4B6290A: operator() (PortCore.cpp:1878)
==113709==    by 0x4B6290A: yarp::os::impl::PortCore::adminBlock(yarp::os::ConnectionReader&, void*) (PortCore.cpp:2584)
==113709==    by 0x4B696F7: yarp::os::impl::PortCoreInputUnit::run() (PortCoreInputUnit.cpp:313)
==113709==    by 0x4B81FE4: theExecutiveBranch(void*) (ThreadImpl.cpp:81)
==113709==    by 0x4CB6B64: execute_native_thread_routine (thread.cc:104)
==113709==    by 0x4F8FA93: start_thread (pthread_create.c:447)
==113709==    by 0x501CA33: clone (clone.S:100)
==113709==
==113709== ERROR SUMMARY: 7 errors from 3 contexts (suppressed: 0 from 0)
traversaro commented 1 month ago

Interestingly, the valgrind failures occure also if one just tests with fast_tcp.

traversaro commented 1 month ago

I should have fixed the Valgrind problems. They were one existing use of not-initialized memory, and two race conditions when multiple threads were calling a close method. I will check the macOS problem next week when I have access to a macOS machine.

The valgrind fixes should also fix failures such as the one in job 22 in https://github.com/robotology/yarp/pull/3146 .

traversaro commented 1 month ago

For some reason this changes create a deadlock:

==411772==
==411772== Process terminating with default action of signal 2 (SIGINT)
==411772==    at 0x500EA9A: __libc_read (io/../sysdeps/unix/sysv/linux/read.c:26)
==411772==    by 0x500EA9A: read (io/../sysdeps/unix/sysv/linux/read.c:24)
==411772==    by 0x526F0BD: read (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/unistd.h:44)
==411772==    by 0x526F0BD: read (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/work/ace/../ace/OS_NS_unistd.inl:707)
==411772==    by 0x526F0BD: recv_i (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/work/ace/../ace/ACE.inl:232)
==411772==    by 0x526F0BD: ACE::recv_n_i(int, void*, unsigned long, unsigned long*) (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/work/ace/ACE.cpp:804)
==411772==    by 0x4B75437: non-virtual thunk to yarp::os::impl::SocketTwoWayStream::read(yarp::os::Bytes&) (src/libYARP_os/src/yarp/os/impl/SocketTwoWayStream.h:99)
==411772==    by 0x4A8DE91: read (src/libYARP_os/src/yarp/os/InputStream.cpp:24)
==411772==    by 0x4A8DE91: yarp::os::InputStream::readLine[abi:cxx11](char, bool*) (src/libYARP_os/src/yarp/os/InputStream.cpp:63)
==411772==    by 0x4B7EE73: yarp::os::impl::StreamConnectionReader::expectText[abi:cxx11](char) (src/libYARP_os/src/yarp/os/impl/StreamConnectionReader.cpp:266)
==411772==    by 0x4B165D2: yarp::os::impl::BottleImpl::read(yarp::os::ConnectionReader&) (src/libYARP_os/src/yarp/os/impl/BottleImpl.cpp:452)
==411772==    by 0x4B71EB4: yarp::os::impl::Protocol::write(yarp::os::SizedWriter&) (src/libYARP_os/src/yarp/os/impl/Protocol.cpp:311)
==411772==    by 0x4AA5E49: yarp::os::NetworkBase::write(yarp::os::Contact const&, yarp::os::PortWriter&, yarp::os::PortReader&, yarp::os::ContactStyle const&) (src/libYARP_os/src/yarp/os/Network.cpp:1316)
==411772==    by 0x4AA6529: enactConnection(yarp::os::Contact const&, yarp::os::Contact const&, yarp::os::ContactStyle const&, int, bool) (src/libYARP_os/src/yarp/os/Network.cpp:229)
==411772==    by 0x4AA7C8E: metaConnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, yarp::os::ContactStyle, int) (src/libYARP_os/src/yarp/os/Network.cpp:641)
==411772==    by 0x4AA93A9: yarp::os::NetworkBase::disconnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, yarp::os::ContactStyle const&) (src/libYARP_os/src/yarp/os/Network.cpp:713)
==411772==    by 0x4AA94D2: yarp::os::NetworkBase::disconnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (src/libYARP_os/src/yarp/os/Network.cpp:706)
==411772==
==411772== HEAP SUMMARY:
traversaro commented 1 month ago

For some reason this changes create a deadlock:

==411772==
==411772== Process terminating with default action of signal 2 (SIGINT)
==411772==    at 0x500EA9A: __libc_read (io/../sysdeps/unix/sysv/linux/read.c:26)
==411772==    by 0x500EA9A: read (io/../sysdeps/unix/sysv/linux/read.c:24)
==411772==    by 0x526F0BD: read (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/_build_env/x86_64-conda-linux-gnu/sysroot/usr/include/bits/unistd.h:44)
==411772==    by 0x526F0BD: read (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/work/ace/../ace/OS_NS_unistd.inl:707)
==411772==    by 0x526F0BD: recv_i (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/work/ace/../ace/ACE.inl:232)
==411772==    by 0x526F0BD: ACE::recv_n_i(int, void*, unsigned long, unsigned long*) (/home/conda/feedstock_root/build_artifacts/ace_1722622488690/work/ace/ACE.cpp:804)
==411772==    by 0x4B75437: non-virtual thunk to yarp::os::impl::SocketTwoWayStream::read(yarp::os::Bytes&) (src/libYARP_os/src/yarp/os/impl/SocketTwoWayStream.h:99)
==411772==    by 0x4A8DE91: read (src/libYARP_os/src/yarp/os/InputStream.cpp:24)
==411772==    by 0x4A8DE91: yarp::os::InputStream::readLine[abi:cxx11](char, bool*) (src/libYARP_os/src/yarp/os/InputStream.cpp:63)
==411772==    by 0x4B7EE73: yarp::os::impl::StreamConnectionReader::expectText[abi:cxx11](char) (src/libYARP_os/src/yarp/os/impl/StreamConnectionReader.cpp:266)
==411772==    by 0x4B165D2: yarp::os::impl::BottleImpl::read(yarp::os::ConnectionReader&) (src/libYARP_os/src/yarp/os/impl/BottleImpl.cpp:452)
==411772==    by 0x4B71EB4: yarp::os::impl::Protocol::write(yarp::os::SizedWriter&) (src/libYARP_os/src/yarp/os/impl/Protocol.cpp:311)
==411772==    by 0x4AA5E49: yarp::os::NetworkBase::write(yarp::os::Contact const&, yarp::os::PortWriter&, yarp::os::PortReader&, yarp::os::ContactStyle const&) (src/libYARP_os/src/yarp/os/Network.cpp:1316)
==411772==    by 0x4AA6529: enactConnection(yarp::os::Contact const&, yarp::os::Contact const&, yarp::os::ContactStyle const&, int, bool) (src/libYARP_os/src/yarp/os/Network.cpp:229)
==411772==    by 0x4AA7C8E: metaConnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, yarp::os::ContactStyle, int) (src/libYARP_os/src/yarp/os/Network.cpp:641)
==411772==    by 0x4AA93A9: yarp::os::NetworkBase::disconnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, yarp::os::ContactStyle const&) (src/libYARP_os/src/yarp/os/Network.cpp:713)
==411772==    by 0x4AA94D2: yarp::os::NetworkBase::disconnect(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) (src/libYARP_os/src/yarp/os/Network.cpp:706)
==411772==
==411772== HEAP SUMMARY:

Fixing this problem is out of scope from the original goal of this PR, so for now I just removed the tests from the ControlBoardRemapper, and leave them in the remotecontrolboard and multipleanalogsenosrsclient tests. For future reference, the code that was part of this PR is in https://github.com/traversaro/yarp/tree/archiveattemptofixvalgrind .

sonarcloud[bot] commented 1 month ago

Quality Gate Failed Quality Gate failed

Failed conditions
D Maintainability Rating on New Code (required ≥ A)

See analysis details on SonarCloud

Catch issues before they fail your Quality Gate with our IDE extension SonarLint