orocos-toolchain / rtt

Orocos Real-Time Toolkit
http://www.orocos.org

Massive deadlocking when disconnecting connections in parallel #300

Open doudou opened 5 years ago

doudou commented 5 years ago

The refactored channel implementation that just landed in master uses locking very liberally, which leads to deadlocks when connections are disconnected in parallel. I'm chasing those and will propose a PR for review; I'm opening this issue to track the problem.

Generally speaking, one major issue I see (apart from the deadlocking) is that remote disconnection calls are made while a lock is held. My current strategy is to split "remove the channel from the channel list(s)" from "destroy the channel".
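A minimal sketch of that split, assuming a hypothetical `ChannelList` that owns hypothetical `Channel` objects (neither is an RTT class, and the real channel code is more involved): the list lock is held only while the channel is unlinked, and the possibly blocking or remote `disconnect()` runs after the lock has been released.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical stand-in for a channel element; not an RTT class.
struct Channel {
    explicit Channel(int i) : id(i) {}
    int id;
    bool disconnected = false;
    // In real code this may block or call into a remote peer,
    // so it must not run while the list lock is held.
    void disconnect() { disconnected = true; }
};

// Hypothetical container illustrating "unlink under lock, destroy outside".
class ChannelList {
public:
    void add(std::shared_ptr<Channel> c) {
        assert(c != nullptr);
        std::lock_guard<std::mutex> guard(mutex_);
        channels_.push_back(std::move(c));
    }

    // Returns false if no channel with this id is in the list.
    bool remove(int id) {
        std::shared_ptr<Channel> victim;
        {
            // Step 1: only the list manipulation happens under the lock.
            std::lock_guard<std::mutex> guard(mutex_);
            auto it = std::find_if(
                channels_.begin(), channels_.end(),
                [id](const std::shared_ptr<Channel>& c) { return c->id == id; });
            if (it == channels_.end()) return false;
            victim = std::move(*it);
            channels_.erase(it);
        }
        // Step 2: the lock is released, so a slow or remote call here
        // cannot block other threads that need the list.
        victim->disconnect();
        return true;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> guard(mutex_);
        return channels_.size();
    }

private:
    mutable std::mutex mutex_;
    std::vector<std::shared_ptr<Channel>> channels_;
};
```

Because the channel is already unlinked when `disconnect()` runs, a second thread removing the same id simply finds nothing and returns `false` instead of waiting on the lock.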

meyerj commented 5 years ago

Thanks for reporting. Could you already share an example or unit test to reproduce the problem?

There is a set of new test cases in ports_test.cpp, introduced in https://github.com/orocos-toolchain/rtt/commit/f1404ff714d3bee75b1c560fa3132cc9108b139c, that should have covered issues with parallel port connection, disconnection, reads and writes. But indeed, they only spawn one thread that adds and removes an input port from the connection and another that does the same for an output port; they never exercise two input ports or two output ports concurrently.
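The missing case, several ports of the same kind disconnecting at the same time, could be driven by a small helper along these lines. This is only a sketch: `run_concurrently` is a hypothetical name, it is not part of ports_test.cpp, and it uses plain `std::thread` rather than RTT activities.

```cpp
#include <atomic>
#include <cstddef>
#include <functional>
#include <thread>
#include <vector>

// Hypothetical test helper: runs body(i) on n threads and releases them
// together, so that the calls are likely to overlap.
inline void run_concurrently(std::size_t n,
                             const std::function<void(std::size_t)>& body) {
    std::atomic<std::size_t> ready{0};
    std::atomic<bool> go{false};
    std::vector<std::thread> workers;
    workers.reserve(n);

    for (std::size_t i = 0; i < n; ++i) {
        workers.emplace_back([&ready, &go, &body, i] {
            ready.fetch_add(1);
            // Spin until every worker has started.
            while (!go.load()) std::this_thread::yield();
            body(i);
        });
    }

    // Wait for all workers to reach the start line, then release them.
    while (ready.load() != n) std::this_thread::yield();
    go.store(true);

    // If body deadlocks, join never returns; the surrounding test
    // needs its own timeout to turn that into a failure.
    for (auto& w : workers) w.join();
}
```

A test could then call `run_concurrently(2, ...)` with each index disconnecting a different input port from the same output port, and repeat the run many times, since a single pass may not hit the problematic interleaving.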

What you suggest sounds a bit like (partially) reverting https://github.com/orocos-toolchain/rtt/pull/283, a patch that was added only recently. Without it, the test cases in ports_test.cpp mentioned above did not check whether the port connections were actually successful, which was often not the case.

doudou commented 5 years ago

See #302