Lock-free buffer implementation

mathias-luedtke commented 9 years ago

The current buffer implementation is not lock-free and does not support writes from the real-time part. I have written a proof-of-concept implementation that might be a viable replacement. It enforces a last-write-wins strategy, since this is what we need for the controlles, right?

You can find it here: https://gist.github.com/ipa-mdl/773f043a31b3bcce3598 It is based on boost::atomic and boost::lockfree and requires a recent boost installation (14.04/indigo works). rosrt backported both to hydro, but is not maintained anymore.

It is divided into two parts: LockfreeBuffer and LockedBuffer.

LockfreeBuffer maintains a pre-allocated lock-free freelist and provides the most recent data (push) to one single reader (pop).

all calls should be lock-free, thread-safe and realtime-safe (if the payload copy constructor is!)
Concurrent reads will lead to an empty pointer for one reader.
The data is provided with RAII pointers to prevent misuses.
The freelist must be big enough (number of writers + 1 reader) to work without starvation.
Data can be pushed and pop from real-time and non-real-time parts.

However, the LockfreeBuffer only works for single consumers (sufficient for all current ros_controllers). Therefore I have provided a LockedBuffer implementation, that runs on top of the LockfreeBuffer.

blocking read (~ ReatimeBuffer::readFromNonRT)
non-blocking try_read (~ ReatimeBuffer::readFromRT).
write is just an alias for LockfreeBuffer::push

It is not ready for release yet, but I would really appreciate your comments on this.

adolfo-rt commented 9 years ago

Allow me a day or two to read the code in detail. From the summary, I'm concerned about the multiple reader case, if reading is subject to blocking, then it largely defeats the purpose of the data structure, doesn't it?. At any rate, I haven't yet dug into the code.

I would suggest looking into at least two high-quality implementations of lock-free data structures I know of:

Orocos RTT

They implement multi-reader, multi-writer synchronization primitives with different locking and buffering policies. Take a look at these classes, which implement the core data structures, in particular RTT::base::DataObjectLockFree.

This is an optional read, but it's interesting how they configure the data flow at a higher level (buffering and locking policies): RTT::ConnPolicy

liblfds

liblfds.org A portable, license-free, lock-free data structure library written in C. Its scope is much more focused than RTT, which does a lot more, so I'd rather point you to the library docs for details.

One important point I want to make here is that there are well-tested and benchmarked implementations of lock-free data structures out there. Since it's not trivial to get the implementation right (in correctness and performance), I wanted to see if we could simply wrap an existing implementation around a simple interface and let a third part do the heavy lifting.

Finally, and for information purposes, the 1024cores has some very interesting posts about lock-free algorithms.

mathias-luedtke commented 9 years ago

For the multiple readers case, there is try_read, works with a try lock, the same way as the RealtimeBuffer. Currently no controller really needs muiltiple readers. The trajectory controller reads from both sides, but I think that it could be decoupled.

I did not implement the lockfree algorithm myself, I just use boost for that, assuming that it is portable and tested. My implementation focuses more on the pre-allocation and the last-write-wins strategy. The latter is the tricky part for a lock-free implementation with multiple readers.

I will have a look at the links if I have more time again :)

efernandez commented 8 years ago

What's the current state on this?

mathias-luedtke commented 8 years ago

I had a brief look at the Orocos implementation. It just supports multiple readers, but only one producer/writer. We need a multiple writers solution.

jacquelinekay commented 8 years ago

some outsider opinions, forgive my general ignorance of realtime_tools and Orocos:

the libflds implementation which Adolfo linked implements a ring buffer interface with a freelist and a queue and supports multiple readers and multiple writers--but readers are restricted to a single read point and writers are restricted to a single write point:

http://liblfds.org/mediawiki/index.php?title=r6:API:Ringbuffer

is this sufficient or are you looking for a lock-free solution that allows multiple concurrent readers at different read points? I actually think this implementation is pretty close to what @ipa-mdl was trying to achieve in the original post; a lockfree queue and freelist with a ring buffer interface.

the easiest thing to do is definitely to pick an existing proven open source implementation and wrap it with the same interface as the existing RealtimeBuffer, especially if you want to get it into Kinetic :) of course the disadvantage of wrapping an existing implementation is losing flexibility. Just checking @ipa-mdl, when you refer to the "last write wins" strategy are you referring to using acquire_release memory order to ensure that atomic writes are not reordered between threads? Does boost::lockfree::queue not guarantee memory order?

boost::lockfree::queue claims to be a multi-reader/multi-writer queue. It uses an internal atomic freelist to manage memory if specified as fixed-size. You can also use the boost::lockfree::allocator interface to specify an allocation strategy for the freelist although this could be more pain than it's worth. Also, if the queue template type is instantiated with a non-POD type that cannot be compared/exchanged atomically, the queue defaults to thread-safe but locking concurrent behavior. It seems to me that you should be able to use the Boost implementation to fulfil the multi-reader multi-writer requirement--I guess the metadata you introduced to manage the freelist needs a synchronization mechanism in push and that's why the implementation isn't safe for multiple writers?

(given recent C++11 support for atomics I'd like to see a set of lock-free data structures in the standard library so that us average programmers can get along with our lives. but alas, we'll probably have to wait for C++17 for that -_-)

I still need a MPMC lock-free queue for ROS 2 multithreaded intra-process messaging and I haven't come to a conclusion yet on what implementation to use, so we should definitely share knowledge on this effort :)

ros-controls / realtime_tools

Lock-free buffer implementation #14

Orocos RTT

liblfds