Parallelize CAN writes

PeterBowman commented 5 years ago

Split from https://github.com/roboticslab-uc3m/yarp-devices/issues/209#issuecomment-518217324:

I would like to introduce threaded writes, though. Assuming that CAN TX comms are entirely managed by CanBusControlboard, this device would start a periodic thread to process incoming write requests. Messages would be registered by raw subdevices into a FIFO queue and dispatched to the network in periodic batches (thus taking advantage of the TX buffer), say every 1-5 milliseconds. The point of this is to parallelize multi-joint queries, so that one or more requests from one joint do not block requests of all other joints (also, this enables message buffering, i.e. we'll send more than one message at once). Incidentally, it would help synchronize (more or less) PDOs without a SYNC signal. Considerations:

Even though reads/writes are thread-safe, make sure this new thread does not collide with high-priority RPDOs (e.g. PT/PVT).

Raw subdevices will request a free slot from the message buffer, similarly to how prepare() returns YARP bottles in a buffered port context.

Most importantly, all of this is meaningless unless multi-joint queries can actually be processed in parallel, e.g. via threaded for-loops. Reminds me of GrPPI; perhaps vanilla C++ (11?) is sufficient for our needs, though.
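The queue-and-batch scheme described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `CanFrame`, `TxQueue` and `writerLoop` are hypothetical names, not the actual yarp-devices types, and the `send` callback stands in for whatever call writes a batch to the bus.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical CAN frame; not the message type used by yarp-devices.
struct CanFrame
{
    std::uint32_t id;
    std::uint8_t len;
    std::uint8_t data[8];
};

// Hypothetical FIFO buffer: raw subdevices push frames, one periodic thread drains them.
class TxQueue
{
public:
    // Called from any thread to register an outgoing frame.
    void push(const CanFrame & frame)
    {
        assert(frame.len <= 8); // classic CAN payload limit
        std::lock_guard<std::mutex> lock(mtx);
        pending.push_back(frame);
    }

    // Takes everything queued so far, preserving insertion (FIFO) order.
    std::vector<CanFrame> drain()
    {
        std::vector<CanFrame> batch;
        std::lock_guard<std::mutex> lock(mtx);
        batch.swap(pending);
        return batch;
    }

private:
    std::mutex mtx;
    std::vector<CanFrame> pending;
};

// Periodic writer: wakes up every `period`, sends whatever has accumulated as one batch.
// `send` is an assumed callable taking a const std::vector<CanFrame> &.
template <typename Send>
void writerLoop(TxQueue & queue, std::atomic<bool> & running,
                std::chrono::milliseconds period, Send send)
{
    while (running)
    {
        auto batch = queue.drain();
        if (!batch.empty()) send(batch);
        std::this_thread::sleep_for(period);
    }

    // Final flush so that frames registered right before shutdown are not lost.
    auto batch = queue.drain();
    if (!batch.empty()) send(batch);
}
```

Producers only hold the lock long enough to append one frame, so a slow joint does not block the others; whether a 1-5 ms period is appropriate would depend on the bus load and the size of the TX buffer.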

PeterBowman commented 4 years ago

Parallelization could be easily achieved with <future>:

https://github.com/roboticslab-uc3m/yarp-devices/blob/05c202c14944b3ac592125b9bc4a170ac87e8d35/tests/testCanBusSharerLib.cpp#L188

I'd fancy an abstract DeviceMapper class and SequentialDeviceMapper + ConcurrentDeviceMapper subclasses, such that the latter manages a pool of threads given an arbitrary number of callbacks (one per joint).
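A minimal sketch of that class layout, assuming a hypothetical `run()` method and a `Task` alias (neither is taken from the repository). For brevity the concurrent variant spawns one `std::async` task per callback rather than managing a real thread pool.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical interface; the DeviceMapper that ended up in yarp-devices may look different.
class DeviceMapper
{
public:
    using Task = std::function<bool()>; // one callback per joint

    virtual ~DeviceMapper() = default;

    // Runs every callback and returns true only if all of them succeeded.
    virtual bool run(const std::vector<Task> & tasks) = 0;
};

class SequentialDeviceMapper : public DeviceMapper
{
public:
    bool run(const std::vector<Task> & tasks) override
    {
        bool ok = true;

        for (const auto & task : tasks)
        {
            assert(task);      // an empty std::function would throw on call
            ok = task() && ok; // task() is evaluated first, so every joint is queried
        }

        return ok;
    }
};

class ConcurrentDeviceMapper : public DeviceMapper
{
public:
    bool run(const std::vector<Task> & tasks) override
    {
        std::vector<std::future<bool>> futures;
        futures.reserve(tasks.size());

        for (const auto & task : tasks)
        {
            assert(task);
            // std::launch::async forces a separate thread per callback.
            futures.push_back(std::async(std::launch::async, task));
        }

        bool ok = true;

        for (auto & f : futures)
        {
            ok = f.get() && ok; // get() blocks until that callback has finished
        }

        return ok;
    }
};
```

Callers would hold a `DeviceMapper` pointer and swap the concrete subclass without touching the per-joint callbacks.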

PeterBowman commented 4 years ago

C++17 introduces execution policies, which make it easy to switch between sequential and parallel loops via the std::for_each algorithm: SO answer.

In C++11, I have to resort to playing with std::launch policies in std::async.
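As a rough illustration of the C++11 route, the launch policy alone can decide whether callbacks run sequentially or in parallel. The helper name `runAll` is hypothetical; `std::async`, `std::launch::deferred` and `std::launch::async` are standard.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>
#include <vector>

// Hypothetical helper: launches every callback under `policy` and sums the results.
//  - std::launch::deferred: each task runs lazily on the calling thread inside get(),
//    i.e. one after another (sequential behavior).
//  - std::launch::async: each task runs on a thread of its own (parallel behavior).
inline int runAll(std::launch policy, const std::vector<std::function<int()>> & tasks)
{
    std::vector<std::future<int>> futures;
    futures.reserve(tasks.size());

    for (const auto & task : tasks)
    {
        assert(task);
        futures.push_back(std::async(policy, task));
    }

    int sum = 0;

    for (auto & f : futures)
    {
        sum += f.get(); // deferred tasks execute here; async tasks are merely awaited
    }

    return sum;
}
```

The same loop therefore serves both modes, which is roughly what the C++17 execution policies offer for `std::for_each`, only with more boilerplate.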

PeterBowman commented 4 years ago

The DeviceMapper class was added in https://github.com/roboticslab-uc3m/yarp-devices/commit/de805265a89fb86e4b5defe7727cbb5849645336. I plan on moving it to its own library (static/shared), writing unit tests and enabling parallelization (currently WIP at https://github.com/roboticslab-uc3m/yarp-devices/commit/7a7bf3fcc559e18a26062042c881dfa38798eec9).

Not sure about enabling thread limits, i.e. fixing the maximum number of parallel tasks this class may start so that additional tasks are enqueued for later.

SO Q/A regarding static/shared library performance:

Q: (...) once the initialization and all has happened, does the function calling and execution take longer in case of dynamic libraries than static libraries?
A: It might. However, even if it does, the performance difference is really negligible.

Regarding thread limits: SO, reddit.

Two nice blog entries about C++11 move semantics: ref1, ref2.

PeterBowman commented 4 years ago

Quite interesting links from https://github.com/roboticslab-uc3m/yarp-devices/commit/8dcd9ec2498d4c58457cfff91205755d110aff8c regarding template specialization/instantiation:

PeterBowman commented 4 years ago

Ready at https://github.com/roboticslab-uc3m/yarp-devices/commit/3c6efbbfd96a8fcc6f603fdf1a3a7c3a86ffe5ac.

> Not sure about enabling thread limits, i.e. fix the max number of parallel tasks this class may start so that additional tasks are enqueued for later.

The sequential (single-threaded) policy is used by default; an arbitrarily large number of threads can be enabled after instantiation via a YARP property.

PeterBowman commented 4 years ago

> Ready at 3c6efbb.

This is a naive solution: it splits the requested callbacks into two groups, those executed asynchronously in threads and the remaining ones, which run sequentially. Ideally, freed threads should pick up new tasks. This is achieved with a nice thread pool library I borrowed from https://github.com/vit-vit/CTPL: https://github.com/roboticslab-uc3m/yarp-devices/commit/403ade07dce26b4fe34ed77dee626dd575a6061a. Results are even better under equal conditions (same number of tasks and threads): FutureTask.zip.
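To make the "freed threads pick up new tasks" behavior concrete, here is a minimal fixed-size pool. It is a generic sketch and deliberately not the CTPL API; the class name `SimplePool` and its `push()` signature are assumptions made for illustration.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal thread pool (not CTPL): a fixed set of workers shares one job queue.
class SimplePool
{
public:
    explicit SimplePool(std::size_t nThreads)
    {
        assert(nThreads > 0);

        for (std::size_t i = 0; i < nThreads; i++)
        {
            workers.emplace_back([this] { workerLoop(); });
        }
    }

    ~SimplePool()
    {
        {
            std::lock_guard<std::mutex> lock(mtx);
            stopping = true;
        }

        cv.notify_all();

        for (auto & w : workers) w.join(); // pending jobs are finished before joining
    }

    // Enqueues a callback; the returned future yields its result once a worker has run it.
    std::future<bool> push(std::function<bool()> fn)
    {
        auto task = std::make_shared<std::packaged_task<bool()>>(std::move(fn));
        auto result = task->get_future();

        {
            std::lock_guard<std::mutex> lock(mtx);
            jobs.push([task] { (*task)(); });
        }

        cv.notify_one();
        return result;
    }

private:
    void workerLoop()
    {
        for (;;)
        {
            std::function<void()> job;

            {
                std::unique_lock<std::mutex> lock(mtx);
                cv.wait(lock, [this] { return stopping || !jobs.empty(); });
                if (jobs.empty()) return; // stopping and nothing left to do
                job = std::move(jobs.front());
                jobs.pop();
            }

            job(); // executed outside the lock so other workers can dequeue meanwhile
        }
    }

    std::mutex mtx;
    std::condition_variable cv;
    std::queue<std::function<void()>> jobs;
    bool stopping = false;
    std::vector<std::thread> workers;
};
```

With more tasks than threads, no callback has to fall back to the caller's thread: whichever worker finishes first simply takes the next job from the queue.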

PeterBowman commented 4 years ago

I realized this parallelization is actually meaningless since high-frequency commands are henceforth transmitted via the TPDO protocol (https://github.com/roboticslab-uc3m/yarp-devices/issues/232). It only made sense for SDO multi-joint requests due to the confirmed nature of these transfers: every SDO client request/indication awaits a response/confirmation from the SDO server. Hence, applying this request-and-wait scheme sequentially to several joint commands would make the overall wait grow linearly with the number of joints.
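The linear growth can be written down as a back-of-the-envelope estimate. The symbols are assumptions introduced here: $N$ is the number of joints and $t_i$ the SDO round-trip time (request plus confirmation) of joint $i$; bus arbitration and thread overhead are ignored.

```latex
T_{\mathrm{seq}} = \sum_{i=1}^{N} t_i \approx N\,\bar{t},
\qquad
T_{\mathrm{par}} \approx \max_{1 \le i \le N} t_i
```

So under these idealized assumptions the sequential wait scales with $N$, whereas the parallel wait is bounded by the slowest joint.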

Cons:

SDOs are low priority commands that should not interfere with high frequency TPDOs.
Actually, SDO requests should rarely happen during normal operation.
Incidental parallelization of TPDOs could quickly become detrimental for the performance of the application due to thread synchronization involved in the registration of new outgoing messages in the TX queue.

Therefore, I'm definitely disabling this feature for YARP motor commands. However, I found this actually helpful for synchronization of multiple CAN buses (https://github.com/roboticslab-uc3m/yarp-devices/issues/226): https://github.com/roboticslab-uc3m/yarp-devices/commit/474556176e8f8c831e8ac5f79389f01b9ea2d45b.

PeterBowman commented 3 years ago

Note that YARP's fakeMotionControl device showcases a similar technique with a sequential implementation, split into specific interfaces: yarp::dev::ImplementPositionControl, yarp::dev::ImplementVelocityControl... Therefore, most .cpp files in the CanBusControlboard tree could be removed in favor of this class of ours inheriting from the aforementioned YARP concrete classes, assuming proper joint index configuration on init.

roboticslab-uc3m / yarp-devices

Parallelize CAN writes #230