rtlabs-com / c-open

CANopen stack for embedded devices
http://www.rt-labs.com

High cpu-usage on linux #3

Closed hefloryd closed 4 years ago

hefloryd commented 4 years ago

On Linux we check for incoming CAN frames by creating a thread that blocks in select(). When select() unblocks, the thread calls a callback that tells the main thread there are messages to read by posting to a mailbox. The thread then calls select() again, which returns immediately for as long as the main thread has not read the CAN message. If the main thread is not allowed to run, this causes high CPU usage and fills the mailbox.

Find a better method to create a callback on incoming messages on Linux, or rework the abstraction layer so that incoming messages can be posted to the main thread.

nattgris commented 4 years ago

I noticed high CPU load as well (23%), but found the cause to be the CO_JOB_PERIODIC timer, which fires every ms with the SIGEV_THREAD notification, i.e. spawning a new thread every ms. I get high CPU usage without ever receiving a CAN message, so I doubt the repeated select() is the (only) culprit.

nattgris commented 4 years ago

Verified that idle CPU consumption scales with the timer period, i.e. 2 ms gives me half the load.

It would be nice if the clients of the periodic job feature could report when they next expire, so that the timer can be armed for the time actually required instead of firing periodically. Since all callees in co_handle_periodic() seem to end up using co_is_expired() to check the current time against a timestamp and a fixed period, this should be easy to implement by returning the time left from co_is_expired() and propagating the minimum value up the call chain to co_main(), which can re-arm the timer.
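A minimal sketch of the "return the time left and propagate the minimum" idea, under stated assumptions: `timer_time_left()`, `sketch_timer_t` and `next_timeout()` are hypothetical names for illustration, not part of c-open, and the real co_is_expired() signature may differ. Times are assumed to be unsigned tick counts that wrap around.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of co_is_expired(): returns 0 if the period has
 * elapsed, otherwise the remaining time in the same tick units. */
uint32_t timer_time_left (uint32_t now, uint32_t timestamp, uint32_t period)
{
   /* Unsigned subtraction stays correct across counter wrap-around. */
   uint32_t elapsed = now - timestamp;

   return (elapsed >= period) ? 0 : period - elapsed;
}

/* Hypothetical stand-in for one client of the periodic job. */
typedef struct sketch_timer
{
   uint32_t timestamp; /* tick count when the timer was last started */
   uint32_t period;    /* fixed period in ticks */
} sketch_timer_t;

/* Returns the smallest time left over all timers. A result of 0 means
 * at least one timer is already due. */
uint32_t next_timeout (const sketch_timer_t * timers, size_t count, uint32_t now)
{
   uint32_t min_left = UINT32_MAX;

   assert (timers != NULL && count > 0);

   for (size_t i = 0; i < count; i++)
   {
      uint32_t left =
         timer_time_left (now, timers[i].timestamp, timers[i].period);
      if (left < min_left)
      {
         min_left = left;
      }
   }
   return min_left;
}
```

In this sketch the main loop would call `next_timeout()` after handling the periodic job and arm a one-shot timer with the result, rather than waking every millisecond.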

The CO_JOB_PERIODIC message must then also be force-sent whenever any event causes a timer to be set, so that the timeout is recalculated. Maybe co_handle_periodic() can be called on every iteration of the co_main() loop to cover all cases where the event passes through the main thread? Are there others?

The main loop could probably even use the timeout feature of os_mbox_fetch() to wake up at the right time instead of keeping a separate timer around.

hefloryd commented 4 years ago

The select() issue should be fixed now, by using epoll(). The timers were reworked to use SIGEV_THREAD_ID, which avoids creating a new thread whenever the timer fires and improves the CPU usage somewhat. It still seems fairly expensive to do periodic work at 1000 Hz though.
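For illustration, one way epoll can avoid the repeated wake-up described in the original report is edge-triggered mode (EPOLLET): readiness is reported once per new arrival rather than for as long as unread data remains. This is only a sketch under assumptions. Whether c-open registers its socket in exactly this mode is not stated here, and `rx_watch_create()` and `rx_watch_wait()` are hypothetical names.

```c
#include <assert.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Creates an epoll instance watching fd for input in edge-triggered
 * mode. Returns the epoll descriptor, or -1 on failure. */
int rx_watch_create (int fd)
{
   struct epoll_event ev = {.events = EPOLLIN | EPOLLET, .data.fd = fd};
   int epfd;

   assert (fd >= 0);

   epfd = epoll_create1 (0);
   if (epfd < 0)
   {
      return -1;
   }
   if (epoll_ctl (epfd, EPOLL_CTL_ADD, fd, &ev) < 0)
   {
      close (epfd);
      return -1;
   }
   return epfd;
}

/* Waits up to timeout_ms for a new readiness edge. Returns 1 if one
 * occurred, 0 on timeout and -1 on error. */
int rx_watch_wait (int epfd, int timeout_ms)
{
   struct epoll_event ev;

   return epoll_wait (epfd, &ev, 1, timeout_ms);
}
```

With a level-triggered wait such as select(), the second wait would return immediately while the frame is still unread. With the edge-triggered registration above it blocks until another frame arrives, so the receive thread notifies the main thread once per arrival. The reader must then drain the descriptor, since no further edge is reported for data that is already pending.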

I like the idea of calculating the next timeout dynamically, assuming jitter can be avoided (for other platforms where the timers are more reliable). Could you create a new issue for that?

hefloryd commented 4 years ago

Closed, but see #6