TCP/IP communication: delay between send and receive

HolQue commented 2 months ago

A test is implemented in the following way:

THREAD    SELFTEST_1     False
    ${bAck}    bitsplatform.selftest_start_event_loop    1    500    201
    Sleep  20s
    ${bAck}    bitsplatform.selftest_stop_event_loop   1
    Send Thread Notification    THREAD_SELFTEST_1_DONE
END

Meaning:

Wait 201 ms, after this let device 1 send bitsplatform notification events every 500 ms. Let device 1 do this for 20 seconds. Finally stop the bitsplatform notification loop and sent a thread notification about the end of the thread.

In parallel a thread SELFTEST_2 does the same with device 2. In parallel a thread SELFTEST_3 does the same with device 3.

At end we wait for all thread notifications:

Wait Thread Notification    THREAD_SELFTEST_1_DONE    timeout=140
Wait Thread Notification    THREAD_SELFTEST_2_DONE    timeout=140
Wait Thread Notification    THREAD_SELFTEST_3_DONE    timeout=140

We investigate now two moments in time:

The time the bitsplatform sends the acknowledge about the stop of the notification event loop.
The time the robot test (= the bitsplatform interface library) receives this acknowledge.

Result:

Sent by bitsplatform: 2024.09.04 13:21:41.939 ; BITP ; Trace ; {INFO} ; {} ; Script Tx: ' STATUS_ OK E_CODE_ 0 TIME(ms)_ 1725448901939 SELFTEST UNIT_3 LOOP STOP_RET' Received by test: 20240904 13:21:43.25440 [15668] [__tcpip_listener] (311) : Message received in listener: 'STATUS_ OK E_CODE_ 0 TIME(ms)_ 1725448901939 SELFTEST UNIT_3 LOOP STOP_RET'

It can be seen that there is a delay of ~2sec between send and receive. That is a lot of time.

Crosscheck: We remove all Wait Thread Notification and insert a hard coded sleep time Sleep 40s to ensure that all threads are finished before the test is finished.

Now this is the result:

Sent by bitsplatform: 2024.09.04 13:27:19.640 ; BITP ; Trace ; {INFO} ; {} ; Script Tx: ' STATUS_ OK E_CODE_ 0 TIME(ms)_ 1725449239640 SELFTEST UNIT_3 LOOP STOP_RET' Received by test: 20240904 13:27:19.21691 [4704] [__tcpip_listener] (293) : Message received in listener: 'STATUS_ OK E_CODE_ 0 TIME(ms)_ 1725449239640 SELFTEST UNIT_3 LOOP STOP_RET'

No delay in seconds any more.

There is also no delay any more in case of messages are sent after all Wait Thread Notification are passed. Or in other words: We only have this delay in case of the test still waits for thread notifications.

We are currently trying to optimize tests for speed. Therefore a 2 seconds delay is very unfavorable.

Is there a possibility to improve this?

test-fullautomation commented 2 months ago

Hi @milanac030988 , looks like

Wait Thread Notification    THREAD_SELFTEST_1_DONE    timeout=140
Wait Thread Notification    THREAD_SELFTEST_2_DONE    timeout=140
Wait Thread Notification    THREAD_SELFTEST_3_DONE    timeout=140

causes an performance issue as long as waiting. did you consider calculation time for other threads during waiting for the notifications?

Please work on this with high priority.

Thank you, Thomas

milanac030988 commented 1 month ago

Hello Holger,

As you might already know, Python doesn’t truly support real parallel threading due to the Global Interpreter Lock (GIL). The GIL ensures that only one thread executes Python bytecode at any given time, which means that even if you have multiple threads, they are essentially running in a serialized manner when executing Python code.

When you use the 'Wait Thread Notification' keyword, what happens is that two threads run alternately. One thread handles the primary task (involving bits of platform work), while the other thread (which is the one running the Wait Thread Notification) checks the queue periodically to see if a notification has been received. Over time, this constant switching and checking naturally leads to the first thread taking longer to complete the task compared to running it alone without the extra thread handling notifications.

On the other hand, when you use Sleep which call sleep() function, it relinquishes the GIL, allowing the first task to continue running without being interrupted by other threads. This leads to almost no noticeable difference in execution time since the sleep-based thread isn't consuming CPU resources or frequently checking for notifications.

When you start thread 1, thread 2, and thread 3, and then run Wait Thread Notification (1), Wait Thread Notification (2), and Wait Thread Notification (3), it means that four threads are now running at the same time: thread 1, 2, and 3, plus the thread that is checking notifications (Wait Thread Notification). These threads will run alternately, so none of them are running truly in parallel but instead in an interleaved fashion due to the GIL.

I’ll try to see if there’s another way to improve this behavior, but I’m not sure how much we can improve given the limitations.

P.S.: I’m a bit confused here—maybe there was a typo—but how does the bitsplatform interface library receive the message (13:27:19.21691) before the bitsplatform actually sends it (2024.09.04 13:27:19.640) 😆?

Thank you,

HolQue commented 1 month ago

Hello Cuong,

all lines are copied out of the log files, therefore I would not expect any typos. But the bitsplatform is a C++ written application, whereas the interface library is implemented in Python.

The milliseconds part in the timestamp I compute in this way:

current_time_high_res = time.perf_counter()
current_time_milliseconds_high_res = int(current_time_high_res * 1000)

I don't know how this is done in the bitsplatform in C++.

Therefore I think these values are not suitable for comparison between different applications.

test-fullautomation commented 1 month ago

Hi @milanac030988 , thank you for your detailed answer. Despite this I have still concerns. The TML implementation for thread notifications and synchronization has as far as I know no performance issue. At least I got never an issue reported - but I have also no selftest for performance. So I still assume that somehow it seems to be possible to improve the performance?

Thank you, Thomas

milanac030988 commented 1 month ago

Hello @HolQue,

How can I reproduce this on my PC? I'm trying to reproduce but I can't see any log with timestamp. Do I need to enable any config in bits_platform.py? I have BITS Platform installed but have no device connect. Do I need a connected device?

Thank you, Cuong

HolQue commented 1 month ago

Hi Cuong,

the fault should be reproducible with the help of the code listed at the beginning of this issue.

The test signal used in this code, belongs to the self test. It is not necessary to have a real device connected.

But the bitsplatform should run in demo mode.

And I am sorry - but I forgot to mention: To get the log lines listed above, a thread logging has to be activated:

Example: Library ../libs/bits_platform.py log_level=INFO thread_logging=${True} log_to_console=${True} WITH NAME bitsplatform

This will create a subfolder 'threadlogs' within the log files folder, containing additional log files.

The log files of the bitsplatform can be found in

C:\BITS_Platform\Log

test-fullautomation / robotframework

TCP/IP communication: delay between send and receive #94