RealCAN implementation - Githubissues

GGZ8 commented 2 months ago

Dear maintainers,

we are a research team at the University of Modena and Reggio Emilia, and we would like to contribute upstream our recent development on top of the canplayer tool.

This pull request introduces a new extension that we dubbed RealCAN, which enables replaying an existing CAN logfile respecting the original relative timestamps. Our use case is to replay the given CAN trace in a more realistic way, so as to exercise our test network with the correct inter-arrival times that the ECUs expect.

Changes:

Modified the frame sending mechanism to incorporate a timing check that aligns frame transmission with their scheduled inter-arrival times;
Added a new command-line option, -r, to enable real-time dispatching of CAN frames;
Updated the usage message to include instructions for the new soft real-time features.

We hope you'll find this contribution useful, and we are open to any suggestions on how to further refined this feature.

Wish you a great day!

delta512 commented 2 months ago

Hello, there you have two iterations of changes as per your requests. My commit is just every trivial correction to the code style, while @GGZ8 reshaped the real code stuff.

Let us know if there's anything more to tweak.

Cheers

marckleinebudde commented 2 months ago

Please squash everything into one commit.

GGZ8 commented 2 months ago

Please squash everything into one commit.

Done! 👍

marckleinebudde commented 2 months ago

please rebase to latest master, which includes https://github.com/linux-can/can-utils/pull/522 and causes a conflict now.

marckleinebudde commented 2 months ago

If I create a logfile with candump any,0:0,#FFFFFFFF -l and replay it with canplayer -I, canplayer will respect the timestamps in the created logfile.

What problem fixes the new '-r' option?
What benefit does it bring?
How is it supposed to be used?

marckleinebudde commented 2 months ago

Ok, I understand which problem you want to solve. The current code sleeps for -g ms and checks for any CAN frames to send. That design is, let's say, sub-optimal.

Properly fixing this requires several steps on the existing code:

convert everything to struct timespec
read the first timestamp from the file
calculate the offset between the first timestamp and now clock_gettime(CLOCK_MONOTONIC)
in the inner loop: add the offset to the next timestamp and check against clock_gettime(CLOCK_MONOTONIC)
use clock_nanosleep() with TIMER_ABSTIME to wait for the next event

See f55ea38d146ac17155e9d87c873df2e24e79f39d, where I added absolute timing to cangen.c

hartkopp commented 2 months ago

Dear maintainers,

we are a research team at the University of Modena and Reggio Emilia, and we would like to contribute upstream our recent development on top of the canplayer tool.

This pull request introduces a new extension that we dubbed RealCAN, which enables replaying an existing CAN logfile respecting the original relative timestamps. Our use case is to replay the given CAN trace in a more realistic way, so as to exercise our test network with the correct inter-arrival times that the ECUs expect.

I have a general question about this patch. IIRC the gaps between timestamps are already taken into account. Either when getting the logfile format from a file or from stdin:

(1715790289.248142) vcan0 5A8#EE41B4359BC73D
(1715790289.448425) vcan0 636#C47F
(1715790289.648726) vcan0 6FE#0E91243BAB988F49
(1715790289.849038) vcan0 118#7EB83B5D
(1715790290.049331) vcan0 32C#CB

So what functionality does your patch add?

marckleinebudde commented 2 months ago

canplayer doesn't use proper sleep between 2 CAN messages. By default it sleep 1ms (configurable by -g parameter) and sends all CAN messages in a row that have an expired timestamp. The proper approach would be to convert canplayer to absolute time handling and sleep until the next CAN frame should be send.

GGZ8 commented 2 months ago

Sounds good, the changes you suggested enable a more precise replay of the original trace. The use of the nanosleep family of functions ensure a more efficient CPU sharing.

However, during our developments we could not saturate the bus, be it a virtual vcan network or a physical one with real CAN adapters. We observed that this limitation is actually due to the nanosleep function, which - as per its man page - suspends the execution of the calling thread until at least the time specified by request has elapsed.

We overcame this limitation using a spin-lock while waiting for a given deadline. Comparing our implementation against the upstream canplayer version, we found that the nanosleep approach introduces non-negligible latency between messages. The test has been performed using a set of values for the -g option and a CAN trace with fixed inter-arrival time between CAN messages. This should behave the same as if the changes you suggested were already in place.

Thus, the -r option enables replaying with soft real-time capabilities when required by the scenario, leaving the nanosleep mechanism as the default behavior.

Since the paper is still under review, we can only share the results privately at the moment.

Let us know what you think about this evaluation, while we apply the minor correction stil pending.

marckleinebudde commented 2 months ago

spin_lock is a concept from the Linux kernel for short term locking. What do you mean by spin_lock?

If the overhead by nanosleep() is too big, you might want to use SO_TXTIME, see https://github.com/linux-can/can-utils/commit/c6f2cf7c2f2786b0a769534d0a1f0716326a3494 for an example.

Can you illustrate in a few lines of shell code, how the canplayer with -r should be used?

Works:

./canplayer -I ./candump-absolute.log

Adding -r doesn't work:

./canplayer -I ./candump-absolute.log -r 
Specify an input file for option -r !

Maybe -r earlier?

./canplayer -r -I ./candump-absolute.log

it hangs in:

#0  0x00007f8b3be8741d in __GI___libc_read (fd=0, buf=0x559ca2367340, nbytes=1024) at ../sysdeps/unix/sysv/linux/read.c:26
        sc_ret = -512
        __arg3 = <optimized out>
        _a2 = <optimized out>
        sc_ret = <optimized out>
        sc_ret = <optimized out>
        __arg1 = <optimized out>
        _a3 = <optimized out>
        sc_cancel_oldtype = <optimized out>
        resultvar = <optimized out>
        __arg2 = <optimized out>
        _a1 = <optimized out>
#1  0x00007f8b3be0ad93 in _IO_new_file_underflow (fp=0x7f8b3bf5f8e0 <_IO_2_1_stdin_>) at ./libio/libioP.h:1030
        count = <optimized out>
#2  0x00007f8b3be00090 in __GI___getdelim (lineptr=lineptr@entry=0x7ffdc14321c0, n=n@entry=0x7ffdc14321c8, delimiter=delimiter@entry=10, fp=fp@entry=0x7f8b3bf5f8e0 <_IO_2_1_stdin_>) at ./libio/iogetdelim.c:77
        _IO_acquire_lock_file = 0x7f8b3bf5f8e0 <_IO_2_1_stdin_>
        result = <optimized out>
        cur_len = 0
        len = 0
#3  0x0000559ca21e73b9 in getline (__lineptr=0x7ffdc14321c0, __n=0x7ffdc14321c8, __stream=0x7f8b3bf5f8e0 <_IO_2_1_stdin_>) at /usr/include/x86_64-linux-gnu/bits/stdio.h:120
No locals.
#4  load_gaps_from_file (fp=fp@entry=0x7f8b3bf5f8e0 <_IO_2_1_stdin_>, timestamps=timestamps@entry=0x7ffdc1432278) at ../canplayer.c:270
        line = 0x559ca23672c0 ""
        len = 120
        read = <optimized out>
#5  0x0000559ca21e766b in main (argc=4, argv=0x7ffdc1432408) at ../canplayer.c:388
        buf = '\000' <repeats 6344 times>
        device = '\000' <repeats 21 times>
        afrbuf = '\000' <repeats 6299 times>
        cu = {cc = {can_id = 0, {len = 0 '\000', can_dlc = 0 '\000'}, __pad = 0 '\000', __res0 = 0 '\000', len8_dlc = 0 '\000', data = "\000\000\000\000\000\000\000"}, fd = {can_id = 0, len = 0 '\000', flags = 0 '\000', __res0 = 0 '\000', 
            __res1 = 0 '\000', data = '\000' <repeats 63 times>}, xl = {prio = 0, flags = 0 '\000', sdt = 0 '\000', len = 0, af = 0, data = '\000' <repeats 2047 times>}}
        today_tv = {tv_sec = 0, tv_usec = 0}
        log_tv = {tv_sec = 0, tv_usec = 0}
        last_log_tv = {tv_sec = 0, tv_usec = 0}
        diff_tv = {tv_sec = 0, tv_usec = 0}
        verbose = 0
        opt = <optimized out>
        delay_loops = 0
        skipgap = 0
        loopback_disable = 0
        infinite_loops = 0
        loops = 1
        addr = {can_family = 0, can_ifindex = 0, can_addr = {tp = {rx_id = 0, tx_id = 0}, j1939 = {name = 0, pgn = 0, addr = 0 '\000'}}}
        vcid_opts = {flags = 2 '\002', tx_vcid = 0 '\000', rx_vcid = 0 '\000', rx_vcid_mask = 0 '\000'}
        sleep_ts = {tv_sec = 0, tv_nsec = 0}
        timestamps = {sleep_vector = 0x559ca23672a0, idx = 0, size = 1}
        act_time = {tv_sec = 140727845855936, tv_usec = 4194304}
        init_time = {tv_sec = 0, tv_usec = 0}
        infile = 0x7f8b3bf5f8e0 <_IO_2_1_stdin_>
        gap = 1
        use_timestamps = 1
        interactive = 0
        count = 0
        gap_from_file = true
        assignments = <optimized out>
        s = <optimized out>
        i = <optimized out>
        j = <optimized out>
        fret = <optimized out>
        eof = <optimized out>
        sec = <optimized out>
        usec = <optimized out>
        txidx = <optimized out>
        txmtu = <optimized out>
        send_time = <optimized out>

marckleinebudde commented 2 months ago

e49499d06d59a620d8f275982294350494aafffd throws the following warnings on amd64:

➜ (pts/57) frogger@hardanger:can-utils-tmp/x86 (pr-realcan) make
  CC       canplayer.o
../canplayer.c:283:29: warning: format specifies type 'unsigned long *' but the argument has type '__time_t' (aka 'long') [-Wformat]
                sscanf(line, "(%lu.%lu)", current->tv_sec, current->tv_usec);
                               ~~~        ^~~~~~~~~~~~~~~
../canplayer.c:283:46: warning: format specifies type 'unsigned long *' but the argument has type '__suseconds_t' (aka 'long') [-Wformat]
                sscanf(line, "(%lu.%lu)", current->tv_sec, current->tv_usec);
                                   ~~~                     ^~~~~~~~~~~~~~~~

on 32 bit arm:

../canplayer.c: In function ‘load_gaps_from_file’:
../canplayer.c:283:34: warning: format ‘%lu’ expects argument of type ‘long unsigned int *’, but argument 3 has type ‘__time_t’ {aka ‘long int’} [-Wformat=]
  283 |                 sscanf(line, "(%lu.%lu)", current->tv_sec, current->tv_usec);
      |                                ~~^        ~~~~~~~~~~~~~~~
      |                                  |               |
      |                                  |               __time_t {aka long int}
      |                                  long unsigned int *
../canplayer.c:283:38: warning: format ‘%lu’ expects argument of type ‘long unsigned int *’, but argument 4 has type ‘__suseconds_t’ {aka ‘long int’} [-Wformat=]
  283 |                 sscanf(line, "(%lu.%lu)", current->tv_sec, current->tv_usec);
      |                                    ~~^                     ~~~~~~~~~~~~~~~~
      |                                      |                            |
      |                                      long unsigned int *          __suseconds_t {aka long int}
../canplayer.c: In function ‘main’:
../canplayer.c:574:59: warning: ‘timestamps.idx’ may be used uninitialized [-Wmaybe-uninitialized]
  574 |                                         if (gap_from_file && timestamps.idx > 0) {
      |                                             ~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~~~~~
../canplayer.c:327:22: note: ‘timestamps.idx’ was declared here
  327 |         struct sleep timestamps;
      |                      ^~~~~~~~~~

hartkopp commented 2 months ago

I still have a problem with the new restriction, that requires a logfile and does not work with stdin anymore.

E.g. a common use-case to use a "remote" CAN interface on your local host is: ssh user@host candump -L can0 | canplayer vcan0=can0

When looking at this code, which seems to be the relevant difference to the former nanosleep approach:

while (timercmp(&act_time, &send_time, <)){
    gettimeofday(&act_time, NULL);
    timersub(&act_time, &init_time, &act_time);
}

Why shouldn't this work with stdin too?

marckleinebudde commented 1 month ago

If you read the timestamps into the array before stating the sending loop you need to rewind the input stream, this is not possible with stdin. Yes, it makes no sense to read the timestamps in the first place.

And the while loop Oliver mentioned in https://github.com/linux-can/can-utils/pull/521#issuecomment-2119291085 is better coded with absolute timeouts, as I mentioned in https://github.com/linux-can/can-utils/pull/521#issuecomment-2112738427.

GGZ8 commented 1 month ago

I agree with marckleinebudde that is not possible to implement real time sending from stdin. I modified the implementation to accept as input traces in which messages have absolute timestamps, rather than relative timestamps as before. Rather than doing so in the main sending loop, I added some code to the initial pre-processing phase.

This solution does not introduce any delay in the sending phase.

marckleinebudde commented 1 month ago

The question remains, why do you need to read all the time stamps before into the array?

And it's still not clear how the new option should be used:

$ ./canplayer -r -I ./candump-absolute.log
Specify an input file for option -r !
$ ./canplayer -r ./candump-absolute.log -I ./candump-absolute.log 
Specify an input file for option -r !

It only starts this way:

$ ./canplayer -I ./candump-absolute.log -r

I'm trying to replay candump-absolute.log. A CAN frame should be send every 100ms, but it doesn't work as expected:

$ candump any,0:0,#FFFFFFFF -cexdtd
 (000.000000)  vcan0  TX - -  002   [1]  00
 (000.100105)  vcan0  TX - -  002   [1]  01
 (000.099997)  vcan0  TX - -  002   [2]  02 00
 (000.099976)  vcan0  TX - -  002   [3]  03 00 00
 (000.100123)  vcan0  TX - -  002   [4]  04 00 00 00
 (000.599798)  vcan0  TX - -  002   [5]  05 00 00 00 00
 (000.000026)  vcan0  TX - -  002   [6]  06 00 00 00 00 00
 (000.000004)  vcan0  TX - -  002   [7]  07 00 00 00 00 00 00
 (000.000003)  vcan0  TX - -  002   [8]  08 00 00 00 00 00 00 00
 (000.000004)  vcan0  TX - -  002   [1]  09
 (000.000115)  vcan0  TX - -  002   [1]  0A
 (000.099956)  vcan0  TX - -  002   [2]  0B 00
 (000.100037)  vcan0  TX - -  002   [3]  0C 00 00
 (000.099992)  vcan0  TX - -  002   [4]  0D 00 00 00
 (000.099942)  vcan0  TX - -  002   [5]  0E 00 00 00 00
 (000.599921)  vcan0  TX - -  002   [6]  0F 00 00 00 00 00
 (000.000026)  vcan0  TX - -  002   [7]  10 00 00 00 00 00 00
 (000.000003)  vcan0  TX - -  002   [8]  11 00 00 00 00 00 00 00
 (000.000003)  vcan0  TX - -  002   [1]  12
 (000.000004)  vcan0  TX - -  002   [1]  13
 (000.000031)  vcan0  TX - -  002   [2]  14 00
 (000.100153)  vcan0  TX - -  002   [3]  15 00 00
 (000.099989)  vcan0  TX - -  002   [4]  16 00 00 00
 (000.100035)  vcan0  TX - -  002   [5]  17 00 00 00 00
 (000.099923)  vcan0  TX - -  002   [6]  18 00 00 00 00 00
 (000.599833)  vcan0  TX - -  002   [7]  19 00 00 00 00 00 00
 (000.000032)  vcan0  TX - -  002   [8]  1A 00 00 00 00 00 00 00
 (000.000007)  vcan0  TX - -  002   [1]  1B
 (000.000005)  vcan0  TX - -  002   [1]  1C
 (000.000004)  vcan0  TX - -  002   [2]  1D 00
 (000.000168)  vcan0  TX - -  002   [3]  1E 00 00
 (000.099850)  vcan0  TX - -  002   [4]  1F 00 00 00

hartkopp commented 1 month ago

Additionally the canplayer has to be able to cope with timestamps that go back and forth as the timestamps created with candump any might have some jitter (at least) over different CAN interfaces. So there is no monotonic increasing timestamp.

The question to me is whether we can increase the accuracy on demand with some improvement that goes away from the "gap" approach. But the features to read from stdin and the possibility to handle potential timestamp jitters have to remain,

marckleinebudde commented 1 month ago

See: https://github.com/linux-can/can-utils/pull/521#issuecomment-2112738427

GGZ8 commented 1 month ago

Since the -r option check the presence of an input file, the only way to use is -I input_file -r.

I implemented the solution you suggested in https://github.com/linux-can/can-utils/pull/521#issuecomment-2126328951 removing the pre-processing phase while keeping the main loop implementation using the log_tv variable as absolute reference timestamp. I tested the implementation using a CAN trace extracted from a real vehicle including over 1 billion of CAN frames and the results are comparable between the two implementations.

Thank you for the advise.

hartkopp commented 1 month ago

Of course you name the variable to switch to the real time mode gap_from_file - but is this still the right naming? When you now read the CAN log input in one single pass, are you sure you need to check whether you get this log file content from a file or from stdin ?

GGZ8 commented 1 month ago

I think that the variable name should remain gap_from_file since we are still reading the gaps from the input trace, just avoiding the preprocessing phase.

keep in mind that reading the messages directly from the stdin cannot guarantee any soft real-time requirements, due to all the overhead introduced by the OS from the sending program up to the exact moment we receive the message.

hartkopp commented 1 month ago

I'm still not convinced.

You also have a busy loop while() statement in line 588 here: https://github.com/linux-can/can-utils/commit/277048eba62eb318eb83c292b543aaf4b92f2d3e#diff-5470ddc2c04ac012932be5e206da0c6c6599f9822e13947ae8844e96387e7b65R588

So where is the improvement to the former loop? Is it the lack of precision of timestamps at replay time that you are missing?

If so - are you aware of the -g <ms> option? (default = 1 ms)

E.g. you can define the gap length for the main loop even smaller than 1ms:

canplayer -I candump-absolute.log -g 0.01

which reduces the while loop to 10 micro seconds in this case and therefore increases the precision for the replay timestamps. Btw. the shortest CAN frame (CC with 1Mbit/s and zero data length) is 47 micro seconds.

Does this probably fit your requirements?

GGZ8 commented 1 month ago

When using the -g <ms> option, the interval between CAN frames is kept fixed according to the value of g, so messages are sent every <ms>. In contrast, the busy while loop was implemented with the aim of respecting the inter-arrival time of the original CAN trace. A real CAN trace recorded from a vehicle has a variable inter-arrival time between messages that cannot be respected using the -g option. Even by reducing the gap length of -g to 0.01 milliseconds the problem remain, moreover the actual implementation use the nanosleep approach with its limitations, see https://github.com/linux-can/can-utils/pull/521#issuecomment-2117140742

marckleinebudde commented 1 month ago

Replacing the fixed length nanosleep() by a busy-loop nanosleep() is the wrong approach. Use clock_nanosleep() with absolute timeouts, see https://github.com/linux-can/can-utils/pull/521#issuecomment-2112738427. And/or make use of SO_TXTIME, see https://github.com/linux-can/can-utils/pull/521#issuecomment-2117173507.

GGZ8 commented 1 week ago

Dear maintainers,

I implemented the solution suggested in https://github.com/linux-can/can-utils/pull/521#issuecomment-2117173507 using the SO_TXTIME like the cangen tool does, however the results obtained introduce a non-negligible delay in the replay of the CAN trace.

To test the implementation fairly, I artificially generated a CAN trace where each message was sent every 1 ms, then I compared both the cangen mode and the RealCAN implementations using two CAN adapters (USB-TO-CAN) and measuring the arrival times using the candump tool.

The results show that our RealCAN implementation achieves better performance, keeping the overall delay below 1 ms.

Moreover, I also generated a CAN trace with varying inter-arrival times between messages, so as to specifically measure the performance required to dynamically change the SO_TXTIME socket option. This time, the results show a further degradation of performance for the cangen mode implementation, due to the overhead in modifying the socket option for every message.

The delay between both the implementations (cangen mode and RealCAN) against the original traces can be found in the attached files:

fixed inter-arrival time 1ms fixed_1ms.pdf
adaptive inter-arrival time 1-2ms adaptive_1_2_ms.pdf

linux-can / can-utils

RealCAN implementation #521