vectorgrp / sil-kit

Vector SIL Kit – Open-Source Library for Connecting Software-in-the-Loop Environments
https://vectorgrp.github.io/sil-kit-docs
MIT License

Ethernet packets delayed or even lost #6

Closed ttmapr closed 1 year ago

ttmapr commented 1 year ago

Describe the issue If the Ethernet frame handler callback has not yet finished processing a frame event when it is due to be called for the next received frame, SIL Kit does not retry the call until a participant sends another frame. Once another frame is sent, the frame handler is called for each of the pending frames (which are therefore delayed). But if no further frame is sent, the handler is never called again, which results in packet loss.

To Reproduce (Precondition: sil-kit-registry.exe must be running on the same host, using the default URI)

  1. compile the attached example files
  2. run receiver.exe (will wait until the 6 packets from sender have been received)
  3. run sender.exe
  4. On my system, the problem occurred in most cases: the receiver dumps only the first few packets and then hangs, because the frame handler is not called again for the frames that are still on the way.
  5. so cancel receiver.exe by pressing Ctrl+C
  6. uncomment the Sleep(500) in line 37 of sender.c
  7. compile again
  8. run receiver.exe
  9. run sender.exe
  10. Now all 6 frames should be received and dumped, and the receiver terminates with the message "all packets received :-)"

Expected behavior Since there is no guarantee that another participant will send further frames and thereby trigger the frame handler for the pending frames, I would expect the dispatcher thread of SIL Kit to retry delivering them. The dispatcher thread should be able to tell when the frame handler can be called again, because the handler eventually returns from the previous call.

Screenshots not applicable

Your Environment (please complete the following information):

Additional context Example source files attached. receiver.c sender.c

VDanielEdwards commented 1 year ago

Hello @ttmapr, first of all, thank you for your message! We have a public holiday coming up, and will come back to you in the beginning of next week.

VDanielEdwards commented 1 year ago

Hello @ttmapr, I am very sorry for the delay!

The issue you are experiencing is that the participant is torn down before it has had time to send the frames to the other participants.

This can be solved by using the current version of SIL Kit and utilizing the lifecycle service to ensure that the process doesn't end too early.

Since code says more than a thousand words, I have adapted your examples (and added a CMake script for my own convenience). The full code is attached to this post.

Please note that instead of volatile variables, the synchronization should really use the platform-specific mutex / condition variable implementation. Since MSVC does not support the C11 standard threads.h header, this has been implemented here with a busy loop - which I wouldn't recommend in practice.
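As a rough illustration of the mutex / condition variable approach mentioned above, here is a minimal sketch assuming a POSIX platform with pthread.h. The `sync_counter` type and its functions are hypothetical helpers for this example, not part of the SIL Kit API. On Windows the same pattern would use the native primitives (for example `SRWLOCK` with `CONDITION_VARIABLE`) instead.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical helper (not part of SIL Kit): a counter protected by a
 * mutex, with a condition variable to wake up waiting threads. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t cond;
    uint64_t value;
} sync_counter;

#define SYNC_COUNTER_INIT \
    { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0 }

/* Intended to be called from a callback thread, e.g. at the end of
 * frame_handler() or lifecycle_starting_handler(). */
static void sync_counter_increment(sync_counter *c) {
    pthread_mutex_lock(&c->mutex);
    ++c->value;
    pthread_cond_broadcast(&c->cond); /* wake every waiting thread */
    pthread_mutex_unlock(&c->mutex);
}

/* Intended to be called from main(): sleeps (instead of spinning) until
 * the counter has reached at least `target`, then returns its value. */
static uint64_t sync_counter_wait_at_least(sync_counter *c, uint64_t target) {
    pthread_mutex_lock(&c->mutex);
    while (c->value < target) { /* loop guards against spurious wakeups */
        pthread_cond_wait(&c->cond, &c->mutex);
    }
    uint64_t seen = c->value;
    pthread_mutex_unlock(&c->mutex);
    return seen;
}
```

With a global `static sync_counter packets = SYNC_COUNTER_INIT;`, the receiver's `++num_packets_received;` would become `sync_counter_increment(&packets);`, and the busy loop `while (num_packets_received < 6) {}` would become `sync_counter_wait_at_least(&packets, 6);`. The `lifecycle_started` flag could be handled the same way with a second counter and a target of 1.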

Sender

static SilKit_EthernetController *controller_ptr = NULL;
static volatile int lifecycle_started = 0;

void lifecycle_comready_handler(void *context, SilKit_LifecycleService *lifecycle_service_ptr) {
    printf("communication ready\n");
    check_return_code(SilKit_EthernetController_Activate(controller_ptr));
}

void lifecycle_starting_handler(void *context, SilKit_LifecycleService *lifecycle_service_ptr) {
    printf("starting\n");
    lifecycle_started = 1;
}

static void send_frame(size_t size, const uint8_t *data) {
    SilKit_EthernetFrame frame;
    frame.structHeader.version = SK_ID_MAKE(Ethernet, SilKit_EthernetFrame);
    frame.raw.data = data;
    frame.raw.size = size;
    if (0 != SilKit_EthernetController_SendFrame(controller_ptr, &frame, NULL)) {
        printf("failed to send frame\n");
    }
}

static void frame_tx_handler(void* context, SilKit_EthernetController* controller, SilKit_EthernetFrameTransmitEvent* frameTransmitEvent) {
    if (frameTransmitEvent->status != SilKit_EthernetTransmitStatus_Transmitted) {
        printf("frame could not be sent: status=%d\n", frameTransmitEvent->status);
    }
}

int main(int argc, char **argv) {
    // load the participant configuration
    SilKit_ParticipantConfiguration *participant_config_ptr;
    check_return_code(SilKit_ParticipantConfiguration_FromString(&participant_config_ptr, "Description: sender"));

    // create the participant
    SilKit_Participant *participant_ptr;
    check_return_code(SilKit_Participant_Create(&participant_ptr, participant_config_ptr, "ethernet_sender", "silkit://localhost:8500"));

    // create and set up the lifecycle and its callbacks
    SilKit_LifecycleService *lifecycle_service_ptr;
    SilKit_LifecycleConfiguration lifecycle_configuration;
    SilKit_Struct_Init(SilKit_LifecycleConfiguration, lifecycle_configuration);
    lifecycle_configuration.operationMode = SilKit_OperationMode_Autonomous;
    check_return_code(SilKit_LifecycleService_Create(&lifecycle_service_ptr, participant_ptr, &lifecycle_configuration));
    check_return_code(SilKit_LifecycleService_SetCommunicationReadyHandler(lifecycle_service_ptr, NULL, lifecycle_comready_handler));
    check_return_code(SilKit_LifecycleService_SetStartingHandler(lifecycle_service_ptr, NULL, lifecycle_starting_handler));

    // create and set up the ethernet controller and its callbacks
    check_return_code(SilKit_EthernetController_Create(&controller_ptr, participant_ptr, "send ctrl", "network"));
    SilKit_HandlerId frame_tx_handler_id;
    check_return_code(SilKit_EthernetController_AddFrameTransmitHandler(controller_ptr, NULL, &frame_tx_handler, 0xffffffff, &frame_tx_handler_id));

    // signal the lifecycle service to start, we still have to wait for the starting callback
    check_return_code(SilKit_LifecycleService_StartLifecycle(lifecycle_service_ptr));

    // wait for the starting callback
    while (!lifecycle_started) {}

    // now send all frames
    send_frame(14, (uint8_t *)"\x00\x16\x81\x00\x00\x01\x00\x16\x81\x00\x00\x02\x12\x34");
    send_frame(14, (uint8_t *)"\x00\x16\x81\x00\x00\x01\x00\x16\x81\x00\x00\x02\x23\x45");
    send_frame(14, (uint8_t *)"\x00\x16\x81\x00\x00\x01\x00\x16\x81\x00\x00\x02\x34\x56");
    send_frame(14, (uint8_t *)"\x00\x16\x81\x00\x00\x01\x00\x16\x81\x00\x00\x02\x45\x67");
    send_frame(14, (uint8_t *)"\x00\x16\x81\x00\x00\x01\x00\x16\x81\x00\x00\x02\x56\x78");
    send_frame(14, (uint8_t *)"\x00\x16\x81\x00\x00\x01\x00\x16\x81\x00\x00\x02\x67\x89");
    printf("all packets sent\n");

    // signal the lifecycle service to stop
    check_return_code(SilKit_LifecycleService_Stop(lifecycle_service_ptr, "done"));

    // wait for the lifecycle to complete
    SilKit_ParticipantState participant_state;
    check_return_code(SilKit_LifecycleService_WaitForLifecycleToComplete(lifecycle_service_ptr, &participant_state));
    printf("final participant state: %" PRIu16 "\n", participant_state);

    check_return_code(SilKit_Participant_Destroy(participant_ptr));

    return 0;
}

Receiver

static SilKit_EthernetController *controller_ptr = NULL;
static volatile uint64_t num_packets_received = 0;
static volatile int lifecycle_started = 0;

void lifecycle_comready_handler(void *context, SilKit_LifecycleService *lifecycle_service_ptr) {
    printf("communication ready\n");
    check_return_code(SilKit_EthernetController_Activate(controller_ptr));
}

void lifecycle_starting_handler(void *context, SilKit_LifecycleService *lifecycle_service_ptr) {
    printf("starting\n");
    lifecycle_started = 1;
}

void frame_handler(void *context, SilKit_EthernetController *controller, SilKit_EthernetFrameEvent* frameEvent) {
    printf("packet received having %zu bytes:", frameEvent->ethernetFrame->raw.size);
    for (size_t i = 0; i < frameEvent->ethernetFrame->raw.size; ++i) {
        printf(" %02x", frameEvent->ethernetFrame->raw.data[i]);
    }
    printf("\n");
    ++num_packets_received;
}

int main(int argc, char **argv) {
    // load the participant configuration
    SilKit_ParticipantConfiguration *participant_config_ptr;
    check_return_code(SilKit_ParticipantConfiguration_FromString(&participant_config_ptr, "Description: receiver"));

    // create the participant
    SilKit_Participant *participant_ptr;
    check_return_code(SilKit_Participant_Create(&participant_ptr, participant_config_ptr, "ethernet_receiver", "silkit://localhost:8500"));

    // create and set up the lifecycle and its callbacks
    SilKit_LifecycleService *lifecycle_service_ptr;
    SilKit_LifecycleConfiguration lifecycle_configuration;
    SilKit_Struct_Init(SilKit_LifecycleConfiguration, lifecycle_configuration);
    lifecycle_configuration.operationMode = SilKit_OperationMode_Autonomous;
    check_return_code(SilKit_LifecycleService_Create(&lifecycle_service_ptr, participant_ptr, &lifecycle_configuration));
    check_return_code(SilKit_LifecycleService_SetCommunicationReadyHandler(lifecycle_service_ptr, NULL, lifecycle_comready_handler));
    check_return_code(SilKit_LifecycleService_SetStartingHandler(lifecycle_service_ptr, NULL, lifecycle_starting_handler));

    // create and set up the ethernet controller and its callbacks
    check_return_code(SilKit_EthernetController_Create(&controller_ptr, participant_ptr, "recv ctrl", "network"));
    SilKit_HandlerId frame_handler_id;
    check_return_code(SilKit_EthernetController_AddFrameHandler(controller_ptr, NULL, &frame_handler, SilKit_Direction_Receive, &frame_handler_id));

    // signal the lifecycle service to start, we still have to wait for the starting callback
    check_return_code(SilKit_LifecycleService_StartLifecycle(lifecycle_service_ptr));

    // wait for the starting callback
    while (!lifecycle_started) {}

    // wait until all frames have been received
    while (num_packets_received < 6) {}
    printf("all packets received :-)\n");

    // signal the lifecycle service to stop
    check_return_code(SilKit_LifecycleService_Stop(lifecycle_service_ptr, "done"));

    // wait for the lifecycle to complete
    SilKit_ParticipantState participant_state;
    check_return_code(SilKit_LifecycleService_WaitForLifecycleToComplete(lifecycle_service_ptr, &participant_state));
    printf("final participant state: %" PRIu16 "\n", participant_state);

    check_return_code(SilKit_Participant_Destroy(participant_ptr));

    return 0;
}

sil-kit_github_issue-8.zip

VDanielEdwards commented 1 year ago

I am closing this issue @ttmapr. If you have further questions or suggestions, please feel free to re-open this issue! Thank you!