org-arl / unet-contrib

Unet user contributions
BSD 3-Clause "New" or "Revised" License
11 stars 15 forks source link

`unet.c` unetsocket_receive() does not necessarily return NULL if a datagram is unavailable. #87

Closed brodiealexander closed 1 year ago

brodiealexander commented 1 year ago

So, I have been working on trying to provide an example based on rxdata.c which uses unetstocket_receive with a timeout on a loop. Referencing the documentation, it is clear that the intention is that we should free fjage messages after consuming them, and it is also clear from unet.h that unetsocket_receive() is intended to return NULL if no datagram is available.

However, running the minimal working example below, it is clear that unetsocket_receive() does not necessarily return NULL on a timeout condition, neither is it set to NULL in the case of fjage_msg_destroy().

The result is that, in the first loop which happens after a message has already been received, the message variable 'ntf' is not set to NULL, and instead, the fjage method to retrieve the class returns an empty character array, and the program crashes due to a double free error.

image

In the screenshot, you can also see an issue related to #85 where the program receives an RxFrameNtf. I have disabled the printing of data from RxFrameNtf in the MWE, because under some conditions the misinterpretation of the data can cause a segfault.

I have some ideas on how to fix this inside of *monitor(), but wanted to ask for your input before attempting to make a PR or putting much serious work into it.

I have provided a minimal working example based on rxdata.c below:

///////////////////////////////////////////////////////////////////////////////
//
// Receive data.
//
// In terminal window (an example):
//
// $ make samples
// $ ./rxdata <ip_address> <port>
//
////////////////////////////////////////////////////////////////////////////////

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "../unet.h"

#ifndef _WIN32
#include <unistd.h>
#include <netdb.h>
#include <sys/time.h>
#endif

#define NBYTES 9

static int error(const char *msg)
{
    printf("\n*** ERROR: %s\n\n", msg);
    return -1;
}

int main(int argc, char *argv[])
{
    unetsocket_t sock;

    uint8_t data[NBYTES];
    int port = 1100;
    int rv;
    if (argc <= 2)
    {
        error("Usage : rxdata <ip_address> <port>\n"
              "ip_address: IP address of the receiver modem. \n"
              "port: port number of the receiver modem \n"
              "A usage example: \n"
              "rxdata 192.168.1.10 1100\n");
        return -1;
    }
    else
    {
        port = (int)strtol(argv[2], NULL, 10);
    }

#ifndef _WIN32
    // Check valid ip address
    struct hostent *server = gethostbyname(argv[1]);
    if (server == NULL)
    {
        error("Enter a valid ip addreess\n");
        return -1;
    }
#endif

    // Open a unet socket connection to modem
    printf("Connecting to %s:%d\n", argv[1], port);
    sock = unetsocket_open(argv[1], port);
    if (sock == NULL)
        return error("Couldn't open unet socket");

    // Bind to protocol DATA
    rv = unetsocket_bind(sock, DATA);

    if (rv != 0)
        return error("Error binding socket");

    // Set a timeout of 10 seconds
    unetsocket_set_timeout(sock, 2000);

    // Receive and display data
    printf("Waiting for a Datagram\n");

    while (true)
    {
        fjage_msg_t ntf;

        ntf = unetsocket_receive(sock);
        if (ntf != NULL && fjage_msg_get_clazz(ntf) != NULL)
        {
            printf("Received a %s : [", fjage_msg_get_clazz(ntf));

            if (strcmp(fjage_msg_get_clazz(ntf), "org.arl.unet.DatagramNtf") == 0)
            {
                fjage_msg_get_byte_array(ntf, "data", data, NBYTES);
                for (int i = 0; i < 9; i++)
                {
                    printf("%d,", data[i]);
                }
                printf("]\n");
            }
            else
            {
                printf("]\n");
            }

            fjage_msg_destroy(ntf);
            if (ntf != NULL)
            {
                printf("NTF NOT NULL!\n");
            }
        }
        else
        {
            // fjage_msg_destroy(ntf);
            error("Error receiving data");
        }
    }

    // sleep(2);
    // Close the unet socket
    unetsocket_close(sock);
    printf("Reception Complete\n");
    return 0;
}

Thanks, Brodie Alexander

notthetup commented 1 year ago

Thanks @brodiealexander .

Do you think it's related to https://github.com/org-arl/fjage/pull/264 ?

If so we can try to fix both together. As per the discussion in the PR we might need to break the API so better to break it once together for both the APIs.

notthetup commented 1 year ago

Ooops. I misread the description. This seems to be a bug in unet.c. Specifically the monitor method.

brodiealexander commented 1 year ago

Sorry for the late reply. While not directly related, it is a similar pattern to the problem present in unet.c. Locally, I solved this issue successfully by returning a specific known value from monitor back to unetsocket_receive so that it knows an error has occurred and returns NULL instead of a null pointer. To me, this seems like it would fix the issue without breaking the way the API currently works, since users cannot directly access this data structure.

If you would like, I can try to clean up what I have a bit and submit it as a request for comment. In this case, I would also try to add some more rigorous test cases to validate that it is working as anticipated.

notthetup commented 1 year ago

Thanks @brodiealexander . I had a chance to take a quick look and it seems that usock->ntf needs to be unset at the beginning of the monitor. Do you think that would fix it?

At a higher level, I'm wondering why we spawn a thread to run monitor and wait for it to join back instead of doing the fjage_receive_any on the caller's thread. It'll also block similarly to what we do now. Perhaps @prasadtiru might have an idea why we decided to go with that approach.

notthetup commented 1 year ago

OK. I figured it out. We move the UnetStack C API from the older design : https://github.com/org-arl/unet-contrib/blob/5529cc67beafe69443f780154560da9be297c099/unetsocket/c/unet.c to the new design which is similar to a socket API.

The older design had an async API :

int modem_set_rx_callback(modem_t modem, modem_rxcb_t callback)
int modem_set_tx_callback(modem_t modem, modem_txcb_t callback)

That needed a background thread to catch messages and call the appropriate callbacks.

The move to the socket-style API was done in order to move the threading control to the user so that API consumer won't be locked into this style of threaded design. In the socket style API, the _receive and _request style APIs will block (until their timeouts) and the user will have to spin up threads to be able to send, while there's already a pending receive.

So we shouldn't have any need for any of the threading code anymore. I'll raise a PR to clean that up and also fix this error 👆🏾while doing that.

notthetup commented 1 year ago

@brodiealexander Could you check if these fixes on this branch fix the issues you're seeing? https://github.com/org-arl/unet-contrib/tree/csocket-fix

brodiealexander commented 1 year ago

Sorry for the delay, we just got back from spring break here. This branch solves our issue!