Closed brodiealexander closed 1 year ago
Thanks @brodiealexander .
Do you think it's related to https://github.com/org-arl/fjage/pull/264 ?
If so we can try to fix both together. As per the discussion in the PR we might need to break the API so better to break it once together for both the APIs.
Ooops. I misread the description. This seems to be a bug in unet.c
. Specifically the monitor
method.
Sorry for the late reply. While not directly related, it is a similar pattern to the problem present in unet.c
. Locally, I solved this issue successfully by returning a specific known value from monitor
back to unetsocket_receive
so that it knows an error has occurred and returns NULL
instead of a null pointer. To me, this seems like it would fix the issue without breaking the way the API currently works, since users cannot directly access this data structure.
If you would like, I can try to clean up what I have a bit and submit it as a request for comment. In this case, I would also try to add some more rigorous test cases to validate that it is working as anticipated.
Thanks @brodiealexander . I had a chance to take a quick look and it seems that usock->ntf
needs to be unset at the beginning of the monitor
. Do you think that would fix it?
At a higher level, I'm wondering why we spawn a thread to run monitor
and wait for it to join back instead of doing the fjage_receive_any
on the caller's thread. It'll also block similarly to what we do now. Perhaps @prasadtiru might have an idea why we decided to go with that approach.
OK. I figured it out. We move the UnetStack C API from the older design : https://github.com/org-arl/unet-contrib/blob/5529cc67beafe69443f780154560da9be297c099/unetsocket/c/unet.c to the new design which is similar to a socket API.
The older design had an async API :
int modem_set_rx_callback(modem_t modem, modem_rxcb_t callback)
int modem_set_tx_callback(modem_t modem, modem_txcb_t callback)
That needed a background thread to catch messages and call the appropriate callbacks.
The move to the socket-style API was done in order to move the threading control to the user so that API consumer won't be locked into this style of threaded design. In the socket style API, the _receive
and _request
style APIs will block (until their timeouts) and the user will have to spin up threads to be able to send
, while there's already a pending receive
.
So we shouldn't have any need for any of the threading code anymore. I'll raise a PR to clean that up and also fix this error 👆🏾while doing that.
@brodiealexander Could you check if these fixes on this branch fix the issues you're seeing? https://github.com/org-arl/unet-contrib/tree/csocket-fix
Sorry for the delay, we just got back from spring break here. This branch solves our issue!
So, I have been working on trying to provide an example based on
rxdata.c
which uses unetstocket_receive with a timeout on a loop. Referencing the documentation, it is clear that the intention is that we should free fjage messages after consuming them, and it is also clear fromunet.h
that unetsocket_receive() is intended to return NULL if no datagram is available.However, running the minimal working example below, it is clear that unetsocket_receive() does not necessarily return NULL on a timeout condition, neither is it set to NULL in the case of fjage_msg_destroy().
The result is that, in the first loop which happens after a message has already been received, the message variable 'ntf' is not set to NULL, and instead, the fjage method to retrieve the class returns an empty character array, and the program crashes due to a double free error.
In the screenshot, you can also see an issue related to #85 where the program receives an RxFrameNtf. I have disabled the printing of data from RxFrameNtf in the MWE, because under some conditions the misinterpretation of the data can cause a segfault.
I have some ideas on how to fix this inside of *monitor(), but wanted to ask for your input before attempting to make a PR or putting much serious work into it.
I have provided a minimal working example based on rxdata.c below:
Thanks, Brodie Alexander