nasa / fprime

F´ - A flight software and embedded systems framework
https://fprime.jpl.nasa.gov
Apache License 2.0
10.05k stars 1.3k forks source link

Fix warnings in tcp/udp UT #2744

Open JohanBertrand opened 4 months ago

JohanBertrand commented 4 months ago
Related Issue(s) #2706
Has Unit Tests (y/n)
Documentation Included (y/n)

Change Description

I am not sure we want to have those changes in the codebase. That could be used as a baseline for a better update.

Most of the warnings in the UT with TCP/UDP module should be fixed with those changes. A mutex is making sure that the receive/open/startup functions are called only if the stop/close function are not going to be called in an other thread.

I added the changes in the unit tests instead of the class itself (in open/close), because I did not find a way to make it work properly.

I also needed to make recv to stop iterating if errno == EAGAIN to avoid a timeout. I'm not sure if that's the proper way to fix this issue.

The only warning still happening for me, after 10.000 tests, is the following (happened only once):

[ RUN      ] Reconnect.ReceiveThreadReconnect
Listening for single client at 127.0.0.1:32769
Connected to 127.0.0.1:32769 as a tcp client
Accepted client at 127.0.0.1:32769
Connected to 127.0.0.1:32769 as a tcp client
Accepted client at 127.0.0.1:32769
Connected to 127.0.0.1:32769 as a tcp client
Accepted client at 127.0.0.1:32769
Accepted client at 127.0.0.1:32769
[WARNING] Failed to recv from port with status -8 and errno 0

I feel like the warnings in the test should make the UT to fail. However, I am not sure how to catch the warning streamed through Fw::Logger::logMsg.

Also fix an error in StringUtils.cpp when compiled on ubuntu20.04 with a FW_ASSERT:

/home/fprime/Fw/Types/StringUtils.cpp:51:40: error: comparison of integer expressions of different signedness: ‘FwSizeType’ {aka ‘long unsigned int’} and ‘long int’ [-Werror=sign-compare]
   51 |     FW_ASSERT((source_size - sub_size) <= std::numeric_limits<FwSignedSizeType>::max());
JohanBertrand commented 3 months ago

@LeStarch is there something we can do in the UT to check that no warnings are logged using Fw::Logger::logMsg, and cause the UT to fail if it contains warnings?

During my tests, I used that warning messages to realize that something was wrong.

I also used a timeout to make sure that the tests were not taking too long. It was especially useful for checking that the TcpServer tests were working properly. The test could take a few seconds on failure, where it's supposed to be less than 500ms in nominal conditions on my hardware. However, I don't think that's something that can/should be implemented, since it's hardware dependent.

JohanBertrand commented 3 months ago

@LeStarch @thomas-bc Let me know if there is something else to change in this PR

JohanBertrand commented 2 months ago

@LeStarch @thomas-bc I don't really understand where the problem is coming from on the Mac OS side and I can not reproduce the problem on my setup. Is it possible for you to have a look at this?

LeStarch commented 2 months ago

@JohanBertrand yes we can!

JohanBertrand commented 1 month ago

@LeStarch I fixed the merge conflict and merged devel, it should be good to be merged once the issue on Mac OS is resolved

LeStarch commented 1 week ago

We have been working to get https://github.com/nasa/fprime/pull/2683 merged and are about to do so. In that PR we substantially rearchitect the mutex handling. I will note which changes need to be kept and which are likely superseded.

JohanBertrand commented 4 days ago

@LeStarch The branch has been updated with devel.

I can see that the messages logged through Fw::Logger::log do not seem to appear in the test log anymore. Is there a way to print it again?

I used it in this PR for checking that no [WARNING] messages were logged in the test log.

Edit: After manually enabling print in the UT, I am still getting a lot of [WARNING] Failed to recv from port with status -8 and errno 0 messages in TCP client, TCP server and UDP UTs.

Out of curiosity, did you @csmith608 investigate this in your previous PR?

csmith608 commented 18 hours ago

Hey @JohanBertrand, these unit tests fail on Mac due to the first recv after the send sometimes getting an EAGAIN rather than the actual data. This issue does not appear on Linux. I added a retry mechanism and a clearer status enum for the EAGAIN situation as the recv man page still classifies this as an error rather than a success.