Closed skohlbr closed 9 months ago
Have you verified that you can connect to /dev/naze_fc with other programs, like screen or miniterm, when you're getting this message? That's the error message that also gets thrown if the USB is unplugged while rosflight_io is running, so it'd be good to verify that this problem is specific to rosflight_io and isn't a system-level thing.
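For a quick check that doesn't depend on screen or miniterm being installed, something like the sketch below might help. It only tries to open the device node and reports the errno name if that fails. The function name `probe_serial_port` is made up for this example, it is not part of rosflight, and it does not set the baud rate or read any data.

```python
import errno
import os


def probe_serial_port(path):
    """Try to open `path` as a serial device; return (ok, reason)."""
    try:
        # O_NONBLOCK: do not hang in open() waiting for modem lines.
        # O_NOCTTY: never let the device become our controlling terminal.
        fd = os.open(path, os.O_RDWR | os.O_NOCTTY | os.O_NONBLOCK)
    except OSError as exc:
        # Report the symbolic errno (ENOENT, EACCES, EBUSY, ...).
        return False, errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        if not os.isatty(fd):
            return False, "not a tty"
        return True, "ok"
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Device path taken from this thread; adjust to your setup.
    print(probe_serial_port("/dev/ttyUSB0"))
```

If this reports an errno name while rosflight_io is not running, the problem is likely below rosflight_io (permissions, driver, cable). If it reports ok, the device node itself can at least be opened.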
Ok so after your comment my first suspected culprit was the USB cable. Tried multiple ones, same issues.
Then I tried connecting the FC to another computer running the same S/W setup - worked as intended every single time.
Running screen
screen /dev/ttyUSB0 921600
gives me this kind of stuff running across the screen:
�������!�������������!������������� ������������ �����������L ������������L���������������,!��~��������!�������������!�������������!�����������$!������������`!��������<���A!������������` ������������ ��������� ����������� �����������, ��~���������m ������������!��<���������i!������������!������������ !��}����������!������������E!������������!!����� �� ����I ������������i!������������(!�����������e!�������������!����� �����D ������������! ������������� ���������!�� ��}��������������� �����-!�����������I!�������������!������������!�������a �����������I ������������ ������������� ��������� ������������!��>��������H!�����������h!���������� !��-� �}���O���L ������������H!����� �Y?���!�����������Ip��������\���9����������!�����������a!������y��}��`!�����������!�]���������e�������g���� ����������������%!������������!��x�������� ���������`
This (from manual inspection) looks the same on the machine where things are working and the one where I experience the described issue.
Going out on a limb here, but maybe there is a race condition in mavrosflight that gets exposed by the relatively fast single-core performance of this CPU (7th generation i5)?
Ok, so I compiled in 'Debug' mode to get a better backtrace and now the problem is gone. Given 'Debug' runs slower, this is another reason to suspect that we're seeing a timing/race condition issue.
So using some console output in 'Release' I could confirm that the exception happens in this line: https://github.com/rosflight/rosflight/blob/master/rosflight/src/mavrosflight/mavlink_comm.cpp#L73 (not very surprising I guess :) ).
That is super interesting. Thanks for looking into this! I'm running 6th gen i7's on several computers and have never seen this, except in the condition @dpkoch described. How is your computer so fast!?! Haha!
I've seen a similar problem, but under different conditions, and solved it by resetting the port from Linux with
stty -F /dev/ttyUSB0 sane
I wonder if that might help here too, though from what you've said it sounds like something different.
Interesting. So the error message is coming from inside the close() function, which gets called when a serial error is encountered elsewhere. This is what we see when the USB is unplugged, but something else must be causing an error for you.
So it's likely that there is a race condition that would be nice to fix, but for now it looks like we've only seen it after another error has already killed the serial communication anyway. So the more critical fix for you is most likely going to be figuring out why an error is occurring in the first place.
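To illustrate the kind of race being discussed (this is not the mavrosflight code, just a generic sketch in Python): if an error handler and another thread can both reach close() at about the same time, one way to make that safe is to guard the close with a lock and a flag so the underlying resource is released exactly once. `GuardedPort` and `demo` are hypothetical names invented for this example.

```python
import threading


class GuardedPort:
    """Hypothetical wrapper whose close() is safe to call from several threads."""

    def __init__(self, raw_close):
        self._raw_close = raw_close  # callable that releases the real resource
        self._lock = threading.Lock()
        self._closed = False

    def close(self):
        # Only the first caller flips the flag; later callers return early.
        with self._lock:
            if self._closed:
                return False
            self._closed = True
        self._raw_close()
        return True


def demo(n_threads=8):
    """Call close() from n_threads at once; return how often the real close ran."""
    calls = []
    port = GuardedPort(lambda: calls.append(1))
    barrier = threading.Barrier(n_threads)

    def worker():
        barrier.wait()  # release all threads together to provoke contention
        port.close()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(calls)


if __name__ == "__main__":
    print(demo())
```

Without the lock and flag, two threads could both pass the "already closed?" check and both run the real close, and faster hardware or a Release build can make that window easier to hit. Whether this is what actually happens at the line linked above is an open question in this thread.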
When trying to connect to the flight controller running
I get the following:
I tried different USB ports and things started working as expected after plugging the cable into one of them. Afterwards I could also use other USB ports and everything worked as expected. Then I rebooted the system and I couldn't get it to work again, no matter how much USB port switching I did. Getting the exception seems to be the default now and I currently have no idea how to fix it. I noticed the same issue is described in https://github.com/byu-magicc/fcu_io/issues/42. I tried removing the drivers with rmmod, but this did not appear to help. I also peppered the startup code of the rosflight_io node with sleep calls (as the exception looks like a race condition/threading issue), but this does not help either.
Here's a gdb backtrace (rosflight compiled in Release mode though, can try Debug later):
Any ideas/hints on how to fix things are appreciated.