whoenig / crazyflie_ros

ROS Driver for Bitcraze Crazyflie
MIT License

Crazyflies not communicating after startup: stuck inside crazyflie_server #29

Open araujokth opened 8 years ago

araujokth commented 8 years ago

Hi Wolfgang,

I spent some time investigating the problem I mentioned before: when I start the CFs via roslaunch, not all selected CFs start and the red LED just keeps flashing. I am now trying with 12 CFs and 6 radios.

I did some debugging, and it appears that whenever a CF does not work, it gets stuck inside crazyflie_server at the step that calls "logBlockImu.reset(new LogBlock" or "logBlock2.reset(new LogBlock". I placed couts before and after that point, and whenever a CF does not work, execution never gets past one of those calls. I noticed that the ones that don't work go through the line ROS_INFO("Requesting Logging variables...") but never get to print ROS_INFO("Ready..."), so I started from there.

Do you have any idea why this would be happening?

Thanks a lot for the help!

araujokth commented 8 years ago

I just separated the reset and the construction of the LogBlock as follows:

LogBlock* tmp_logblock_0 = new LogBlock(
      &m_cf, {
        {"acc", "x"},
        {"acc", "y"},
        {"acc", "z"},
        {"gyro", "x"},
        {"gyro", "y"},
        {"gyro", "z"},
      }, cb);

logBlockImu.reset(tmp_logblock_0);

and the application gets stuck inside the first call, as it never reaches the reset line.

whoenig commented 8 years ago

Just to clarify: with the latest fix, only the CFs in question do not work, but the others do?

The function you found is blocking, i.e. it tries to communicate with the Crazyflie until that portion is completed. That means if there is no working communication, it will get stuck there.

Did you increase the number of radios in https://github.com/whoenig/crazyflie_ros/blob/master/crazyflie_cpp/src/Crazyflie.cpp#L13? That is kind of a hack and unfortunately there doesn't seem to be a sanity check at runtime for radioIds >= 4.
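A minimal sketch of what such a runtime sanity check could look like. The names `kMaxRadios` and `checkRadioId` are hypothetical and do not come from crazyflie_cpp; the limit of 4 is taken from the comment above, and the real constant may be defined differently.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical limit; stands in for the value configured near Crazyflie.cpp#L13.
constexpr std::size_t kMaxRadios = 4;

// Hypothetical helper: rejects a radio id that does not fit the configured
// limit, so a misconfiguration fails loudly instead of misbehaving later.
inline void checkRadioId(std::size_t radioId, std::size_t maxRadios = kMaxRadios)
{
  if (radioId >= maxRadios) {
    throw std::out_of_range(
        "radio id " + std::to_string(radioId) +
        " does not fit the configured limit of " +
        std::to_string(maxRadios) + " radios");
  }
}
```

With the default limit, `checkRadioId(3)` returns normally and `checkRadioId(4)` throws `std::out_of_range`; after raising the limit to 6, ids 0 through 5 pass.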

araujokth commented 8 years ago

Yes, only those do not work, the others work fine.

Would there be any way to get out of the loop cleanly and restart, like some sort of reboot? :)

Yes, I had set that value to 6.

whoenig commented 8 years ago

So the blocking loop is this one: https://github.com/whoenig/crazyflie_ros/blob/master/crazyflie_cpp/include/crazyflie_cpp/Crazyflie.h#L312-L315?

This loop is self-healing in the sense that it keeps sending the request until it receives positive feedback. However, I can double check if there are any corner cases in the firmware, i.e. cases where it does not send the response I expect.
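For illustration only, a bounded variant of such a resend loop could look like the sketch below. `retryUntilAcked` is a hypothetical helper, not the code at the link above; it assumes the send-and-check step can be wrapped in a callable that returns true once positive feedback arrives.

```cpp
#include <functional>

// Hypothetical helper, not part of crazyflie_cpp: calls `attempt` until it
// reports success or `maxAttempts` tries have been used.
// Returns true on success, false if every attempt failed.
inline bool retryUntilAcked(const std::function<bool()>& attempt, int maxAttempts)
{
  for (int i = 0; i < maxAttempts; ++i) {
    if (attempt()) {
      return true;  // positive feedback received, stop resending
    }
  }
  return false;  // give up so the caller can report the failure or recover
}
```

Unlike the unbounded loop, this returns control to the caller after a fixed number of attempts, so a CF that never answers as expected cannot block the server forever.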

Is the LED on the Crazyradio constantly red though?

araujokth commented 8 years ago

Yes, that's where it is getting stuck. Sorry for not updating the issue; I found out afterwards that it was stuck in that loop. I tried adding a sleep between packet transmissions, but it did not seem to help. Do you think the problem may be on the Crazyflie side, i.e. that it just gets into a weird mode? It seems impossible that the packets are not arriving there, since the rest of the CFs are communicating well. This does not happen with a specific CF or a specific radio; it's really random.

Yes, I think that it is only red, but I could not tell whether there was some green in between or whether the red was just flashing.

I am using the latest Crazyflie firmware from Bitcraze.

whoenig commented 8 years ago

So it is possible that the firmware sends back an error code. Could you add the following above https://github.com/whoenig/crazyflie_ros/blob/master/crazyflie_cpp/src/Crazyflie.cpp#L315:

if (r->command == 0 && r->result != 0) {
      std::cout << "LogBlockCreated Error " << r->result << std::endl;
}

My hope is that the failing CF still gets a result, but the firmware reports some error - which we should handle more gracefully on the ROS side.

whoenig commented 8 years ago

FYI, that is what the firmware is doing: https://github.com/bitcraze/crazyflie-firmware/blob/master/modules/src/log.c#L270. Hence, there are a number of error cases possible, however we should always get a response to the request.

araujokth commented 8 years ago

Great! I will implement that tomorrow and let you know the result! Thanks a lot!

araujokth commented 8 years ago

Hi Wolfgang, I tried what you suggested, and the error output for r->result is a square character with 00 11 in two rows. I then converted it to double, and the value is 17.
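The square character is what one would expect if `result` is a one-byte field (an assumption here; the thread does not show its type): streaming a `std::uint8_t` to an `std::ostream` writes it as a character, and 0x11 is 17. A small sketch with a hypothetical helper `formatResult` that prints the numeric value instead:

```cpp
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: renders a one-byte result code as a decimal number.
// Without the cast, operator<< treats std::uint8_t as a character and emits
// the raw byte, which shows up as an unprintable glyph for small values.
inline std::string formatResult(std::uint8_t result)
{
  std::ostringstream out;
  out << static_cast<int>(result);
  return out.str();
}
```

In the debug line, the equivalent change would be to stream `static_cast<int>(r->result)` rather than `r->result`.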

If I uncomment this line https://github.com/whoenig/crazyflie_ros/blob/master/crazyflie_cpp/src/Crazyflie.cpp#L327, I get LogControl: 0 errno: 17

I am now looking at the firmware side to see what the 17 means. Would a complete "restart" on the ROS side be needed for all of the CFs that get this error, or do you think there is some new message that could be sent afterwards to fix it?

Well, I just tried a "not so graceful" approach: if I get error 17, I just call reboot, and it appears to work so far... but perhaps that is not the best way to fix it, right? It works if I sleep for at least 2 seconds between each sent packet here: https://github.com/whoenig/crazyflie_ros/blob/master/crazyflie_cpp/include/crazyflie_cpp/Crazyflie.h#L312-L315

whoenig commented 8 years ago

17 means EEXIST (http://www.virtsync.com/c-error-codes-include-errno), so a previous acknowledgment for the log creation got lost, but the block got created anyway. This seems to be a case that can happen frequently - weird that I didn't run into it. It is triggered here: https://github.com/bitcraze/crazyflie-firmware/blob/master/modules/src/log.c#L305

The easiest fix for now is to ignore it, i.e. in https://github.com/whoenig/crazyflie_ros/blob/master/crazyflie_cpp/src/Crazyflie.cpp#L315 change if (r->command == 0 && r->result == 0) { to if (r->command == 0 && (r->result == 0 || r->result == 17)) {. That is of course not a proper fix, but it should get you going. I'll leave this issue open to address it properly in the future.

What you describe doesn't really fix the issue; you most likely just got lucky during your testing :-)
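The same condition can be written as a small predicate, which keeps the magic number in one named place. This is only a sketch: `kLogResultExists` and `logBlockIsUsable` are hypothetical names, the value 17 and the meaning of command 0 are taken from this thread, and the field widths are assumed to be one byte.

```cpp
#include <cstdint>

// Hypothetical constant: the result code reported when the log block
// already exists (17, EEXIST, as discussed in this thread).
constexpr std::uint8_t kLogResultExists = 17;

// Hypothetical predicate: true if a response to the create-block request
// (command 0) means the block is available, either because it was just
// created or because an earlier request created it and its reply was lost.
inline bool logBlockIsUsable(std::uint8_t command, std::uint8_t result)
{
  const bool isCreateResponse = (command == 0);
  const bool created = (result == 0);
  const bool alreadyExists = (result == kLogResultExists);
  return isCreateResponse && (created || alreadyExists);
}
```

Any other non-zero result still counts as a failure, so genuine errors reported by the firmware are not silently accepted.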

araujokth commented 8 years ago

Oh, that's great to know! I will implement that tomorrow. I did not think that ignoring it could solve the problem. I must have been very lucky then, since it has always worked since I made the change, haha.

It's very strange that you never ran into this problem! Well, I just tested it and it works as you expected! Thanks again!!

Thanks again!

LZMHIT commented 6 years ago

@whoenig I launched "roslaunch crazyflie_demo hover_vrpn.launch uri:=radio://0/80/250K", but I get "terminate called after throwing an instance of 'std::runtime_error' what(): timeout" in my terminal. I couldn't communicate with the Crazyflie. What is the matter?

whoenig commented 6 years ago

Please open a new issue - this doesn't seem to be related to this open issue. Make sure your uri is correct - the timeout error occurs if no communication could be established within a certain time frame.

vigyansadhu commented 5 years ago

@LZMHIT I had the same issue of 'std::runtime_error'. It turns out that all you have to do is restart the Crazyflie and run the launch file. The issue occurred for me whenever the Crazyflie was connected using cfclient.