Can't arm immediately after instantiating Action plugin

shakthi-prashanth-m commented 6 years ago

I got confused about why does my application fails to arm many times. I reported this to @julianoes recently. Me & @Rjasuja were debugging and concluded that Telemetry::health_all_ok() should be called more than once to get proper health status (partly correct).

However, this was not the root cause of the issue. Consider below pieces of code fragments.

Below code fails to arm:

auto action = std::make_shared<Action>(&device);

auto arm_result = action->arm();
action_error_exit(arm_result, "Arming failed: ");

But this does arm:

auto action = std::make_shared<Action>(&device);

sleep_for(seconds(1)); // wait for 1 second to receive Landing state of the device.

auto arm_result = action->arm();
action_error_exit(arm_result, "Arming failed: ");

We better document this to user. Or Did I misunderstand something ?

shakthi-prashanth-m commented 6 years ago

In former case (above), Landing state was NOT received, but received in latter case.

julianoes commented 6 years ago

@shakthi-prashanth-m hm, good observation, that's not optimal.

julianoes commented 6 years ago

Possible solutions we talked about on the call:

Add Action::is_armable().
Move landed/in-air state to core and get it in action plugin.
Cache last message received of each type and let action get the it to determine the state.
Do nothing, and just be aware that a plugin needs to be instantiated as early as possible.

julianoes commented 6 years ago

Related to #65.

JonasVautherin commented 6 years ago

EDIT: I changed my mind (see my comment below :P)

I would vote for @julianoes 4th choice (do nothing) or another one: add a "state" callback in the Action plugin.

My reasoning is the following:

is_armable() doesn't bring much, as it is another way to receive a negative ActionResult :-). When I want to arm, instead of calling is_armable(), I can always do arm() instead.
Keeping state there makes it more complex
Not sure I understand, but I think it also adds some caching in the action plugin, which means state and complexity

One use of is_armable that I can imagine on the user side is e.g. to enable a button in the UI. But instead of doing a while (!is_armable()) in my code, as a user, I'd rather have a "state" callback in the action plugin (e.g. registerIsArmable()).

shakthi-prashanth-m commented 6 years ago

Add Action::is_armable().

IMO, we can find a way to avoid adding an API if possible. If none makes sense, its last option.

Move landed/in-air state to core and get it in action plugin.

We discussed that its better to remain in Action plugin.

Cache last message received of each type and let action get the it to determine the state.

This is interesting. Need to check.

Do nothing, and just be aware that a plugin needs to be instantiated as early as possible.

Me @JonasVautherin had chat about it. I think, we can also add it in API description of arm that, when it returns failure, retry after a second (like we do in case of Telemetry::health_all_ok().)

I also think, we should add a note even in Telemetry::health_all_ok() about retry mechanism.

JonasVautherin commented 6 years ago

Actually, @julianoes just convinced me the opposite. Maybe is_armable() is good.

From my experience, before I started using streams (RxJava), I was really wanting to have blocking calls instead of callbacks. Then I moved to streams, and that was the opposite. Given that right now, we don't have streams in the core libs, I would vote for is_armable().

Sorry for the confusion :D.

hamishwillee commented 6 years ago

The way I see it is captured in the guide here.

// Wait until health is OK and vehicle is ready to arm
while (telemetry->health_all_ok() != true) {
    std::cout << "Vehicle not ready to arm ..." << std::endl;
    std::this_thread::sleep_for(std::chrono::seconds(1));
}

So basically you want to wait on readiness for arming, and you want to be able to dig into why you can't arm. The test there is telemetry->health_all_ok(). If that does not work, then either it needs to be:

extended so it won't return OK until all required information has been retrieved OR
we need is_armable() to do the same job. I tend to think that health_all_ok() should not return until all information that is needed to assess that health is retrieved - ie landing state might be part of that info. If this cannot be in "health all OK" then I tend to think that API should be replaced by is_armable - it isn't useful if it can't be used for arm checking.

The other way to do this is of course to just try to arm and repeat until you get success. I don't like that as much because it is extra messages that need to be sent.

Either way, I'd prefer NOT to do nothing/fix this in docs.

JonasVautherin commented 6 years ago

I vote for is_armable().

hamishwillee commented 6 years ago

I vote for is_armable()

OK, but can/should we correspondingly remove health_all_ok() as they effectively serve the same purpose.

shakthi-prashanth-m commented 6 years ago

@hamishwillee

The test there is telemetry->health_all_ok(). If that does not work, then either it needs to be:

It did the work by setting in-air status of the device. But Action plugin will re-do the same work as it can't get in-air status from Telemetry plugin!

As we all know, Telemetry health check is done by all the examples (see below); so will be in all DroneCore applications ? telemetry_usage

I assume every application needs to perform health check before executing any business logic. If so, then we better move it to the core. Telemetry plugin may serve providing Telemetry info to applications. By moving it to the core, landing state issue will be resolved as well. Makes sense ?

hamishwillee commented 6 years ago

If it works as an is_armable check I don't really care where it lives because from a 3rd party app point of view you need Telemetry in most apps. I'm not sure you need to be able to do it before "arbitrary" business logic, but you do need it in this case.

I'm not completely convinced by the logic that it needs to be in core - unless you want to do that check before you allow an arm() command to be send (ie as part of our internal implementation).

shakthi-prashanth-m commented 6 years ago

How about blocking Action::arm() call for 1 sec (only if landing state is unknown) to receive landing status ? There's no harm in blocking for a definite period of 1 sec even in async version; as this is a criteria for arming. If so, we don't need is_armable() check, by doing health check as usual.

julianoes commented 6 years ago

How about Action::arm() call will be blocked for 1 sec

That's also an interesting idea. However, we then need to catch the case where we never receive the system state. So essentially we need to timeout.

shakthi-prashanth-m commented 6 years ago

However, we then need to catch the case where we never receive the system state. So essentially we need to timeout.

Action::Result ActionImpl::arm() const
{
    Action::Result ret = arming_allowed();           // check 1st time
    if (ret != Action::Result::SUCCESS) {           // not allowed
        sleep_for(seconds(1));                     // wait for 1 sec only
        ret = arming_allowed();                   // check again
        if (ret != Action::Result::SUCCESS)
            return ret;                         // return error, if not allowed even now!
    }
    // Go ahead to arm
    ...
}

So, when arm() fails, we localized that issue.

JonasVautherin commented 6 years ago

I believe that sleeping one second in arm() solves a problem we don't have. When you arm(), there are many situations where you might receive a negative ActionResult (that's the point of the ActionResult in the first place, right?). So as a user, anyway you want to check the result and react accordingly.

What we don't have right now is the ability for the user to know that the drone is armable without calling arm() (because maybe I want to show an "arm" button on my UI, but I want it to be disabled when the drone is not ready. And actually in this case it is more a is_initialized() or is_ready() for the plugin, because we are only talking about the initialization time.

Telling people that they need the telemetry plugin in order to use the action plugin is a weird dependency to me. And again, we don't need the telemetry to arm() and receive the negative ActionResult.

Receiving an ActionResult that is not SUCCESS is not a bug, so there is no need for us to solve it :D.

julianoes commented 6 years ago

@JonasVautherin I agree.

shakthi-prashanth-m commented 6 years ago

Telling people that they need the telemetry plugin in order to use the action plugin is a weird dependency to me.

True. In that case, we should be able to replace Telemetry::health_all_ok() by Action::is_armable().

Telemetry plugin is useful to get some info that can be shown in app. For eg. QGC shows Vehicle mode and Arming status: screenshot from 2018-02-28 14-59-03

Sure, we can add is_armable(). But consider below scenario: Suppose that an app is just willing to use Action plugin (no Telemetry, as it is not required for it). If there's some calibration issue (IMU, say) and we're trying to arm it, is_armable() should let us know. So, thats why I think this should go in core. Is there a possibility of such issue ?

julianoes commented 6 years ago

I think it should be duplicated, that's a tradeoff when preventing dependencies.

JonasVautherin commented 6 years ago

But right now, if there is some calibration issue, arm() will return an error, right? Isn't that enough?

shakthi-prashanth-m commented 6 years ago

Yes. I tested by running SITL with no position lock (just to simulate). I used Action plugin alone.

while (!action->is_armable()) {
   std::cout << "Waiting for to arm..." << std::endl;
   sleep_for(seconds(1));
}
// Arm vehicle
std::cout << "Arming..." << std::endl;
auto arm_result = action->arm();
handle_action_err_exit(arm_result, "Arming failed: ");
std::cout << "Armed." << std::endl;

Below is the console log:

Discovered device with UUID: 4294967298
Waiting for to arm...
Arming...
[03:50:24|Warn ] command temporarily rejected (176). (mavlink_commands.cpp:143)
Arming failed: Command denied

I think it should be fine. If user want to understand why was command denied, he may install Telemetry plugin and perform health check, etc. Sounds good ?

JonasVautherin commented 6 years ago

Sounds good to me.

"The vehicle might reject commands while it is initializing, and if you want to know when it is ready, you get this information from the telemetry."

hamishwillee commented 6 years ago

@shakthi-prashanth-m @JonasVautherin

I think it should be fine. If user want to understand why was command denied, he may install Telemetry plugin and perform health check, etc. Sounds good ?

So yes to the broad point that is_armable() in the Action plugin is just binary information, and that getting detailed information on why something is not arming requires loading Telemetry module.

the is_armable() must be 100% reliable. If this passes, the only reason arming should fail would be a subsequent change in state - ie a failsafe engaged between is_armable being true, and arm being received.
It must be possible to find out all blocking reasons, including land state or non-initialisation - is this the case?

So this code looks good ...

while (!action->is_armable()) {
   std::cout << "Waiting for to arm..." << std::endl;
   //Here you could check Telemetry to find out why things are failing
   sleep_for(seconds(1));
}
// Arm vehicle
std::cout << "Arming..." << std::endl;
auto arm_result = action->arm();
handle_action_err_exit(arm_result, "Arming failed: "); //Could also check to find out why
std::cout << "Armed." << std::endl;

But generally you should not get the following log (a command denied) because is_armable has returned true.

Discovered device with UUID: 4294967298
Waiting for to arm...
Arming...
[03:50:24|Warn ] command temporarily rejected (176). (mavlink_commands.cpp:143)
Arming failed: Command denied

So I wouldn't have:

The vehicle might reject commands while it is initializing, and if you want to know when it is ready, you get this information from the telemetry."

as it implies that you have to check on telemetry to know when things are ready, but in fact you only check on telemetry to find out *why it is not ready. Perhaps:

The vehicle might reject commands while it is initializing. You should check the is_armable() state to determine when it is ready to arm. If you wish to understand what condition the vehicle is waiting on then you can get this information from the telemetry."

shakthi-prashanth-m commented 6 years ago

the is_armable() must be 100% reliable. If this passes, the only reason arming should fail would be a subsequent change in state - ie a failsafe engaged between is_armable being true, and arm being received.

So, you mean is_armable() should also perform health check. I think this is what @julianoes meant when he said:

I think it should be duplicated, that's a tradeoff when preventing dependencies.

In that case, we could have a private method:

void Action::health_all_ok_async(health_all_ok_callback_t callback);

that performs health check during Action::init(); and its result will reflect in is_armable() value. In other words, is_armable() will be true only when health_all_ok is true. I think this should solve the issue.

hamishwillee commented 6 years ago

@shakthi-prashanth-m

So, you mean is_armable() should also perform health check.

More precisely, I mean that is_armable() should always and completely indicate that the vehicle is ready to arm. I also mean that it should be possible to work out exactly why that it is not arming.

That should mean that all the checks that are done by the heath check are also done by is_armable() - but there would be more checks if there are other conditions that can give a true health check when arming is not ready.

I think this is what @julianoes meant when he said:

I think it should be duplicated, that's a tradeoff when preventing dependencies.

What he meant is that you should reimplement the health checking functionality in Action (as opposed to having it in the core (say) or calling the Telemetry version.

Yes, this could (and probably should) be a private method. I think the method would be run inside is_armable() (not as part of init - after all, what if health isn't OK at init but you run arm a few minutes later?). There might be more than the health check run in is_armable() - whatever it takes to make sure the vehicle really is ready to arm.

Does that make sense?

shakthi-prashanth-m commented 6 years ago

Sounds correct to me. We can perform health check when is_armable is invoked. So it will take some time for is_armable to return true, as it has to make sure vehicle is ready for arming. We should let user know about the repeated invocation of this API. Like you said, user may load telemetry plugin to find out health info to know why it failed. In that case, there will be double health check up :). But that's left to the user. Should be fine.

hamishwillee commented 6 years ago

We can perform health check when is_armable is invoked. So it will take some time for is_armable to return true, as it has to make sure vehicle is ready for arming.

Well, no more time than using the health check method for the same purpose - unless there are additional checks to be done.

We should let user know about the repeated invocation of this API.

I don't really get what you are saying here. The user just calls is_armable() until it returns true. Then they can call arm. Or if we have an async method they call it once, then get callback once is_armable is true. Under the hood we might call health checks multiple times, but the user doesn't need to know that.

Like you said, user may load telemetry plugin to find out health info to know why it failed. In that case, there will be double health check up :). But that's left to the user. Should be fine.

Yes.

shakthi-prashanth-m commented 6 years ago

I don't really get what you are saying here. The user just calls is_armable() until it returns true. Then they can call arm. Or if we have an async method they call it once, then get callback once is_armable is true. Under the hood we might call health checks multiple times, but the user doesn't need to know that.

I meant that user shouldn't think its failure and exit when is_armable returns false. So it's better to add a note in API description, though we show in examples. I said this because in transition to VTOL example: health check is done only once and does exit after it fails. So I think same is applicable for Telemetry::health_all_ok() as well. Even in takeoff land example too 😊

hamishwillee commented 6 years ago

I meant that user shouldn't think its failure and exit when is_armable returns false. So it's better to add a note in API description, though we show in examples.

OK - understood :-). That is just our implementation in the examples.

julianoes commented 6 years ago

It must be possible to find out all blocking reasons, including land state or non-initialisation - is this the case?

The current checks should cover some of it but the commander state machine is pretty complex, so it will require more work to know about all cases, unfortunately.

hamishwillee commented 6 years ago

It must be possible to find out all blocking reasons, including land state or non-initialisation - is this the case?

The current checks should cover some of it but the commander state machine is pretty complex, so it will require more work to know about all cases, unfortunately.

@julianoes Curses. But we can have a reliable is_armable() check at least, right?

mavlink / MAVSDK

Can't arm immediately after instantiating Action plugin #291