project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0
7.45k stars 1.99k forks source link

[BUG] Commissioning already in progress - not restarting #25708

Open caipiblack opened 1 year ago

caipiblack commented 1 year ago

Reproduction steps

  1. Use the matter stack to start a commissioning
  2. When the commissioning is at the step kFindOperational (operational discovery) call this function to stop the commissioning:
    CurrentCommissioner().StopPairing(idOfNodeBeingPaired);
  3. Start a new commissioning

At this point, until reboot of the commissioner we have Commissioning already in progress - not restarting when we try to commission a device.

This error appear because mCommissioningStage = kFindOperational and mCommissioningStage is never changed to kSecurePairing (default state) when we are in this case.

We don't know if it is a real issue, or if it's just our method to stop the commissioning that is not the correct method.

So, additional question:

Bug prevalence

Everytime

GitHub hash of the SDK that was being used

v1.0.0

Platform

other (linux)

Platform Version(s)

No response

Anything else?

This problem is just "hard" to reproduce because:

The unique way to reproduce is to modify chip-tool, or use your own commissioner..

Thats the unique difficulty to reproduce the problem

caipiblack commented 1 year ago

We have a similar problem when we stop the first pairing at the first step and we start another commissioning after

In these two cases, the application is not notified about the failure with commissioning callbacks for various reasons.

The reason of no notification in the case of this post is because mWaitingForPASE keep the old value (true), so when we arrive in this function, the function pairer->mCommissioner->OnSessionEstablishmentError(err); is never called.

void SetUpCodePairer::OnDeviceDiscoveredTimeoutCallback(System::Layer * layer, void * context)
{
    ChipLogError(Controller, "Discovery timed out");
    auto * pairer = static_cast<SetUpCodePairer *>(context);
    ChipLogProgress(Controller, "  -> StopConnectOverBle()");
    LogErrorOnFailure(pairer->StopConnectOverBle());
    ChipLogProgress(Controller, "  -> StopConnectOverIP()");
    LogErrorOnFailure(pairer->StopConnectOverIP());
    ChipLogProgress(Controller, "  -> StopConnectOverSoftAP()");
    LogErrorOnFailure(pairer->StopConnectOverSoftAP());

    ChipLogProgress(Controller, "  -> pairer->mWaitingForPASE = %d", pairer->mWaitingForPASE);
    ChipLogProgress(Controller, "  -> pairer->mDiscoveredParameters.empty()");

    if (!pairer->mWaitingForPASE && pairer->mDiscoveredParameters.empty())
    {
        ChipLogProgress(Controller, "  -> ready to notify..");
        // We're not waiting on any more PASE attempts, and we're not going to
        // discover anything at this point, so we should just notify our
        // listener.
        CHIP_ERROR err = pairer->mLastPASEError;
        if (err == CHIP_NO_ERROR)
        {
            err = CHIP_ERROR_TIMEOUT;
        }
        ChipLogProgress(Controller, "  -> OnSessionEstablishmentError()..");
        pairer->mCommissioner->OnSessionEstablishmentError(err);
        ChipLogProgress(Controller, "  -> Done.");
    }
}

So currently we can resume my entire issue as:

I think it's very important to be able to stop a commissioning in progress and make sure that next commissionings will not fail.

Now I remember i already made an issue about it...

https://github.com/project-chip/connectedhomeip/issues/13450

bzbarsky-apple commented 1 year ago

StopPairing is probably variously broken: nothing in the SDK exercises it right now.

I think you should probably file one issue per observed failure; they are likely to need different fixes. For this particular one, it sounds like "Calling StopPairing during operational discovery does not work correctly", right? Whatever is going on with SetUpCodePairer should be a separate issue.

bzbarsky-apple commented 1 year ago

For this specific issue, since OnDeviceConnectedFn and OnDeviceConnectionFailureFn are both no-ops if mDeviceBeingCommissioned is null, we probably want to reset mCommissioningStage when we reset mDeviceBeingCommissioned. But more generally, it seems to me like we should store the commissioning stage state (and probably other things) in some object associated with the specific commissioning, not on the controller itself...

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.