project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

[BUG] Thermostat cluster SetpointRaiseLower sometimes fails when mode is Both #31214

Open Emill opened 10 months ago

Emill commented 10 months ago

Reproduction steps

  1. Start chip-all-clusters-app or thermostat-app.

  2. Pair chip-tool.

  3. Read the OccupiedHeatingSetpoint and OccupiedCoolingSetpoint attributes (optional):

    $ ./chip-tool thermostat read occupied-heating-setpoint 3 1
    ...
    Data = 2000
    ...
    $ ./chip-tool thermostat read occupied-cooling-setpoint 3 1
    ...
    Data = 2600
    ...

    These values mean that heating turns on when the temperature drops below 20.00 degrees Celsius and cooling turns on when the temperature rises above 26.00 degrees Celsius.

  4. Try to lower both setpoints by a large amount, such as 12.7 degrees (mode 2 is Both; the amount -127 is in steps of 0.1 degrees):

    $ ./chip-tool thermostat setpoint-raise-lower 2 -127 3 1
    ...
    status = 0x01 (FAILURE)
    ...

    The server app outputs:

    [1704206104.716113][243400:243400] CHIP:DMG: Received command for Endpoint=1 Cluster=0x0000_0201 Command=0x0000_0000
    [1704206104.716131][243400:243400] CHIP:ZCL: Error: SetOccupiedCoolingSetpoint failed!
    [1704206104.716139][243400:243400] CHIP:DMG: Endpoint 1, Cluster 0x0000_0201 update version to b09ba7c5
    [1704206104.716144][243400:243400] CHIP:DMG: Endpoint=1 Cluster=0x0000_0201 Command=0x0000_0000 status 0x01 (FAILURE) (no additional context)

This failure was discovered when trying to execute the test TC-TSTAT-3.2 by first increasing Both values by 12.7 degrees and then decreasing Both values by 12.7 degrees against the test harness.

The failure occurs in thermostat-server.cpp:

    case OccupiedCoolingSetpoint::Id: {
        requested = static_cast<int16_t>(chip::Encoding::LittleEndian::Get16(value));
        if (!CoolSupported)
            return imcode::UnsupportedAttribute;
        if (requested < AbsMinCoolSetpointLimit || requested < MinCoolSetpointLimit || requested > AbsMaxCoolSetpointLimit ||
            requested > MaxCoolSetpointLimit)
            return imcode::InvalidValue;
        if (AutoSupported)
        {
            if (requested < OccupiedHeatingSetpoint + DeadBandTemp)
                return imcode::InvalidValue; // <------------------- This condition is triggered
        }
        return imcode::Success;
    }

and

    WriteCoolingSetpointStatus = OccupiedCoolingSetpoint::Set(aEndpointId, DesiredCoolingSetpoint);
    if (WriteCoolingSetpointStatus != EMBER_ZCL_STATUS_SUCCESS)
    {
        ChipLogError(Zcl, "Error: SetOccupiedCoolingSetpoint failed!");
    }
    WriteHeatingSetpointStatus = OccupiedHeatingSetpoint::Set(aEndpointId, DesiredHeatingSetpoint);
    if (WriteHeatingSetpointStatus != EMBER_ZCL_STATUS_SUCCESS)
    {
        ChipLogError(Zcl, "Error: SetOccupiedHeatingSetpoint failed!");
    }

The problem is that the cooling attribute is written first and the heating attribute second, in a non-atomic fashion. When both values are lowered at the same time, the new cooling setpoint is validated against the old heating setpoint, which has not been lowered yet, so the condition (requested < OccupiedHeatingSetpoint + DeadBandTemp) fails. That condition should not be tested for the cooling setpoint until both values have been written. Alternatively, I think it will also work if the attributes are written in reverse order when the values are decreased.

Note that the bug only triggers when the initial values of the two attributes make the condition above fail. The default start-up values of the two example apps do.
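The ordering problem can be reproduced outside the SDK with a small model. The following is only a sketch: `Setpoints`, `SetCooling`, `SetHeating`, `ShiftCoolingFirst` and `ShiftDirectionAware` are hypothetical names, the dead band of 250 used in the usage note is an assumed value, and the min/max limit checks are left out. It only illustrates why writing heating first when lowering (and cooling first when raising) avoids the dead-band rejection.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the attribute store (not the SDK's types).
// All values are in units of 0.01 degrees Celsius.
struct Setpoints
{
    int16_t heating;  // models OccupiedHeatingSetpoint
    int16_t cooling;  // models OccupiedCoolingSetpoint
    int16_t deadband; // minimum gap between the two (assumed value)
};

// Each write is validated against the *current* value of the other attribute,
// mimicking the dead-band check quoted in the issue. Limit checks are omitted.
bool SetCooling(Setpoints & s, int16_t requested)
{
    if (requested < s.heating + s.deadband)
        return false;
    s.cooling = requested;
    return true;
}

bool SetHeating(Setpoints & s, int16_t requested)
{
    if (requested > s.cooling - s.deadband)
        return false;
    s.heating = requested;
    return true;
}

// Order used in the issue: cooling first, then heating, and the second
// write is attempted even if the first one failed.
bool ShiftCoolingFirst(Setpoints & s, int16_t delta)
{
    const bool coolOk = SetCooling(s, static_cast<int16_t>(s.cooling + delta));
    const bool heatOk = SetHeating(s, static_cast<int16_t>(s.heating + delta));
    return coolOk && heatOk;
}

// Suggested alternative: move the setpoint that leads in the direction of
// travel first, so the gap never shrinks below the dead band mid-update.
bool ShiftDirectionAware(Setpoints & s, int16_t delta)
{
    assert(s.cooling - s.heating >= s.deadband); // precondition: valid start state
    const int16_t newHeating = static_cast<int16_t>(s.heating + delta);
    const int16_t newCooling = static_cast<int16_t>(s.cooling + delta);
    if (delta < 0)
        return SetHeating(s, newHeating) && SetCooling(s, newCooling);
    return SetCooling(s, newCooling) && SetHeating(s, newHeating);
}
```

Starting from heating 2000 and cooling 2600 with the assumed dead band of 250, `ShiftCoolingFirst(s, -1270)` rejects the cooling write but still lowers heating, while `ShiftDirectionAware(s, -1270)` moves both setpoints.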

Also note that the second attribute will be written even if the first write failed, which is a bit strange.
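One way to avoid the partial update is to validate the whole new pair first and write only if everything passes. The sketch below assumes hypothetical names (`SetpointPair`, `Limits`, `TryShiftBoth`) and rejects out-of-range requests outright instead of clamping them; it is not the SDK's implementation, only an illustration of an all-or-nothing update.

```cpp
#include <cstdint>

// Hypothetical types for illustration; values are in 0.01 degrees Celsius.
struct SetpointPair
{
    int16_t heating;
    int16_t cooling;
};

struct Limits
{
    int16_t minHeat, maxHeat;
    int16_t minCool, maxCool;
};

// Computes both new setpoints, checks them together, and commits only if
// every check passes. On failure `pair` is left completely unchanged.
bool TryShiftBoth(SetpointPair & pair, int16_t delta, const Limits & limits, int16_t deadband)
{
    const SetpointPair next{ static_cast<int16_t>(pair.heating + delta),
                             static_cast<int16_t>(pair.cooling + delta) };

    if (next.heating < limits.minHeat || next.heating > limits.maxHeat)
        return false;
    if (next.cooling < limits.minCool || next.cooling > limits.maxCool)
        return false;
    // The dead-band check is applied to the new pair, not to a mix of old and new values.
    if (next.cooling - next.heating < deadband)
        return false;

    pair = next; // single commit point
    return true;
}
```

With this shape, a failed request never leaves one setpoint moved and the other untouched.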

Bug prevalence

Always

GitHub hash of the SDK that was being used

6ea3c34c119ec56eb425d4486d0b6a3f62742ef6

Platform

core

Platform Version(s)

No response

Anything else?

No response

tcarmelveilleux commented 10 months ago

@bzbarsky-apple Why is the spec tag added?

bzbarsky-apple commented 9 months ago

@tcarmelveilleux Because the behavior this bug causes is not spec-compliant.