project-chip / connectedhomeip

Matter (formerly Project CHIP) creates more connections between more objects, simplifying development for manufacturers and increasing compatibility for consumers, guided by the Connectivity Standards Alliance.
https://buildwithmatter.com
Apache License 2.0

reporting::CheckedImpl::RetrieveClusterData crashes when attribute read fails #35306

Closed by bzbarsky-apple 1 month ago

bzbarsky-apple commented 1 month ago

Reproduction steps

scripts/examples/gn_build_example.sh examples/energy-management-app/linux out/debug chip_config_network_layer_ble=false chip_use_data_model_interface='"check"'

Then run ./out/debug/chip-energy-management-app, commission it, and do:

chip-tool any read-by-id 0x94 0x00 ${NODE_ID} 1

The app crashes. This happens because that attribute read fails in both the "data model" and "ember" versions of read, but the "ember" version does the equivalent of AttributeReportBuilder::PrepareAttribute before it tries the part that fails, while the "data model" version does them in the opposite order. So while statusEmber != statusDm tests false, lengthWrittenEmber != reportBuilder.GetWriter()->GetLengthWritten() is true and we hit the fatal assert.
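The ordering difference described above can be sketched in isolation. This is a minimal illustration, not SDK code: `FakeWriter`, `EmberStyleRead`, `DataModelStyleRead`, `CompareReads`, and the 8-byte header size are all hypothetical stand-ins chosen for the example.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; none of these are SDK types.
enum class Status { kSuccess, kReadFailure };

struct FakeWriter {
    std::size_t lengthWritten = 0;
    void Write(std::size_t bytes) { lengthWritten += bytes; }
    std::size_t GetLengthWritten() const { return lengthWritten; }
};

// Stands in for the work AttributeReportBuilder::PrepareAttribute does:
// it writes some header bytes (8 is an arbitrary assumed size).
void PrepareAttribute(FakeWriter & writer) { writer.Write(8); }

// Stands in for the attribute read that fails in both implementations.
Status ReadAttributeValue() { return Status::kReadFailure; }

// "ember"-style ordering: prepare the attribute first, then hit the failure.
Status EmberStyleRead(FakeWriter & writer) {
    PrepareAttribute(writer);
    return ReadAttributeValue();
}

// "data model"-style ordering: hit the failure first, so nothing is written.
Status DataModelStyleRead(FakeWriter & writer) {
    Status status = ReadAttributeValue();
    if (status != Status::kSuccess) {
        return status;
    }
    PrepareAttribute(writer);
    return status;
}

struct Comparison {
    bool statusMatches;
    bool lengthMatches;
};

// Runs both orderings and reports what a strict comparison would see.
Comparison CompareReads() {
    FakeWriter emberWriter;
    FakeWriter dmWriter;
    Status statusEmber = EmberStyleRead(emberWriter);
    Status statusDm    = DataModelStyleRead(dmWriter);
    return { statusEmber == statusDm,
             emberWriter.GetLengthWritten() == dmWriter.GetLengthWritten() };
}
```

In this sketch both reads return the same failure status, yet the written lengths differ (8 bytes versus 0), which is the combination that trips the length assertion.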

There's a comment there that says:

    // NOTE: RetrieveClusterData is responsible for encoding StatusIB errors in case of failures
    //       so we validate length written requirements for BOTH success and failure. 

but that's just not true. The StatusIB encoding is done by the caller of RetrieveClusterData (Engine::BuildSingleReportDataAttributeReportIBs) after rolling back the TLV in error cases, except in IsOutOfSpaceEncodingResponse() situations.
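A rough sketch of that caller-side pattern, under stated assumptions: `SketchWriter`, `FailingRetrieve`, `BuildOneReport`, and the byte counts are hypothetical, and the out-of-space case is deliberately left out. The point is only that whatever the retrieve step wrote before failing is discarded before the status is encoded.

```cpp
#include <cstddef>

// Hypothetical stand-ins for illustration only; not the SDK's TLV writer.
enum class RetrieveStatus { kSuccess, kFailure };

struct SketchWriter {
    std::size_t lengthWritten = 0;
    std::size_t Checkpoint() const { return lengthWritten; }
    void Rollback(std::size_t checkpoint) { lengthWritten = checkpoint; }
    void Write(std::size_t bytes) { lengthWritten += bytes; }
};

constexpr std::size_t kPartialBytes = 8; // assumed size of partially written data
constexpr std::size_t kStatusBytes  = 4; // assumed size of an encoded status

// Stands in for a retrieve step that writes something and then fails.
RetrieveStatus FailingRetrieve(SketchWriter & writer) {
    writer.Write(kPartialBytes);
    return RetrieveStatus::kFailure;
}

// Stands in for the caller: checkpoint, try the read, and on failure
// roll back the partial output before encoding the status itself.
std::size_t BuildOneReport(SketchWriter & writer) {
    const std::size_t checkpoint = writer.Checkpoint();
    if (FailingRetrieve(writer) != RetrieveStatus::kSuccess) {
        writer.Rollback(checkpoint);
        writer.Write(kStatusBytes);
    }
    return writer.lengthWritten;
}
```

Because the partial bytes are rolled back, the final length in this sketch depends only on the status encoding, not on how much the failing read wrote first.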

So presumably either the logic in Read-Checked needs to be fixed to be looser in its assertions or the two RetrieveClusterData implementations need to be changed to produce equivalent output as this code expects.
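The first option (looser assertions) might look roughly like the following. This is only a sketch of the idea, not the fix that was actually applied: `ReadResult` and `ResultsAreEquivalent` are hypothetical names, and the out-of-space case mentioned above is not modeled.

```cpp
#include <cstddef>

// Hypothetical types for illustration only.
enum class ReadStatus { kSuccess, kFailure };

struct ReadResult {
    ReadStatus status;
    std::size_t lengthWritten;
};

// Looser equivalence check: statuses must always match, but written
// lengths are compared only when both reads succeeded. On failure the
// caller is assumed to roll back the output, so partial lengths are
// not meaningful.
bool ResultsAreEquivalent(const ReadResult & ember, const ReadResult & dm) {
    if (ember.status != dm.status) {
        return false;
    }
    if (ember.status != ReadStatus::kSuccess) {
        return true;
    }
    return ember.lengthWritten == dm.lengthWritten;
}
```

With this rule, the failing read from the reproduction steps (same failure status, different lengths) would be treated as equivalent, while a length mismatch on success would still be flagged.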

Bug prevalence

Always

GitHub hash of the SDK that was being used

ebc4237ca928bb5a7eecfe423f67a8f5ef587b48

Platform

core

Platform Version(s)

No response

Anything else?

No response

bzbarsky-apple commented 1 month ago

Disabling this in https://github.com/project-chip/connectedhomeip/pull/35307 for now to work around SVE problems.

andy31415 commented 1 month ago

Reproduced via python script by:

    attr = await devCtrl.ReadAttribute(
        node_id, [Clusters.WaterHeaterManagement.Attributes.HeaterTypes]
    )

(wrote a test script in https://github.com/andy31415/chip-repl-tests/blob/main/energy_management_read.py). Looking into fixing this

andy31415 commented 1 month ago

The code appears to intend that we skip the data-length comparison outside unit tests, because it is not reliable there (time-dependent sizes may differ). However, that gating seems broken: CONFIG_BUILD_FOR_HOST_UNIT_TEST appears to be enabled when building the energy management app, which seems odd.
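To make the gating concern concrete, here is a minimal sketch of a compile-time guard of that kind. How the real code uses the flag is an assumption here; `LengthsAcceptable` is a hypothetical helper, and the fallback definition of the macro exists only so the snippet compiles standalone.

```cpp
#include <cstddef>

// Fallback so the sketch compiles on its own; a real build would set this.
#ifndef CONFIG_BUILD_FOR_HOST_UNIT_TEST
#define CONFIG_BUILD_FOR_HOST_UNIT_TEST 0
#endif

// Hypothetical helper: lengths are compared only in unit-test builds.
bool LengthsAcceptable(std::size_t lengthEmber, std::size_t lengthDm) {
#if CONFIG_BUILD_FOR_HOST_UNIT_TEST
    return lengthEmber == lengthDm;
#else
    // Outside unit tests, sizes may legitimately differ, so skip the check.
    (void) lengthEmber;
    (void) lengthDm;
    return true;
#endif
}
```

If an application build ends up with the flag set to 1, a guard like this compiles the strict comparison into the app, which would match the behavior observed with the energy management app.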