Closed reubenmiller closed 8 months ago
Disabling command support for nested child devices was just a short-term solution. Here is a proposal for a permanent solution that re-enables command support for nested child devices as well:
- Stop using the etc/tedge/operations/c8y directory for dynamic child device creation and just use it as a persistent store for supported operation files for all the registered entities. So, we don't introduce any new hierarchy support in this directory to accommodate nested child devices and services either. We keep the flat directory structure for all entities, as the directory names (external IDs) are unique for every entity.
- Introduce a new shared directory to keep the custom operation definition files, in the same format as it is today. When a device declares that it supports that custom operation using the corresponding capability message (e.g. te/device/<device-id>///cmd/c8y_Command) with an empty payload, a symlink to the main supported operation definition file will be added to the corresponding device's operations directory.
Points 1 and 2 make sense.
- Introduce a new shared directory to keep the custom operation definition files, in the same format as it is today. When a device declares that it supports that custom operation using the corresponding capability message (e.g. te/device/<device-id>///cmd/c8y_Command) with an empty payload, a symlink to the main supported operation definition file will be added to the corresponding device's operations directory.
When are these files used? What's their purpose?
- If a device wants to have its own operation file definition instead of reusing a pre-configured one, it should first upload its operation definition file to the file-transfer repository and then provide that URL in the capability message that it sends, so that the mapper can move that uploaded file into that device's operations directory (This might be an optional requirement and can be deferred for later).
Why a URL in the capability message? Why not directly the expected payload?
Since https://github.com/thin-edge/thin-edge.io/pull/2466 has been merged, this ticket is unblocked.
- Introduce a new shared directory to keep the custom operation definition files, in the same format as it is today. When a device declares that it supports that custom operation using the corresponding capability message (e.g. te/device/<device-id>///cmd/c8y_Command) with an empty payload, a symlink to the main supported operation definition file will be added to the corresponding device's operations directory.
When are these files used? What's their purpose?
The idea was to reuse a single custom command definition file across all the child devices that support it.
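As a rough illustration of that reuse, the sketch below links one shared definition file into each device's flat operations directory. The helper name `link_shared_operation`, the shared directory location, and the placeholder file content are all hypothetical; this is not how tedge-mapper-c8y implements it.

```python
import tempfile
from pathlib import Path


def link_shared_operation(shared_dir: Path, ops_root: Path,
                          external_id: str, operation: str) -> Path:
    """Symlink one shared operation definition into a device's ops directory.

    Hypothetical helper: shared_dir and the flat layout
    <ops_root>/<external-id>/<operation> are assumptions for illustration.
    """
    definition = shared_dir / operation
    if not definition.is_file():
        raise FileNotFoundError(f"no shared definition for {operation}")

    # Flat structure: one directory per entity, keyed by its external ID.
    device_dir = ops_root / external_id
    device_dir.mkdir(parents=True, exist_ok=True)

    link = device_dir / operation
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale entry so the call is idempotent
    link.symlink_to(definition)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        shared = root / "shared"
        shared.mkdir()
        # Placeholder content; a real definition file would be TOML.
        (shared / "c8y_Command").write_text("# placeholder definition\n")
        for child in ("child-level1", "child-level2"):
            link = link_shared_operation(shared, root / "c8y", child, "c8y_Command")
            print(link.parent.name, "->", link.resolve().name)
```

Both child directories end up pointing at the same file, so a change to the shared definition is picked up by every device that declared the operation.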
- If a device wants to have its own operation file definition instead of reusing a pre-configured one, it should first upload its operation definition file to the file-transfer repository and then provide that URL in the capability message that it sends, so that the mapper can move that uploaded file into that device's operations directory (This might be an optional requirement and can be deferred for later).
Why a URL in the capability message? Why not directly the expected payload?
Since the operation file content is TOML, I thought it would be weird to include that in the cmd payload, which is generally JSON. So, uploading the TOML file content first, followed by the command message with this URL in the payload, was a way to get around that.
That being said, both points 3 and 4 can be ignored, as this was a proposal to enable custom operation support for child devices before the workflow feature was in place. Since workflows are the way forward for cloud-agnostic operation support, there's no point in introducing yet another cloud-specific mechanism for the same purpose. So, we will focus only on points 1 and 2 for now.
Reaffirming the scope of this ticket:
- The etc/tedge/operations/c8y directory won't be used for dynamic child device creation; it will only serve as a persistent store for supported operation files for all the registered entities. We keep the flat directory structure for all entities (immediate and nested entities), as the directory names (external IDs) are unique for every entity.
- Remove the logic that goes through etc/tedge/operations/c8y and registers the child devices for any extra child device directories found. This will prevent the directories of nested child devices from being wrongly interpreted as immediate child devices.
What is the source of truth? The retained MQTT messages on te/+/+/+/+? The persisted version of the entity store? The /etc/tedge/operations/c8y directory?
I dislike the idea of an asymmetric treatment for the child devices compared to the main one. This will be a continuous source of issues.
Also, beware that this PR, related to operation declaration, might interact with this issue.
What is the source of truth? The retained MQTT messages on te/+/+/+/+? The persisted version of the entity store? The /etc/tedge/operations/c8y directory?
Until now, there were two sources of truth: the operations directory and the retained messages on the MQTT broker, both trying to sync with each other. With this work, we are eliminating the ops directory as a source of truth, so that it just reflects the mapper's view of supported ops for each entity.
The persisted entity store is not a source of truth but just reflects the in-memory entity store. Even when the mapper restores its state from this file on startup, it is updated when the retained messages arrive from the broker, if those retained messages contain anything new.
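The precedence described here can be sketched as follows. This is only an illustration under stated assumptions: `restore_entity_store`, the dictionary shapes, and the topic IDs are hypothetical and do not mirror the mapper's actual data structures.

```python
from typing import Dict, List, Tuple

Registration = Dict[str, str]


def restore_entity_store(
    persisted: Dict[str, Registration],
    retained: Dict[str, Registration],
) -> Tuple[Dict[str, Registration], List[str]]:
    """Start from the persisted snapshot, then let retained messages win.

    Returns the resulting store and the topic IDs that were added or updated.
    """
    # The snapshot only seeds the in-memory store; it is never authoritative.
    store = {topic_id: dict(reg) for topic_id, reg in persisted.items()}
    changed = []
    for topic_id, registration in retained.items():
        if store.get(topic_id) != registration:
            store[topic_id] = dict(registration)
            changed.append(topic_id)
    return store, changed


if __name__ == "__main__":
    # Hypothetical stale snapshot: the nested child has lost its parent.
    persisted = {
        "device/child-level1//": {"@type": "child-device"},
        "device/child-level2//": {"@type": "child-device"},
    }
    retained = {
        "device/child-level2//": {
            "@type": "child-device",
            "@parent": "device/child-level1//",
        },
    }
    store, changed = restore_entity_store(persisted, retained)
    print(changed)
    print(store["device/child-level2//"])
```

The operations directory does not appear as an input at all, which matches the intent of treating it as a reflection of the mapper's view rather than a source of truth.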
I dislike the idea of an asymmetric treatment for the child devices compared to the main one. This will be a continuous source of issues.
It would have been nice to eliminate this API completely, including for the main device. But unfortunately, we're forced to keep it for now, as many customers are using this mechanism to add custom operations support to the main device. Once the workflow APIs are made available to the end-users, then they'll be able to migrate their existing custom operation mechanism to workflows and then we can get rid of this completely.
The reported issue is already tested in bootstrap.robot::Mapper restart does not alter device hierarchy. The fix also removed the following features:
- Creating the /etc/tedge/operations/c8y/<child-id> directory does not create the same child device in the cloud anymore.
- ... te/device/<child-id>///cmd/<cmd-id> topic, which will result in the creation of the corresponding operation file in the child device ops directory and an updated operation list message with the <cmd-id> sent to C8y.
QA has thoroughly checked the bug and here are the results:
Describe the bug
When a known command is registered on a nested child device (e.g. a child of a child) via MQTT, the definition of the child device changes to an immediate child device when the tedge-mapper-c8y service is restarted.
To Reproduce
Register an immediate child device
Register a nested child device (from the previous child device)
Register an operation on the nested child device
Verify in Cumulocity the following device hierarchy
Restart the tedge-mapper-c8y
Check the thin-edge.io registration message on the te/+/+/+/+ topic. Notice that the '@parent' property is no longer part of the message for the nested child device.
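The check in the last step can be sketched in code. The snippet below builds registration payloads in the shape thin-edge.io uses for entity registration ("@type" and "@parent" keys on a te/<topic-id> topic) and reports which entities lost their '@parent' between two snapshots. The topic IDs and the helper names `registration_message` and `lost_parent` are hypothetical, and the snapshots are hand-written rather than captured from a broker.

```python
import json
from typing import Dict, List, Optional, Tuple


def registration_message(topic_id: str, parent: Optional[str] = None) -> Tuple[str, str]:
    """Build a (topic, payload) pair for a child device registration.

    Assumption: an immediate child omits '@parent', a nested child sets it.
    """
    payload = {"@type": "child-device"}
    if parent is not None:
        payload["@parent"] = parent
    return f"te/{topic_id}", json.dumps(payload)


def lost_parent(before: Dict[str, str], after: Dict[str, str]) -> List[str]:
    """Return topics whose '@parent' was present before but is absent after."""
    lost = []
    for topic, payload in before.items():
        had_parent = "@parent" in json.loads(payload)
        has_parent = "@parent" in json.loads(after.get(topic, "{}"))
        if had_parent and not has_parent:
            lost.append(topic)
    return lost


if __name__ == "__main__":
    # Snapshot of retained messages before the mapper restart.
    before = dict([
        registration_message("device/child-level1//"),
        registration_message("device/child-level2//", parent="device/child-level1//"),
    ])
    # Snapshot after the restart, reproducing the reported symptom.
    after = dict([
        registration_message("device/child-level1//"),
        registration_message("device/child-level2//"),
    ])
    print(lost_parent(before, after))
```

With the symptom described in this report, the nested child's topic is the only entry returned.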
Expected behavior
The tedge-mapper-c8y should use the registration information on the MQTT broker over any information in the /etc/tedge/operations/c8y folder when building the entity store. This will ensure that it does not override existing registration messages with incorrect information.
Screenshots
Environment (please complete the following information):
Debian GNU/Linux 12 (bookworm)
Raspberry Pi 3 Model B Rev 1.2
Linux ginger 6.1.0-rpi4-rpi-v8 #1 SMP PREEMPT Debian 1:6.1.54-1+rpt2 (2023-10-05) aarch64 GNU/Linux
tedge 0.13.1~12+g5dcb430
Additional context
The root cause looks to be the mapper's interpretation of the operation files inside the /etc/tedge/operations/c8y folder, which doesn't support nested files.
On the first registration, the ginger:device:child-level2 folder is created under /etc/tedge/operations/c8y and the entity store knows that ginger:device:child-level2 is a nested child device of ginger:device:child-level1. However, when the tedge-mapper-c8y is restarted, it reads the operation list and generates registration messages which override the previous registration message.
Below is an example of the operations folder after the registration messages have been sent: