Closed reubenmiller closed 1 year ago
I have some questions.

> Receive the firmware SmartREST message: `515,ubuntu core,20.04.3,http://test.com`, where it follows the schema of `515,{firmware_name},{firmware_version},{firmware_url}`

The child device id seems to be missing, isn't it?

> `c8y/s/us` with a payload of `115,{firmware_name},{firmware_version},{firmware_url}`.

Again, no child device id. Is this id required by Cumulocity? Is the firmware operation independent of the child device?

> The timeout of the firmware operation should be configurable via the `c8y_Firmware` operation file. The default operation timeout should be 6 hours (as firmware operation generally can take longer to apply).

Is this 6 hours between "executing" and "success/failure", or is it 6 hours to get a first reaction from the child device?

> Firmware artifact caching.

When is the file deleted? Do we try to avoid downloading the same firmware twice when it is to be installed on 2 devices in a row?
> I have some questions.
>
> Receive the firmware SmartREST message: `515,ubuntu core,20.04.3,http://test.com`, where it follows the schema of `515,{firmware_name},{firmware_version},{firmware_url}`
>
> The child device id seems to be missing. Isn't it?

Yes, you're right, I forgot to put the child topic as `c8y/s/us/<child-id>`. I have corrected this in the original ticket description.

> `c8y/s/us` with a payload of `115,{firmware_name},{firmware_version},{firmware_url}`.
>
> Again no child device id. Is this id required by Cumulocity? Is the firmware operation independent of the child device?

Yes, again that was a mistake on my part. I have fixed it to include the proper second field, which is the target external id of the device, e.g. `device-id` or `child-id`.

> The timeout of the firmware operation should be configurable via the `c8y_Firmware` operation file. The default operation timeout should be 6 hours (as firmware operation generally can take longer to apply).
>
> Is this 6 hours between "executing" and "success/failure" or is this 6 hours to get a first reaction from the child device?

For the first implementation I would say that it is the timeout between any two child device communications, for example between the initial 'set-to-executing' message and the successful/failure message. The key would be to make the timeout configurable (though I am also open to having two different timeouts if necessary).

> Firmware artifact caching.
>
> When is the file deleted?

For the first implementation I would not worry about deleting the artifact, as a simple cronjob could be written to delete artifacts once they are x days old. We would need a more robust artifact retention concept before we could implement a wider feature.

> Do we try to avoid to download twice a firmware to be installed on 2 devices in a row?

Good question. Yes, we should avoid the same component downloading the same artifact for two child devices (since the cache is only checked for a completed download). Though we could postpone more advanced download/caching topics to a second phase (e.g. limiting the number of parallel clients downloading artifacts, automatic cache eviction, etc.)
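To make the "timeout between any child device communication" idea more concrete, here is a minimal Python sketch of an inactivity timer that is reset by every message from the child device connector. The class name `OperationTimer`, its methods and the injectable clock are hypothetical and are not part of thin-edge.io; the failure reason text follows the wording suggested in the ticket description.

```python
import time
from typing import Callable, Optional


class OperationTimer:
    """Hypothetical inactivity timer for a single firmware operation."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        # Start waiting as soon as the request has been published to the child.
        self._last_activity = clock()

    def on_child_message(self) -> None:
        """Reset the timer whenever the child sends any status message."""
        self._last_activity = self._clock()

    def failure_reason(self) -> Optional[str]:
        """Return a failure reason once the child has been silent too long, else None."""
        if self._clock() - self._last_activity >= self.timeout_s:
            return (
                "child device connector did not respond within the "
                f"timeout interval of {self.timeout_s:g} seconds"
            )
        return None
```

With this shape, one timer covers both the wait for a first reaction and the gap between `executing` and the final status. If two different timeouts turn out to be necessary, the timer could be recreated with a different `timeout_s` once the `executing` message arrives.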
After a discussion with @rina23q, the following proposal was made to control which files are exposed by the http server.

- `/var/tedge/cache/${cache_key}`
- `/var/tedge/file-transfer/${CHILD_ID}/firmware_update/${cache_key}`

`cache_key` is the unique checksum of the file (e.g. sha256 of the url string)

File cache structure

The following files are NOT directly accessible by the http server, they are only exposed via symlinks.

    /var/tedge/cache/
    |_ aaaaaaa
    |_ bbbbbbb
    |_ ccccccc

Example file-transfer symlink

    /var/tedge/file-transfer/child01/firmware_update/aaaaaaa (symlink to /var/tedge/cache/aaaaaaa)
    /var/tedge/file-transfer/child02/firmware_update/aaaaaaa (symlink to /var/tedge/cache/aaaaaaa)

Try it out using a manual symlink

    ln -s /var/tedge/cache/aaaaaaa /var/tedge/file-transfer/child01/firmware_update/aaaaaaa
And this approach should work 👍 I gave it a quick try and could get the content of the original file via a GET request.
I've updated the ticket description to reflect this approach
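The cache layout discussed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cache_key` and `expose_to_child` are hypothetical, the `root` parameter stands in for `/var/tedge` so the sketch can be run in a scratch directory, and the firmware file is assumed to have been downloaded into the cache already.

```python
import hashlib
import os
from pathlib import Path


def cache_key(url: str) -> str:
    """Cache key: sha256 hex digest of the firmware url string."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()


def expose_to_child(root: Path, child_id: str, url: str) -> Path:
    """Create (or reuse) the per-child symlink that points at the cached file.

    Layout (relative to the hypothetical `root`):
      cache/<key>                                   <- real file, not served
      file-transfer/<child_id>/firmware_update/<key> <- symlink, served over http
    """
    key = cache_key(url)
    cached = root / "cache" / key
    link = root / "file-transfer" / child_id / "firmware_update" / key
    link.parent.mkdir(parents=True, exist_ok=True)
    if not link.is_symlink():
        os.symlink(cached, link)
    return link
```

Calling `expose_to_child` for `child01` and `child02` with the same url yields two symlinks that resolve to one file in the cache, which is the property that keeps disk usage down and lets access rules be written purely on the URL path.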
Is your feature request related to a problem? Please describe.
There is no mechanism currently available to support firmware update operations on child devices.
Currently a custom operation handler can be written for the main device, however the child devices do not have such support.
Describe the solution you'd like
Disclaimer: The implementation focuses on support for the Cumulocity IoT `c8y_Firmware` operation for child devices only!

The support for the `c8y_Firmware` operation for child devices should follow a very similar flow to the configuration management for child devices, with the addition of sending one extra SmartREST message via MQTT before transitioning the operation to `SUCCESSFUL`.

For the initial implementation, the firmware operation handler should be implemented as a new service called `c8y-firmware-operation`. This is subject to change in the future after the refactoring ticket is complete.

The child device feature should be activated by:

- A `c8y_Firmware` file under the child device's supported operations, e.g. `/etc/tedge/operations/c8y/{child_name}/c8y_Firmware`
The flow is:

1. Receive the firmware SmartREST message: `515,myChild-1,ubuntu core,20.04.3,http://test.com`, where it follows the schema of `515,{device-id|child-id},{firmware_name},{firmware_version},{firmware_url}`

2. Download the firmware url, and make it available via the local http server (the same one used by the `c8y-configuration-plugin`). The file should be stored in a local file cache location (outside of direct view of the http server, e.g. `/var/tedge/cache`). A symlink should be created under the child device structure (under `/var/tedge/file-transfer`) which links it to the file-cache location.

3. Publish an MQTT message on the topic `tedge/${CHILD_SN}/commands/req/firmware_update`, with the following payload:

   Note

   - `${file-cache-key}` is the sha256 checksum of the `.url` string (as received from the server), e.g. `http://test.com`. This is used to uniquely identify if the file exists in the local file cache or not.
   - `{request_id}` is a unique identifier for the operation. The request id should be used in all corresponding replies from the child device connector.
   - The local firmware url should be the url where the file can be downloaded via the local http server. The local http server url should include the child id in it, however it should just use symlinks to link the downloaded file to the applicable child device. If the file is to be applied to multiple child devices, then there will be 1 symlink per child device, and all the symlinks will reference the same file (which is stored in the file cache area, outside of the http server view). Using this structure makes it easier to write ACL rules for the HTTP server based on child id, as they can be purely URL path based, and it also ensures that the same file is available to multiple child devices without copying the file (reducing the disk space required).
4. The child device connector should send the following optional message to indicate that the firmware operation is being processed by the child device. The MQTT message should follow the schema of:

   Topic

   Payload

   On receiving this message, tedge should send an MQTT message to the `c8y/s/us/<child-id>` topic with the payload `501,c8y_Firmware` to indicate that the operation is being processed.

5. The child device connector then sends either a success or failed message to indicate if the firmware operation was successful or not.

   If the child device connector only sends a `successful/failed` message before tedge has received an `executing` message, then tedge should send the `501,c8y_Firmware` MQTT message automatically. This is the same handling as what is already implemented in the child configuration management. The idea is to make it easier on the child connector implementation by reducing the number of mandatory messages whilst still allowing finer-grained control.

   If Successful

   When successful, the device connector should send the following message to the `tedge/${CHILD_SN}/commands/res/firmware_update` topic.

   If Failed

   When failed, the device connector should send the following message to the `tedge/${CHILD_SN}/commands/res/firmware_update` topic.

   Note

   If the child device connector does not send back either `successful`, `failed` or `executing` (e.g. an invalid status) then the operation should be treated as `failed`. The comparison of the `status` fragment should, however, be case insensitive to make the tedge handling more developer friendly.

6. Depending on the response received in the previous step (from the child device connector), one of the following steps should be executed by the firmware plugin.
   - If the operation was successful, then the following SmartREST message should be sent to indicate to Cumulocity IoT that the firmware name/version/url has now changed on the device: `c8y/s/us/<child-id>` with a payload of `115,{firmware_name},{firmware_version},{firmware_url}`. Note the `firmware_url` is the original url received from Cumulocity IoT, not the local url!

     Then transition the operation to successful by sending an MQTT message to the `c8y/s/us/<child-id>` topic with the payload `503,c8y_Firmware`.

   - If the operation was not successful, then transition the operation to failed without any additional MQTT messages.

     Send a message to the `c8y/s/us/<child-id>` topic with the payload `502,c8y_Firmware,{failure_reason}`. The failure reason should be provided by the child device connector from the previous step under the `reason` property. If no reason is given then a default reason should be used, e.g. `unknown error. The child device connector did not specify the error reason`, or something to that effect.

     Example payload
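As a rough illustration of the flow above, the following Python sketch parses the incoming `515` message and derives the SmartREST payloads that would be published on `c8y/s/us/<child-id>` for a given child response. The names `FirmwareRequest`, `parse_firmware_request` and `smartrest_replies` are hypothetical, as is the `executing_sent` flag used to track whether `501` has already been published. Escaping of commas or quotes inside the failure reason is deliberately left out.

```python
import csv
import io
from typing import List, NamedTuple, Optional

# Default reason suggested in the ticket when the child gives none.
DEFAULT_REASON = "unknown error. The child device connector did not specify the error reason"


class FirmwareRequest(NamedTuple):
    device_id: str
    name: str
    version: str
    url: str


def parse_firmware_request(message: str) -> FirmwareRequest:
    """Parse `515,{device-id|child-id},{firmware_name},{firmware_version},{firmware_url}`."""
    fields = next(csv.reader(io.StringIO(message)))
    if len(fields) != 5 or fields[0] != "515":
        raise ValueError(f"not a firmware request: {message!r}")
    return FirmwareRequest(*fields[1:])


def smartrest_replies(
    req: FirmwareRequest,
    status: str,
    reason: Optional[str] = None,
    executing_sent: bool = False,
) -> List[str]:
    """Return the payloads to publish for one response from the child connector."""
    status = status.strip().lower()  # status comparison is case insensitive
    replies: List[str] = []
    if not executing_sent:
        # 501 is sent automatically if the child skipped the executing message.
        replies.append("501,c8y_Firmware")
    if status == "executing":
        return replies
    if status == "successful":
        # Report the ORIGINAL url from Cumulocity IoT, not the local one.
        replies.append(f"115,{req.name},{req.version},{req.url}")
        replies.append("503,c8y_Firmware")
        return replies
    # "failed" and any invalid status are both treated as failed.
    replies.append(f"502,c8y_Firmware,{reason or DEFAULT_REASON}")
    return replies
```

For example, a child that replies only with `FAILED` and no reason would lead to `501,c8y_Firmware` followed by `502,c8y_Firmware,unknown error. The child device connector did not specify the error reason`.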
Configuration

- The timeout of the firmware operation should be configurable via the `c8y_Firmware` operation file. The default operation timeout is 1 hour (as firmware operations can generally take longer to apply). If the device connector does not respond by sending either an `executing` or a `successful/failed` message back, then the operation should be treated as failed, and the failure reason should be `child device connector did not respond within the timeout interval of 3600 seconds` (or something to that effect).
- The default timeout for the firmware update can be changed via the `tedge.toml` file.

Additional constraints

- A newer firmware operation should remain in the `PENDING` state until the previous operation has been fully processed (either successful or failed). Afterwards the newer firmware operation should then be processed.

Describe alternatives you've considered
No other alternative solution was considered, as the child device firmware operation support follows the same design/api as the configuration management support for child devices.
Additional context
The correct sequence of the Cumulocity IoT Firmware update support is detailed in the following link: