thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
221 stars 54 forks source link

Cumulocity c8y_Command operation is set unconditionally to SUCCESSFUL and before command has completed #1603

Closed reubenmiller closed 1 year ago

reubenmiller commented 1 year ago

Is your feature improvement request related to a problem? Please describe.

tedge version: 0.8.0

When implementing the Cumulocity c8y_Command (511) message, the following items are not processed correctly

The following details the plugin and configuration which was used:

Plugin configuration: /etc/tedge/operations/c8y/c8y_Command

[exec]
topic = "c8y/s/ds"
on_message = "511"
command = "/etc/tedge/operations/command"

Plugin: /etc/tedge/operations/command

#!/bin/bash

info() {
    echo "$(date --iso-8601=seconds 2>/dev/null || date -Iseconds) : INFO : $*" >&2
}

info "Simulating command"
sleep 10
info "Command has finished, exiting"
exit 0
  1. Configure the Cumulocity c8y_Command handler, where the plugin simulates at command which takes 10s to execute

  2. Create an c8y_Command operation using Cumulocit API, wait for the operation to complete and then check the related Cumulocity audit records

    c8y operations create --device rmiller-pi --description "Send shell command" --template "{c8y_Command:{text: 'ls -l /etc/'}}" \
    | c8y operations wait \
    | c8y auditrecords list

    The audit record will show the timestamps when the operation status transitions occurred.

Describe the solution you'd like

The plugin runner (the component calling the external script) should handle the updating of the Cumulocity operation status and output, however the operation STATUS should be set based on the exit code of the external script.

The c8y_Command should be allowed to provide the output of the command so that the user can get information from the command itself.

Result/failure message parsing rules

Before the output or failure message is added to the SMART REST template, the following santization rules/replacements should be done:

Notes

Describe alternatives you've considered

The external scripts could be given control to handle the updating of the status operations by themselves, however it assumes that the external plugins always "do the right thing". If an external program forgets to transition the operation, or it crashes unexpectedly, then the operations may be left in the EXECUTING state will will affect subsequent operations of the same fragmentType (due to Cumulocity's ID-less operation concept) Command should expect.

By having the operation status transitions in the tedge mappers, it potentially allows the same plugin (e.g. c8y_Command plugin) to be used by different mappers, as the cloud specific actions is encapsulated in the mapper itself and not in the plugin.

Additional context

The following shows the evidence of the operation being transitioned to SUCCESSFUL before the command was finished. Below shows the c8y_Command operation (after was set to SUCCESSFUL).

{
    "c8y_Command": {
        "text": "ls -l /etc/"
    },
    "creationTime": "2022-11-11T23:03:03.389Z",
    "delivery": {
        "log": [
            {
                "status": "PENDING",
                "time": "2022-11-11T23:03:03.457Z"
            },
            {
                "status": "SEND",
                "time": "2022-11-11T23:03:03.502Z"
            }
        ],
        "status": "DELIVERED",
        "time": "2022-11-11T23:03:03.559Z"
    },
    "description": "Send shell command",
    "deviceId": "14445223",
    "id": "445358",
    "self": "https://t2873877.latest.stage.c8y.io/devicecontrol/operations/445358",
    "status": "SUCCESSFUL"
}

Below shows the Cumulocity audit record entries which shows that the operation is created and set to SUCCESSFUL before the command is finished executing (~within 500 milliseconds instead of the expected ~10 seconds as set in the plugin).

$ c8y auditrecords list --source 445358
| id | time | type | AuditSourceDevice | activity | text | user |
|-------------|-------------------------------|----------------|-------------------|------------------------|---------------------------------------------------------------------------------------|--------------------|
| 445359 | 2022-11-11T23:03:03.435Z | Operation | 14445223 | Operation created | Operation created: status='PENDING', description='Send shell command', device name='… | cicd_runner01 |
| 445360 | 2022-11-11T23:03:03.529Z | Operation | 14445223 | Operation updated | Operation updated: status='PENDING', description='Send shell command', device name='… | cicd_runner01 |
| 445361 | 2022-11-11T23:03:03.583Z | Operation | 14445223 | Operation updated | Operation updated: status='PENDING', description='Send shell command', device name='… | cicd_runner01 |
| 445362 | 2022-11-11T23:03:03.655Z | Operation | 14445223 | Operation updated | Operation updated: status='EXECUTING', description='Send shell command', device name… | device_rmiller-pi |
| 445363 | 2022-11-11T23:03:03.725Z | Operation | 14445223 | Operation updated | Operation updated: status='SUCCESSFUL', description='Send shell command', device nam… | device_rmiller-pi |

Log output related to the same command (from the device)

Below shows the command log output which was created under /var/log/tedge/agent/c8y_Command-*. The timestamp entries shows that the command did take ~10 seconds to execute.

$ cat /var/log/tedge/agent/c8y_Command-2022-11-11T23\:03\:03.548394852Z.log
----- $ /usr/local/share/devmgmt/handle_command.sh "511,rmiller-pi,ls -l /etc/"
exit status: 0

stdout <<EOF
EOF

stderr <<EOF
2022-11-12T00:03:03+01:00 : INFO : Simulating command
2022-11-12T00:03:13+01:00 : INFO : Command has finished, exiting
EOF
rina23q commented 1 year ago

The operation is transitioned to EXECUTING then to SUCCESSFUL unexpectedly considering the documentation docs/src/tutorials/supported_operations.md shows that the plugin was using mosquitto_sub to send the messages anyway. I suspect the docs might be out of date now.

I would give a note for the inconsistency of our doc and the current actual behaviour. The Cumulocity Mapper is handling the operations. Last month we changed the mapper to send EXECUTING and SUCCESSFUL by https://github.com/thin-edge/thin-edge.io/pull/1475. I was not aware of that the document says sending operation status by mosquitto_pub.

reubenmiller commented 1 year ago

I would give a note for the inconsistency of our doc and the current actual behaviour. The Cumulocity Mapper is handling the operations. Last month we changed the mapper to send EXECUTING and SUCCESSFUL by https://github.com/thin-edge/thin-edge.io/pull/1475. I was not aware of that the document says sending operation status by mosquitto_pub.

Unfortunately this leads to the operation handler being useless for the c8y_Command operation as there is not chance for a result to be added to a SUCCESSFUL operation.

Generally the operation status should never be set to SUCCESSFUL if there is no verification that an action actually completed and was successful.

rina23q commented 1 year ago

But I'm surprised as the code looks handling the error case. If execute_operation fails, the failure message should be sent. So, most probably the issue is inside the function execute_operation.

https://github.com/thin-edge/thin-edge.io/blob/main/crates/core/tedge_mapper/src/c8y/converter.rs#L852

    match operations.matching_smartrest_template(template) {
        Some(operation) => {
            if let Some(command) = operation.command() {
                execute_operation(payload, command.as_str(), &operation.name, operation_logs)
                    .await?;
            }
            let topic = C8yTopic::SmartRestResponse.to_topic()?;
            let msg1 = Message::new(&topic, format!("501,{}", operation.name));
            let msg2 = Message::new(&topic, format!("503,{}", operation.name));

            Ok(vec![msg1, msg2])
        }
        None => Ok(vec![]),
    }
reubenmiller commented 1 year ago

Though the execute_operation does not actually wait for the command to complete as it is using tokio:spawn. So the control is being returned quickly.

rina23q commented 1 year ago

Merged #1641 to fix this issue. The corresponding build is https://github.com/thin-edge/thin-edge.io/actions/runs/3714360516.

I implemented as @reubenmiller wrote above except one: Escape " with \" This escape is not accepted as SmartREST, which requires CSV style escape. Thus, I escaped a single double quote " with "" (double double-quotes)

Some tips for tests:

reubenmiller commented 1 year ago

Merged #1641 to fix this issue. The corresponding build is https://github.com/thin-edge/thin-edge.io/actions/runs/3714360516.

I implemented as @reubenmiller wrote above except one: Escape " with \" This escape is not accepted as SmartREST, which requires CSV style escape. Thus, I escaped a single double quote " with "" (double double-quotes)

Some tips for tests:

  • It would be good to prepare the case, the command finishes exit 1 (non-zero) so that you can test failure case.
  • Also good to include some control characters and double quotes.

I updated the description about escaping " with "".