thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
222 stars 55 forks source link

supported operations for a child device don't always get sent to the cloud #2645

Closed reubenmiller closed 9 months ago

reubenmiller commented 9 months ago

Describe the bug

Operations registered by a child device (e.g. tedge-agent running as a child device), are sometimes not translated to the Cumulocity IoT cloud registration messages (e.g. 114).

Currently the problem has only been observed when the tedge-agent is running as a child device, however it should be verified that it is also not an issue with the main device as well.

Problem was discovered when investigating https://github.com/thin-edge/thin-edge.io/issues/2346

Symptoms

To Reproduce

The problem is observable by running the following command from the repository:

invoke flake-finder --test-name "Successful log operation" --iterations 10 --clean --suite "Log Operation Child"

Expected behavior

Screenshots

Environment (please complete the following information):

Property Value
OS [incl. version] Debian GNU/Linux 12 (bookworm)
Hardware [incl. revision] unknown
System-Architecture Linux 55e141da3961 6.5.0-15-generic #15-Ubuntu SMP PREEMPT_DYNAMIC Tue Jan 9 22:39:36 UTC 2024 aarch64 GNU/Linux
thin-edge.io version tedge 0.13.2~449+g9dbd1ad

Additional context

MQTT Messages

tedge mqtt sub '#'
[te/device/main/service/mosquitto-c8y-bridge] {"@id":"TST_raise_religious_gig:device:main:service:mosquitto-c8y-bridge","@parent":"device/main//","@type":"service","name":"mosquitto-c8y-bridge","type":"service"}
[te/device/main/service/mosquitto-c8y-bridge/status/health] 1
[te/device/main/service/c8y-firmware-plugin] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/c8y-firmware-plugin/status/health] {"pid":479,"status":"up","time":1706774610.8726873}
[te/device/main/service/tedge-mapper-c8y] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-mapper-c8y/status/health] {"pid":541,"status":"up","time":1706774613.0842247}
[te/device/main/service/tedge-agent] {"@parent":"device/main//","@type":"service","type":"service"}
[te/device/main/service/tedge-agent/status/health] {"pid":573,"status":"up","time":1706774613.412329}
[te/device/main///twin/c8y_Agent] {"name":"thin-edge.io","url":"https://thin-edge.io","version":"0.13.2~449+g9dbd1ad"}
[te/device/main///cmd/log_upload] {"types":[]}
[te/device/main///cmd/restart] {}
[te/device/main///cmd/software_list] {}
[te/device/main///cmd/software_update] {}
[te/device/main///cmd/config_snapshot] {"types":["/etc/tedge/tedge.toml","system.toml","tedge-configuration-plugin"]}
[te/device/main///cmd/config_update] {"types":["/etc/tedge/tedge.toml","system.toml","tedge-configuration-plugin"]}
[te/device/TST_cut_blocky_foie_gras/service/tedge-agent] {"@id":"TST_raise_religious_gig:device:TST_cut_blocky_foie_gras:service:tedge-agent","@parent":"device/TST_cut_blocky_foie_gras//","@type":"service","name":"tedge-agent","type":"service"}
[te/device/TST_cut_blocky_foie_gras/service/tedge-agent/status/health] {"pid":210,"status":"up","time":1706774622.531569}
[te/device/TST_cut_blocky_foie_gras//] {"@id":"TST_raise_religious_gig:device:TST_cut_blocky_foie_gras","@type":"child-device","name":"TST_cut_blocky_foie_gras"}
[te/device/TST_cut_blocky_foie_gras///cmd/restart] {}
[te/device/TST_cut_blocky_foie_gras///cmd/software_list] {}
[te/device/TST_cut_blocky_foie_gras///cmd/software_update] {}
[te/device/TST_cut_blocky_foie_gras///cmd/log_upload] {"types":["example","multiple_logfiles"]}
[te/device/TST_cut_blocky_foie_gras///cmd/config_snapshot] {"types":["/etc/tedge/tedge.toml","system.toml","tedge-configuration-plugin"]}
[te/device/TST_cut_blocky_foie_gras///cmd/config_update] {"types":["/etc/tedge/tedge.toml","system.toml","tedge-configuration-plugin"]}

Child Device - Managed Object

{
  "additionParents": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/additionParents"
  },
  "assetParents": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/assetParents"
  },
  "c8y_SoftwareList": [],
  "c8y_SupportedConfigurations": [
    "/etc/tedge/tedge.toml",
    "system.toml",
    "tedge-configuration-plugin"
  ],
  "c8y_SupportedLogs": ["example", "multiple_logfiles"],
  "c8y_SupportedOperations": ["c8y_Restart", "c8y_SoftwareUpdate"],
  "childAdditions": {
    "references": [
      {
        "managedObject": {
          "id": "3033460073",
          "name": "tedge-agent",
          "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/3033460073"
        },
        "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childAdditions/3033460073"
      }
    ],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childAdditions"
  },
  "childAssets": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childAssets"
  },
  "childDevices": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childDevices"
  },
  "creationTime": "2024-02-01T08:03:42.543Z",
  "deviceParents": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/deviceParents"
  },
  "id": "8133460072",
  "lastUpdated": "2024-02-01T08:03:42.686Z",
  "name": "TST_cut_blocky_foie_gras",
  "owner": "device_TST_raise_religious_gig",
  "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072",
  "type": "thin-edge.io-child"
}

Child device - tedge-agent logs

journalctl -u tedge-agent -n 100
Feb 01 08:03:42 2252cb9bc74c systemd[1]: Starting tedge-agent.service - tedge-agent is a thin-edge.io component to support operations....
Feb 01 08:03:42 2252cb9bc74c systemd[1]: Started tedge-agent.service - tedge-agent is a thin-edge.io component to support operations..
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.530908859Z  INFO tedge_agent::agent: tedge-agent starting
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.53093665Z  INFO sm-agent: tedge_agent::agent: Starting tedge-agent v0.13.2~449+g9dbd1ad
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.5311421Z  INFO Runtime: Started
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531226846Z  INFO sm-agent: tedge_config_manager::config: Using the configuration from /etc/tedge/plugins/tedge-configuration-plugin.toml
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531316509Z  INFO sm-agent: log_manager::config: Using the configuration from /etc/tedge/plugins/tedge-log-plugin.toml
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531368299Z  INFO sm-agent: tedge_agent::agent: Running as a child device, tedge_to_te_converter and File Transfer Service disabled
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531415214Z ERROR sm-agent: tedge_agent::state_repository::state: Cannot use "/data/tedge/agent" to store tedge-agent state: No such file or directory (os error 2)
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.53142338Z  INFO sm-agent: tedge_agent::state_repository::state: Use "/etc/tedge/.agent/restart-current-operation" to store tedge-agent restart-current-operation state
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531429421Z ERROR sm-agent: tedge_agent::state_repository::state: Cannot use "/data/tedge/agent" to store tedge-agent state: No such file or directory (os error 2)
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531432921Z  INFO sm-agent: tedge_agent::state_repository::state: Use "/etc/tedge/.agent/software-current-operation" to store tedge-agent software-current-operation state
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531439254Z ERROR sm-agent: tedge_agent::state_repository::state: Cannot use "/data/tedge/agent" to store tedge-agent state: No such file or directory (os error 2)
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531443629Z  INFO sm-agent: tedge_agent::state_repository::state: Use "/etc/tedge/.agent/workflows" to store tedge-agent workflows state
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531447421Z  INFO Runtime: Running Signal-Handler-0
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531478461Z  INFO Runtime: Running MQTT-1
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531486336Z  INFO Runtime: Running FsWatcher-2
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531490169Z  INFO Runtime: Running Downloader-3
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.53150171Z  INFO Runtime: Running Uploader-4
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531507835Z  INFO Runtime: Running ConfigManager-5
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531514126Z  INFO Runtime: Running LogManager-6
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531543625Z  INFO Runtime: Running RestartManagerActor-7
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531547833Z  INFO Runtime: Running SoftwareManagerActor-8
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531550833Z  INFO Runtime: Running Script-9
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531554083Z  INFO Runtime: Running TedgeOperationConverter-10
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531557708Z  INFO Runtime: Running HealthMonitorActor-11
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531661912Z  INFO mqtt_channel::connection: MQTT connecting to broker: host=192.168.16.2:1883, session_name=Some("tedge-agent#te/device/TST_cut_blocky_foie_gras//")
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.531840196Z  INFO tedge_log_manager::actor: Reloading supported log types
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.532103852Z  INFO log_manager::config: Using the configuration from /etc/tedge/plugins/tedge-log-plugin.toml
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.532460629Z  INFO tedge_config_manager::config: Using the configuration from /etc/tedge/plugins/tedge-configuration-plugin.toml
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.53247042Z  INFO mqtt_channel::connection: MQTT connection established
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.532791449Z ERROR tedge_agent::tedge_operation_converter::actor: Fail to reload pending command requests from /etc/tedge/.agent/workflows due to: Could not read state from file `/etc/tedge/.agent/workflows`: No such file or directory (os error 2)
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.534264555Z  INFO plugin_sm::plugin_manager: Default plugin type: Not configured
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.534279638Z  WARN tedge_agent::software_manager::actor: Couldn't load plugins from /etc/tedge/sm-plugins
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.578963843Z  INFO tedge_agent::tedge_operation_converter::actor: Moving software_list operation to state: scheduled
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.580677439Z  INFO tedge_agent::tedge_operation_converter::actor: Processing software_list operation scheduled step
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.583858267Z  INFO tedge_agent::software_manager::actor: Checking if tedge got self updated
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.583867975Z  INFO tedge_agent::software_manager::actor: Current running version: 0.13.2~449+g9dbd1ad
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.584489199Z  INFO tedge_agent::software_manager::actor: Installed binary version: 0.13.2~449+g9dbd1ad
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.587239628Z  INFO tedge_agent::tedge_operation_converter::actor: Processing software_list operation executing step
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.587275085Z  INFO tedge_agent::software_manager::actor: Checking if tedge got self updated
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.587278585Z  INFO tedge_agent::software_manager::actor: Current running version: 0.13.2~449+g9dbd1ad
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.587818521Z  INFO tedge_agent::tedge_operation_converter::actor: Waiting successful software_list operation to be cleared
Feb 01 08:03:42 2252cb9bc74c tedge-agent[210]: 2024-02-01T08:03:42.587863186Z  INFO tedge_agent::software_manager::actor: Installed binary version: 0.13.2~449+g9dbd1ad

Main Device - entity_store.jsonl

cat /etc/tedge/.tedge-mapper-c8y/entity_store.jsonl
 {"version":"1.0"}
{"topic":{"name":"te/device/main///twin/c8y_Agent"},"payload":"{\"name\":\"thin-edge.io\",\"url\":\"https://thin-edge.io\",\"version\":\"0.13.2~449+g9dbd1ad\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/c8y-firmware-plugin"},"payload":"{\"@parent\":\"device/main//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/mosquitto-c8y-bridge"},"payload":"{\"@id\":\"TST_raise_religious_gig:device:main:service:mosquitto-c8y-bridge\",\"@parent\":\"device/main//\",\"@type\":\"service\",\"name\":\"mosquitto-c8y-bridge\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/tedge-mapper-c8y"},"payload":"{\"@parent\":\"device/main//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/main/service/tedge-agent"},"payload":"{\"@parent\":\"device/main//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/TST_cut_blocky_foie_gras//"},"payload":"{\"@id\":\"TST_raise_religious_gig:device:TST_cut_blocky_foie_gras\",\"@type\":\"child-device\",\"name\":\"TST_cut_blocky_foie_gras\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/TST_cut_blocky_foie_gras/service/tedge-agent"},"payload":"{\"@parent\":\"device/TST_cut_blocky_foie_gras//\",\"@type\":\"service\",\"type\":\"service\"}","qos":1,"retain":true}
{"topic":{"name":"te/device/TST_cut_blocky_foie_gras/service/tedge-agent"},"payload":"{\"@id\":\"TST_raise_religious_gig:device:TST_cut_blocky_foie_gras:service:tedge-agent\",\"@parent\":\"device/TST_cut_blocky_foie_gras//\",\"@type\":\"service\",\"name\":\"tedge-agent\",\"type\":\"service\"}","qos":1,"retain":true}
reubenmiller commented 9 months ago

Further debugging shows that the tedge-agent running on the child device registered all of the command support correctly, however the tedge-mapper-c8y didn't send the appropriate messages to Cumulocity IoT.

Below shows the MQTT messages related to the operations registration for the child device:

tedge mqtt sub 'te/device/TST_cut_blocky_foie_gras///cmd/#'
INFO: Connected
[te/device/TST_cut_blocky_foie_gras///cmd/restart] {}
[te/device/TST_cut_blocky_foie_gras///cmd/software_list] {}
[te/device/TST_cut_blocky_foie_gras///cmd/software_update] {}
[te/device/TST_cut_blocky_foie_gras///cmd/log_upload] {"types":["example","multiple_logfiles"]}
[te/device/TST_cut_blocky_foie_gras///cmd/config_snapshot] {"types":["/etc/tedge/tedge.toml","system.toml","tedge-configuration-plugin"]}
[te/device/TST_cut_blocky_foie_gras///cmd/config_update] {"types":["/etc/tedge/tedge.toml","system.toml","tedge-configuration-plugin"]}

On the main device, clearing the entity_store.jsonl file, and restarting the tedge-mapper-c8y seems to trigger sending the correct operation registration messages to Cumulocity IoT.

sudo systemctl stop tedge-mapper-c8y
sudo rm /etc/tedge/.tedge-mapper-c8y/entity_store.jsonl
sudo systemctl start tedge-mapper-c8y

Afterwards, the child device managed object in Cumulocity IoT has the expected supported operations (below is an extern from th):

{
  "additionParents": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/additionParents"
  },
  "assetParents": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/assetParents"
  },
  "c8y_SoftwareList": [],
  "c8y_SupportedConfigurations": [
    "/etc/tedge/tedge.toml",
    "system.toml",
    "tedge-configuration-plugin"
  ],
  "c8y_SupportedLogs": ["example", "multiple_logfiles"],
  "c8y_SupportedOperations": [
    "c8y_DownloadConfigFile",
    "c8y_LogfileRequest",
    "c8y_Restart",
    "c8y_SoftwareUpdate",
    "c8y_UploadConfigFile"
  ],
  "childAdditions": {
    "references": [
      {
        "managedObject": {
          "id": "3033460073",
          "name": "tedge-agent",
          "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/3033460073"
        },
        "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childAdditions/3033460073"
      }
    ],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childAdditions"
  },
  "childAssets": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childAssets"
  },
  "childDevices": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/childDevices"
  },
  "creationTime": "2024-02-01T08:03:42.543Z",
  "deviceParents": {
    "references": [],
    "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072/deviceParents"
  },
  "id": "8133460072",
  "lastUpdated": "2024-02-01T08:33:20.599Z",
  "name": "TST_cut_blocky_foie_gras",
  "owner": "device_TST_raise_religious_gig",
  "self": "https://t493319102.eu-latest.cumulocity.com/inventory/managedObjects/8133460072",
  "type": "thin-edge.io-child"
}
reubenmiller commented 9 months ago

Running the flake-finder tasks reveals that the tests is failing about 50% of the time.

invoke flake-finder --test-name "Successful log operation" --iterations 10 --clean --suite "Log Operation Child"
Overall: FAILED
Results: 10 iterations, 5 passed, 5 failed
Elapsed time: 0:05:13.918494
Failed iterations: [3, 4, 6, 7, 8]
albinsuresh commented 9 months ago

Test

The bug has been fixed by making sure that the supported operations message (SmartREST 114) is sent whenever a command capability message is received, as long as it adds a new supported operation. If a command capability message does not change the aggregated list (e.g: an already registered capability is resent) of supported operations, then this 114 message is not sent unnecessarily.

gligorisaev commented 9 months ago

QA has thoroughly checked the bug and here are the results: