thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0

dns error causes software install operation to be stuck in EXECUTING state #2313

Closed reubenmiller closed 3 weeks ago

reubenmiller commented 1 year ago

Describe the bug

The Cumulocity IoT install software operation was stuck in the EXECUTING status due to a DNS lookup error.

Oct 05 12:21:29 pippin tedge-mapper[462]: 2023-10-05T02:21:29.583566419Z ERROR c8y_mapper_ext::converter: Mapping error: error trying to connect: dns error: failed to lookup address information: Try again

It seems that this unexpected error was not properly captured and the operation was not transitioned to the FAILED status.

To Reproduce

The bug could not be reproduced reliably, but these are the steps that led to it:

  1. Install thin-edge.io
  2. Connect to Cumulocity IoT
  3. Install software via the Cumulocity IoT Software Management tab in the Device Management Application

Expected behavior

The operation should be set to FAILED if any error occurs. In this case there was a DNS lookup issue, which was resolved by restarting the tedge-mapper-c8y service.
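
The expected behavior can be illustrated with a minimal Rust sketch. It is not the actual thin-edge.io mapper code: `OperationStatus` and `finish_operation` are hypothetical names introduced here, and the sketch only assumes that an operation handler returns a `Result`. The point is that every error, including an unexpected one such as a DNS failure, ends in a terminal status.

```rust
use std::fmt::Display;

/// Terminal statuses an operation can end in (hypothetical type for this sketch).
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum OperationStatus {
    Successful,
    Failed { reason: String },
}

/// Runs one handling step and always yields a terminal status.
/// Any error is mapped to `Failed` with its message as the reason,
/// so the operation cannot be left in EXECUTING.
pub fn finish_operation<T, E, F>(step: F) -> OperationStatus
where
    E: Display,
    F: FnOnce() -> Result<T, E>,
{
    match step() {
        Ok(_) => OperationStatus::Successful,
        Err(err) => OperationStatus::Failed {
            reason: err.to_string(),
        },
    }
}
```

With this shape, a step that fails with the DNS error from the log would produce `OperationStatus::Failed` carrying that error text, which could then be reported to the cloud as the failure reason.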

Environment (please complete the following information):

Property                    Value
OS [incl. version]          Debian GNU/Linux 11 (bullseye)
Hardware [incl. revision]   Raspberry Pi Zero 2 W Rev 1.0
System-Architecture         Linux pippin 6.1.21-v8+ #1642 SMP PREEMPT Mon Apr 3 17:24:16 BST 2023 aarch64 GNU/Linux
thin-edge.io version        tedge 0.12.1~365+g5a52630

Additional context

Logs

Oct 05 12:20:29 pippin tedge-agent[459]: 2023-10-05T02:20:29.05003086Z  INFO plugin_sm::plugin_manager: Plugin activated: /etc/tedge/sm-plugins/apt
Oct 05 12:20:29 pippin sudo[1452]:    tedge : PWD=/tmp ; USER=root ; COMMAND=/etc/tedge/sm-plugins/apt prepare
Oct 05 12:20:29 pippin sudo[1452]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=999)
Oct 05 12:20:52 pippin sudo[1452]: pam_unix(sudo:session): session closed for user root
Oct 05 12:20:52 pippin sudo[1870]:    tedge : PWD=/tmp ; USER=root ; COMMAND=/etc/tedge/sm-plugins/apt update-list
Oct 05 12:20:52 pippin sudo[1870]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=999)
Oct 05 12:21:10 pippin systemd[1]: Starting Cleanup of Temporary Directories...
Oct 05 12:21:10 pippin systemd[1]: systemd-tmpfiles-clean.service: Succeeded.
Oct 05 12:21:10 pippin systemd[1]: Finished Cleanup of Temporary Directories.
Oct 05 12:21:19 pippin sudo[1870]: pam_unix(sudo:session): session closed for user root
Oct 05 12:21:19 pippin sudo[1915]:    tedge : PWD=/tmp ; USER=root ; COMMAND=/etc/tedge/sm-plugins/apt finalize
Oct 05 12:21:19 pippin sudo[1915]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=999)
Oct 05 12:21:24 pippin sudo[1915]: pam_unix(sudo:session): session closed for user root
Oct 05 12:21:24 pippin sudo[1920]:    tedge : PWD=/tmp ; USER=root ; COMMAND=/etc/tedge/sm-plugins/apt list
Oct 05 12:21:24 pippin sudo[1920]: pam_unix(sudo:session): session opened for user root(uid=0) by (uid=999)
Oct 05 12:21:24 pippin sudo[1920]: pam_unix(sudo:session): session closed for user root
Oct 05 12:21:29 pippin tedge-mapper[462]: 2023-10-05T02:21:29.583566419Z ERROR c8y_mapper_ext::converter: Mapping error: error trying to connect: dns error: failed to lookup address information: Try again

reubenmiller commented 1 year ago

The device in question had an invalid nameserver entry in /etc/resolv.conf, caused by a misconfiguration of the router the device was using for connectivity. Two of the three nameserver addresses were valid, so DNS resolution failed only occasionally.

Fixing the erroneous nameserver entry removed the error. However, the question remains whether the DNS resolver used by thin-edge.io should be changed to be more robust against this kind of misconfiguration, since other tools such as curl were found to handle this situation reliably.
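
One way to tolerate an intermittent lookup failure is to retry the lookup a few times before giving up. The sketch below is only an illustration using the Rust standard library, not the resolver or HTTP client that thin-edge.io actually uses. The helper names `retry_lookup` and `resolve_with_retry`, the attempt count and the delay are assumptions made for this example.

```rust
use std::io;
use std::net::{SocketAddr, ToSocketAddrs};
use std::thread;
use std::time::Duration;

/// Calls `lookup` up to `attempts` times, sleeping `delay` after each
/// failed try except the last. Returns the first success or the last error.
pub fn retry_lookup<F>(attempts: u32, delay: Duration, mut lookup: F) -> io::Result<Vec<SocketAddr>>
where
    F: FnMut() -> io::Result<Vec<SocketAddr>>,
{
    let mut last_err = io::Error::new(io::ErrorKind::Other, "no lookup attempts were made");
    for attempt in 1..=attempts {
        match lookup() {
            Ok(addrs) => return Ok(addrs),
            Err(err) => {
                last_err = err;
                if attempt < attempts {
                    thread::sleep(delay);
                }
            }
        }
    }
    Err(last_err)
}

/// Resolves `host:port` with the system resolver, retrying failed lookups.
/// The values 3 and 500 ms are arbitrary choices for this sketch.
pub fn resolve_with_retry(host: &str, port: u16) -> io::Result<Vec<SocketAddr>> {
    retry_lookup(3, Duration::from_millis(500), || {
        (host, port).to_socket_addrs().map(|addrs| addrs.collect())
    })
}
```

A retry only helps if a later attempt reaches one of the valid nameservers, so it reduces the impact of the misconfiguration rather than removing it. If every attempt fails, the error should still be surfaced so the operation can be marked FAILED.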

reubenmiller commented 3 weeks ago

Closing as the device should be configured with correct DNS settings.