thin-edge / thin-edge.io

The open edge framework for lightweight IoT devices
https://thin-edge.io
Apache License 2.0
221 stars 54 forks source link

flaky system test: Tests.Cumulocity.Log.Log Operation Child #2346

Closed rina23q closed 8 months ago

rina23q commented 1 year ago

Describe the bug Robot Framework test "Successful log operation" case fails randomly on GH Actions. tests/RobotFramework/tests/cumulocity/log/log_operation_child.robot

==============================================================================
Tests.Cumulocity.Log.Log Operation Child                                      
==============================================================================
Successful log operation                                              | FAIL |
Device is missing some fragments. wanted=['c8y_SupportedOperations'], got=['type', 'name', 'owner', 'c8y_SupportedLogs']
------------------------------------------------------------------------------

For the full log, see the attempt here and here.

To Reproduce Run the integration test on GitHub Actions

Expected behavior Always no failure

Screenshots image

Environment (please complete the following information): GitHub Actions

Additional context

reubenmiller commented 11 months ago

Summary

This looks to be a legimate error where sometimes the 114, smart rest message is not being sent to Cumulocity IoT via the c8y/s/us topic.

~Adding the following snippet to the test after the setup of the main device seems to avoid the error state, however it indicates that there might be a general problem if a child device registers itself and capabilities before the main device has been registered with the cloud.~ This statement proved to be false as errors were seen with this assertion in place.

Cumulocity.Device Should Exist    ${PARENT_SN}

FAIL - MQTT Messages

Sequence of MQTT message when the test fails. The logs shows that the smart rest message with the payload 114,c8y_LogfileRequest is missing after the child device registers support for the log_upload commands and related types.

2023-11-04T16:41:28.772364Z      te/device/child01/service/tedge-log-plugin                 {"@parent":"device/child01//","@type":"service","type":"service"}
2023-11-04T16:41:28.772591Z      te/device/child01/service/tedge-log-plugin/status/health   {"pid":218,"status":"up","time":"2023-11-04T16:41:28.750855773Z"}
2023-11-04T16:41:28.772767Z      te/device/child01///cmd/log_upload                         {"types":[]}
2023-11-04T16:41:28.833424Z      te/device/child01//                                        {"@id":"main:device:child01","@type":"child-device","name":"child01"}

2023-11-04T16:41:28.833596Z      c8y/s/us                                                   101,main:device:child01,child01,thin-edge.io-child
2023-11-04T16:41:28.833702Z      te/device/child01/service/tedge-log-plugin                 {"@id":"main:device:child01:service:tedge-log-plugin","@parent":"device/child01//","@type":"service","name":"tedge-log-plugin","type":"service"}
2023-11-04T16:41:28.833810Z      c8y/s/us/main:device:child01                               102,main:device:child01:service:tedge-log-plugin,service,tedge-log-plugin,up
2023-11-04T16:41:28.833924Z      c8y/s/us/main:device:child01/main:device:child01:service:tedge-log-plugin 104,up
2023-11-04T16:41:28.834037Z      c8y/s/us/main:device:child01                               118,

2023-11-04T16:41:28.834102Z      te/device/child01//                                        {"@id":"main:device:child01","@type":"child-device","name":"child01"}
2023-11-04T16:41:28.834216Z      c8y/s/us                                                   101,main:device:child01,child01,thin-edge.io-child
2023-11-04T16:41:28.921110Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:41:28.921318Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:41:28.921453Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:41:28.921566Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:41:28.921668Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:41:28.921775Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:41:28.921922Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:41:28.922168Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:41:28.922204Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:41:28.922327Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:41:29.022386Z      c8y/s/e                                                                41,101,Device with serial 'main:device:child01' already existing
2023-11-04T16:41:29.126158Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:41:29.195005Z      c8y/s/us/main:device:child01                               118,example

PASS - MQTT Messages

Sequence of MQTT message when the test passes.

2023-11-04T16:36:29.869447Z      te/device/child01///cmd/log_upload                         {"types":[]}
2023-11-04T16:36:29.869602Z      te/device/child01/service/tedge-log-plugin                 {"@parent":"device/child01//","@type":"service","type":"service"}
2023-11-04T16:36:29.869687Z      te/device/child01/service/tedge-log-plugin/status/health   {"pid":214,"status":"up","time":"2023-11-04T16:36:29.840132131Z"}
2023-11-04T16:36:29.884289Z      te/device/child01//                                        {"@id":"main:device:child01","@type":"child-device","name":"child01"}
2023-11-04T16:36:29.902904Z      c8y/s/us                                                   101,main:device:child01,child01,thin-edge.io-child

2023-11-04T16:36:29.921552Z      c8y/s/us/main:device:child01                               118,
2023-11-04T16:36:29.921567Z      te/device/child01//                                        {"@id":"main:device:child01","@type":"child-device","name":"child01"}
2023-11-04T16:36:29.921586Z      c8y/s/us                                                   101,main:device:child01,child01,thin-edge.io-child
2023-11-04T16:36:29.921602Z      c8y/s/us/main:device:child01                               102,main:device:child01:service:tedge-log-plugin,service,tedge-log-plugin,up
2023-11-04T16:36:29.921625Z      c8y/s/us/main:device:child01/main:device:child01:service:tedge-log-plugin 104,up
2023-11-04T16:36:29.921637Z      c8y/s/us/main:device:child01                               114,c8y_LogfileRequest
2023-11-04T16:36:30.006980Z      c8y/s/e                                                    41,101,Device with serial 'main:device:child01' already existing

2023-11-04T16:36:30.151541Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.151687Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.151802Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.151905Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:36:30.152002Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.152280Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.164884Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:36:30.164965Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.165059Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:36:30.165182Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:36:30.165558Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:36:30.165749Z      c8y/s/us/main:device:child01                               118,example
2023-11-04T16:36:30.425559Z      te/device/child01///cmd/log_upload                         {"types":["example"]}
2023-11-04T16:36:30.487238Z      c8y/s/us/main:device:child01                               118,example

2023-11-04T16:36:30.723950Z      c8y/s/ds                                                   522,main:device:child01,example,2023-11-03T16:36:30+0000,2023-11-04T16:37:30+0000,first,10
2023-11-04T16:36:30.818156Z      te/device/child01///cmd/log_upload/c8y-mapper-2023-11-04T16:36:30.727078423Z {"status":"init","tedgeUrl":"http://172.21.0.4:8000/tedge/file-transfer/main:device:child01/log_upload/example-c8y-mapper-2023-11-04T16:36:30.727078423Z","type":"example","dateFrom":"2023-11-03T16:36:30Z","dateTo":"2023-11-04T16:37:30Z","searchText":"first","lines":10}
2023-11-04T16:36:30.909077Z      te/device/child01///cmd/log_upload/c8y-mapper-2023-11-04T16:36:30.727078423Z {"status":"executing","tedgeUrl":"http://172.21.0.4:8000/tedge/file-transfer/main:device:child01/log_upload/example-c8y-mapper-2023-11-04T16:36:30.727078423Z","type":"example","dateFrom":"2023-11-03T16:36:30Z","dateTo":"2023-11-04T16:37:30Z","searchText":"first","lines":10}
2023-11-04T16:36:30.912591Z      te/device/child01///cmd/log_upload/c8y-mapper-2023-11-04T16:36:30.727078423Z {"status":"successful","tedgeUrl":"http://172.21.0.4:8000/tedge/file-transfer/main:device:child01/log_upload/example-c8y-mapper-2023-11-04T16:36:30.727078423Z","type":"example","dateFrom":"2023-11-03T16:36:30Z","dateTo":"2023-11-04T16:37:30Z","searchText":"first","lines":10}
2023-11-04T16:36:30.914168Z      c8y/s/us/main:device:child01                               501,c8y_LogfileRequest
2023-11-04T16:36:31.088434Z      c8y/s/us/main:device:child01                               503,c8y_LogfileRequest,https://test-ci-runner01.latest.stage.c8y.io/event/events/3988020/binaries
2023-11-04T16:36:31.088537Z      te/device/child01///cmd/log_upload/c8y-mapper-2023-11-04T16:36:30.727078423Z

To Reproduce

Pre-requisites:

This can be reproduced semi-reliably by reducing the allocated CPU quota/period used by the docker image

Edit your .env file and add the following (note exact values will depend on your machine):

DOCKER_OPTIONS_CPU_QUOTA=5000
DOCKER_OPTIONS_CPU_PERIOD=20000

Then run the flake-finder system test task on the given system test in question:

invoke flake-finder --test-name "Successful log operation" --iterations 20 --clean --suite "Log Operation Child"

This should invoke at least 1 error (if not then just re-run until the error is invoked), an example showing the error output is shown below:

Overall: FAILED
Results: 20 iterations, 18 passed, 2 failed
Elapsed time: 0:56:58.761916
Failed iterations: [9, 13]
reubenmiller commented 11 months ago

Manually registered the child device (see 638920e) seems to have improved the reliability of the previous failure step (due to the log file type not being registered).

Re-running the flake-finder yielded only 2 failures, however the 2 failures were due to a different failure reason as detailed in https://github.com/thin-edge/thin-edge.io/issues/2414

Overall: FAILED
Results: 20 iterations, 18 passed, 2 failed
Elapsed time: 0:38:06.543309
albinsuresh commented 8 months ago

Here are some recent failures, but with a different assertion failure than what's reported in the description:

c8y_SupportedOperations is missing expected operations. missing=['c8y_LogfileRequest'] got=['c8y_Restart', 'c8y_SoftwareUpdate']
reubenmiller commented 8 months ago

Closing as this should be resolved by #2645