interuss / monitoring

InterUSS Platform USS monitoring tools for federated UTM, including automated testing.
Apache License 2.0
11 stars 19 forks source link

Invalid failure due to sent notification in Data Validation of GET operational intents by USS #797

Open BenjaminPelletier opened 5 days ago

BenjaminPelletier commented 5 days ago

Observed behavior Data Validation of GET operational intents by USS test scenario claims a USS under test sent a notification when it should not possibly have sent a notification. See attached report.

dbf9d751-f6da-4f97-a023-abcde420b8b6.zip

Test check Expect Notification not sent check

Difference from expected behavior The relevant issue is in scenario 18 (s18.html) of the sequence view. Using the queries from events 17 and 22, we can see that planning the flight in event 19 resulted in the creation of operational intent 801cf47. So, this is the operational intent associated with tested_uss's flight 1 during the "Successfully plan flight near an existing flight" test case. At the end of the test case between 29:56.941 and 29:57.614, event 39 shows closure of that flight, and event 43 shows that 801cf47 was indeed removed from the DSS.

In event 56, tested_uss is asked to plan a flight that should be (and is) rejected due to bad op intent details for a relevant op intent hosted by mock_uss -- this happens from 29:58.428 to 29:59.823.

When uss_qualifier asks mock_uss about any interactions with it in event 65, mock_uss shows item 1 as an operational intent notification, and this is presumably what the test scenario is using as evidence that the USS improperly sent a notification since it was sent at 29:58.732, which is during the time tested_uss was planning (according to event 56). However, examining the interaction, we see that it is a notification for the deletion/closure of 801cf47 from the previous test case. Basically, tested_uss is dispatching notifications asynchronously from planning, just as we would expect from an efficiently-implemented USS.

At a bare minimum, we should avoid declaring failure in this check if the notification is from an asynchronously-dispatched notification relating to the flight from the previous test case (as it is in the provided report). Ideally, we should more holistically examine the assumptions leading us to believe that tested_uss necessarily failed to meet a requirement if/when we observe a notification at this point in the test. It would also be great to augment the test check failure details to be more specific about why the failure was declared since "Notification was wrongly sent for an entity not created" is not nearly as helpful as "Observed notification for operational intent 801cf474-9fa1-40ca-af40-01489357465c received at 2024-10-01T23:29:58.732903Z when no notification should have been sent during rejected planning attempt 8e03709-0151-49ab-a41d-0c39d11ae50d occuring from 2024-10-01T23:29:58.428567Z to 2024-10-01T23:29:59.822955Z." We don't necessarily need this level of detail for all test check failure messages, but the key ones that test scenarios/test cases are centered around should probably have a similar level of detail -- we want to enable failing USSs to quickly and easily understand the observations that led to declaration of compliance failure.

CC: @punamverma

punamverma commented 4 days ago

Hi Ben,

I agree the tested_uss should not fail in the second test case.

Seems like one solution to this bug could be to add a check for tested_uss sent a notification after ending its operation. This way the notification wouldn’t go into the subsequent test.

I think we should keep the check Expect Notification not sent check that helps to verify that no notification is sent for an operation that has not been created or updated in DSS. This check would be helpful to catch the wrong implementations by tested_USS, where they could be doing these two steps - creating an operational intent in DSS and notifying USSes - in parallel.

Thanks.