sonic-net / sonic-mgmt

Configuration management examples for SONiC
Other
201 stars 727 forks source link

[Bug]: `conditional_mark` only considers the longest match in the yaml file even when skipping. #14698

Closed arista-nwolfe closed 3 weeks ago

arista-nwolfe commented 1 month ago

Issue Description

As part of https://github.com/sonic-net/sonic-mgmt/pull/11968 there was a fix in the function find_longest_matches to now correctly find only the longest matching entry in tests_mark_conditions.yaml.

However, this bugfix now exposes the issue that multiple length'd skip conditions only result in the longest skip occurring and causing tests to be run on undesired SKUs.

Results you see

E.G. looking specifically at generic_config_updater/test_ecn_config_update.py::test_ecn_config_updates

This test has 2 matching entries in tests_mark_conditions.yaml:

generic_config_updater:
  skip:
    reason: 'generic_config_updater is not a supported feature for T2'
    conditions:
      - "'t2' in topo_name"
generic_config_updater/test_ecn_config_update.py::test_ecn_config_updates:
  skip:
    reason: "This test is not run on this asic type, topology, or version currently"
    conditions_logical_operator: "OR"
    conditions:
      - "asic_type in ['cisco-8000']"
      - "topo_type in ['m0', 'mx']"
      - "release in ['202211']"

But now because of the fix we'll only use the latter rule because it's longer. Meaning we no longer skip this test on T2 devices even though none of the tests in this generic_config_updater folder should run on T2.

In order to support this with the new "longest match wins" approach we'd need to replicate the 't2' in topo_name on every generic_config_updater rule.

Results you expected to see

generic_config_updater/test_ecn_config_update.py::test_ecn_config_updates should be skipped on T2 topologies.

Is it platform specific

generic

Relevant log output

No response

Output of show version

No response

Attach files (if any)

No response

kenneth-arista commented 1 month ago

@arlakshm

yutongzhang-microsoft commented 1 month ago

Hi, @arista-nwolfe , this behavior is intentional and follows the longest matching rule. Our approach is as follows:

  1. Collect all marks along the matching path.
  2. If a mark is unique among the matches, it is applied.
  3. If not, the conditions from the longest matching path take precedence.
arista-nwolfe commented 1 month ago

Hi, @arista-nwolfe , this behavior is intentional and follows the longest matching rule. Our approach is as follows:

  1. Collect all marks along the matching path.
  2. If a mark is unique among the matches, it is applied.
  3. If not, the conditions from the longest matching path take precedence.

I agree that this was a bug that was fixed. The problem is all the rules in the tests_mark_conditions.yaml were made with the assumption that this bug exists and the behavior is we consider all rules for a given test (not just the longest one). So fixing this bug without fixing all the rules to no longer make this assumption will result in many tests incorrectly running on platforms/topologies which they shouldn't be running on.

yutongzhang-microsoft commented 1 month ago

Hi, @arista-nwolfe , this behavior is intentional and follows the longest matching rule. Our approach is as follows:

  1. Collect all marks along the matching path.
  2. If a mark is unique among the matches, it is applied.
  3. If not, the conditions from the longest matching path take precedence.

I agree that this was a bug that was fixed. The problem is all the rules in the tests_mark_conditions.yaml were made with the assumption that this bug exists and the behavior is we consider all rules for a given test (not just the longest one). So fixing this bug without fixing all the rules to no longer make this assumption will result in many tests incorrectly running on platforms/topologies which they shouldn't be running on.

No, this is not a bug.

kenneth-arista commented 1 month ago

The intent of https://github.com/sonic-net/sonic-mgmt/pull/11968 was not to introduce bugs. However, there is now a negative interaction between the change and the existing behavior of conditional rules. On one hand a bug was fixed, but on the other hand it broke existing behavior that was built on top buggy code.

To make progress, we need to also update tests_mark_conditions.yaml before introducing incompatible behavior.

arlakshm commented 1 month ago

@wangxin, @yxieca for viz... can we discuss in test subgroup on what the next steps to resolve this issue

yutongzhang-microsoft commented 1 month ago

Hi, @arista-nwolfe, @kenneth-arista, sorry for the inconvenience.

There was an incorrect behavior in finding the matching entries, deviating from our intended logic before. And in #11968, Arista fixed this issue, which align with our original expectations.

However, this bugfix will cause some inconsistency behavior with previous logic(unexpected because of the bug). To keep consistency, I raised PR #14912 and it will be cherry picked into all branches so the behaviour will be align with previous.

And also, we have the new logic of conditional mark #14395, #14515, we will also cherry pick these two PRs into all branches. After all PRs being cherry picked, the behavior will align with our expectations and previous.

arista-nwolfe commented 1 month ago

Hi, @arista-nwolfe, @kenneth-arista, sorry for the inconvenience.

There was an incorrect behavior in finding the matching entries, deviating from our intended logic before. And in #11968, Arista fixed this issue, which align with our original expectations.

However, this bugfix will cause some inconsistency behavior with previous logic(unexpected because of the bug). To keep consistency, I raised PR #14912 and it will be cherry picked into all branches so the behaviour will be align with previous.

And also, we have the new logic of conditional mark #14395, #14515, we will also cherry pick these two PRs into all branches. After all PRs being cherry picked, the behavior will align with our expectations and previous.

Sorry @yutongzhang-microsoft I didn't see this in time, I left a comment in the submitted review: https://github.com/sonic-net/sonic-mgmt/pull/14395#discussion_r1796203406

yutongzhang-microsoft commented 4 weeks ago

Hi, @arista-nwolfe , I have cherry picked the new logic of conditional mark into 202405. Could you please help to review?

15047

arista-nwolfe commented 4 weeks ago

Hi, @arista-nwolfe , I have cherry picked the new logic of conditional mark into 202405. Could you please help to review? #15047

LGTM, thanks for patching

yutongzhang-microsoft commented 3 weeks ago

Can we close this issue?