kytos-ng / kytos-end-to-end-tests

Kytos End-to-End Tests
0 stars 10 forks source link

Increased `CONSISTENCY_MIN_VERDICT_INTERVAL` and adapted tests #178

Closed viniarck closed 1 year ago

viniarck commented 1 year ago

Fixes #172

= 164 passed, 2 skipped, 18 xfailed, 7 xpassed, 758 warnings in 10140.69s (2:49:00) =

gretelliz commented 1 year ago

Hello @viniarck. That seems very good to me. Just a question:

viniarck commented 1 year ago

Hello @viniarck. That seems very good to me. Just a question:

  • CONSISTENCY_MIN_VERDICT_INTERVAL decreases from 60*2 to 60 instead of increasing, right? Same for STATS_INTERVAL, it went from 60 to 30.

@gretelliz, that's correct if you're comparing with the default values of of_core and flow_manager. The rationale, here is to try to use a value that shouldn't be too low and/or close to the default one (or a value that is reasonable for production also). We could also use the default values, however, initially I set them a bit lower but not too low to try to also allow for these parameters to be individually tested in new additional tests, since prior to this PR having a lower value was also testing exercising more variables but then we also had other false positives that we're trying to find a balance.

gretelliz commented 1 year ago

I see. Thanks @viniarck, got it.

viniarck commented 1 year ago

I see. Thanks @viniarck, got it.

Thanks for your review @gretelliz.

I'll add more test results executions here in the next days as we've discussed on our weekly meeting.

gretelliz commented 1 year ago

Fine @viniarck , I was thinking about this. It will be great. I approved this PR, but I think what Italo commented is important. I am actually working with the parameter CONSISTENCY_MIN_VERDICT_INTERVAL for a issue of mef_eline.

viniarck commented 1 year ago

Here's a summary of test executions on GitLab CI:

iter 100
+ for i in {1..150}
+ echo 'iter 150'
+ python3 -m pytest tests/ --timeout=300 -k test_031_on_switch_restart_kytos_should_recreate_flows
============================= test session starts ==============================
collected 191 items / 190 deselected / 1 selected

tests/test_e2e_21_flow_manager.py .                                      [100%]
========== 1 passed, 190 deselected, 34 warnings in 92.72s (0:01:32) ===========
+ tail -f

Conclusions so far:

Suggestions:

1) Stick with the higher CONSISTENCY_MIN_VERDICT_INTERVAL to minimize the variables that can result in false positives, especially since successful rate of test executions has improved, keep using the reconnect_switches method to still be compatible with the existing time.sleeps. Map new test cases to test out exclusively the consistency check, and then only start trying to reduce its value when e2e nightly tests are rarely falling (maybe at most every 15 days?) 2) Introduce pytest-rerunfailures when assert errors happen, but still showing in the reports that it retried the test, that way, potential flaky tests or non-deterministic bugs we can have more evidence of the situation, and the team can still keep observing this, collecting more logs and eventually fix any remaining issues. Retries on tests aren't ideal, but all things considered, I believe it'll be beneficial in terms of effort to analyze and fix whatever needs to be fixed in the future.

I've left more seven e2e test executions in the CI with pytest-rerunfailures, let's see how it goes. I'll post another update once I have results. If you have other suggestions let me know cc'ing @italovalcy @gretelliz.

viniarck commented 1 year ago

I've left more seven e2e test executions in the CI with pytest-rerunfailures, let's see how it goes. I'll post another update once I have results. If you have other suggestions let me know cc'ing @italovalcy @gretelliz.

Here's a follow up with yesterday's seven CI executions, they all passed, and in total there were two test cases rerun:

20221110_140553

For the re-runs I'll explore now other formatting options since we need the failed assertions here as well

=========================== rerun test summary info ============================
RERUN tests/test_e2e_20_flow_manager.py::TestE2EFlowManager::test_040_replace_action_flow
= 164 passed, 2 skipped, 18 xfailed, 7 xpassed, 758 warnings, 1 rerun in 10203.85s (2:50:03) =

=========================== rerun test summary info ============================
RERUN tests/test_e2e_20_flow_manager.py::TestE2EFlowManager::test_050_add_action_flow
= 164 passed, 2 skipped, 18 xfailed, 7 xpassed, 758 warnings, 1 rerun in 10195.23s (2:49:55) =
viniarck commented 1 year ago

I've run more seven CI executions, they all passed, in total there were only 2 reruns:

20221115_173409

Here's an example of rerun, I've updated the conftest report hook to also report the errors when it reruns:

------------------------------- start/stop times -------------------------------
rerun: 0
tests/test_e2e_20_flow_manager.py::TestE2EFlowManager::test_030_modify_match: 2022-11-15,01:13:44.375720 - 2022-11-15,01:14:09.344504
self = <tests.test_e2e_20_flow_manager.TestE2EFlowManager object at 0x7fe21d3cfb20>
    def test_030_modify_match(self):
>       self.modify_match()
tests/test_e2e_20_flow_manager.py:384: 

        flows_s1 = s1.dpctl('dump-flows')
>       assert len(flows_s1.split('\r\n ')) == BASIC_FLOWS + 1, flows_s1
E       AssertionError:  cookie=0xab00000000000001, duration=14.147s, table=0, n_packets=8, n_bytes=336, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535
E         
E       assert 1 == (3 + 1)
E        +  where 1 = len([' cookie=0xab00000000000001, duration=14.147s, table=0, n_packets=8, n_bytes=336, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'])
E        +    where [' cookie=0xab00000000000001, duration=14.147s, table=0, n_packets=8, n_bytes=336, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'] = <built-in method split of str object at 0x7fe21c0bab90>('\r\n ')
E        +      where <built-in method split of str object at 0x7fe21c0bab90> = ' cookie=0xab00000000000001, duration=14.147s, table=0, n_packets=8, n_bytes=336, priority=1000,dl_vlan=3799,dl_type=0x88cc actions=CONTROLLER:65535\r\n'.split
tests/test_e2e_20_flow_manager.py:380: AssertionError
=========================== rerun test summary info ============================
RERUN tests/test_e2e_20_flow_manager.py::TestE2EFlowManager::test_030_modify_match
= 164 passed, 2 skipped, 18 xfailed, 7 xpassed, 758 warnings, 1 rerun in 10179.29s (2:49:39) =
viniarck commented 1 year ago

I'll go ahead and merge this one. I've just synced with Italo about the remaining of the discussion that was open.

Appreciated the reviews.