sonic-net / sonic-mgmt

Configuration management examples for SONiC
Other
200 stars 732 forks source link

[202405]: `bgp/test_reliable_tsa.py` fails with `TypeError: '<' not supported between instances of 'int' and 'NoneType'` #15513

Open arista-nwolfe opened 2 weeks ago

arista-nwolfe commented 2 weeks ago

Issue Description

Looks like bgp/test_reliable_tsa.py was added recently in https://github.com/sonic-net/sonic-mgmt/pull/13290

Looks like the timeout in the wait_until is passed as None

            # Once all line cards are in maintenance state, proceed further
            for linecard in duthosts.frontend_nodes:
                # Verify startup_tsa_tsb service stopped after expected time
&gt;               pytest_assert(wait_until(tsa_tsb_timer[linecard], 20, 0, get_tsa_tsb_service_status, linecard, 'exited'),
                              "startup_tsa_tsb service is not stopped even after configured timer expiry")

tsa_tsb_timer = {&lt;MultiAsicSonicHost cmp206-4&gt;: None, &lt;MultiAsicSonicHost cmp206-5&gt;: None, &lt;MultiAsicSonicHost cmp206-6&gt;: None}
    def wait_until(timeout, interval, delay, condition, *args, **kwargs):
        """
        ...
&gt;       while elapsed_time &lt; timeout:
E       TypeError: '&lt;' not supported between instances of 'int' and 'NoneType'

args       = (&lt;MultiAsicSonicHost cmp206-4&gt;, 'exited')
condition  = &lt;function get_tsa_tsb_service_status at 0x7f2533203940&gt;
delay      = 0
elapsed_time = 0
interval   = 20
kwargs     = {}
start_time = 1731149328.9497356
timeout    = None

Also worth noting that this bgp/test_reliable_tsa.py seems to take a long time to run and generates a massive log file (~30GB) -rw-r--r-- 1 nwolfe nwolfe 30959840466 Nov 12 17:24 logs/bgp/test_reliable_tsa.log

Results you see

Test passing None as the timeout parameter to wait_until

Results you expected to see

Test should pass an integer value for timeout

Is it platform specific

generic

Relevant log output

No response

Output of show version

No response

Attach files (if any)

No response

Javier-Tan commented 2 weeks ago

Hi @arista-nwolfe, having a look around this should be caught by:

        tsa_tsb_timer[linecard] = get_startup_tsb_timer(linecard)
        ...
        if not tsa_tsb_timer[linecard]:
            pytest.skip("startup_tsa_tsb.service is not supported on the duts under {}".format(suphost.hostname))

Does this happen with test_startup_tsa_tsb_service.py?

arista-nwolfe commented 2 weeks ago

Hi @arista-nwolfe, having a look around this should be caught by:

        tsa_tsb_timer[linecard] = get_startup_tsb_timer(linecard)
        ...
        if not tsa_tsb_timer[linecard]:
            pytest.skip("startup_tsa_tsb.service is not supported on the duts under {}".format(suphost.hostname))

Does this happen with test_startup_tsa_tsb_service.py?

Hi @Javier-Tan it looks like those checks are only in test_startup_tsa_tsb_service.py I'm only seeing this issue in test_reliable_tsa.py which doesn't appear to make any such checks before using the value. E.G. https://github.com/sonic-net/sonic-mgmt/blob/202405/tests/bgp/test_reliable_tsa.py#L782

def test_sup_tsa_act_with_sup_reboot(duthosts, localhost, enum_supervisor_dut_hostname,
                                     enable_disable_startup_tsa_tsb_service,                # noqa: F811
                                     nbrhosts, traffic_shift_community, tbinfo):
...
    tsa_tsb_timer = dict()
...
    for linecard in duthosts.frontend_nodes:
        tsa_tsb_timer[linecard] = get_startup_tsb_timer(linecard)
...
        for linecard in duthosts.frontend_nodes:
            # Verify startup_tsa_tsb service stopped after expected time
            pytest_assert(wait_until(tsa_tsb_timer[linecard], 20, 0, get_tsa_tsb_service_status, linecard, 'exited'),
                          "startup_tsa_tsb service is not stopped even after configured timer expiry")
Javier-Tan commented 2 weeks ago

@arista-nwolfe my mistake, I had both open and mixed them up, I'm happy to implement that safeguard to avoid this issue?

arista-nwolfe commented 2 weeks ago

@arista-nwolfe my mistake, I had both open and mixed them up, I'm happy to implement that safeguard to avoid this issue?

No worries, yeah that would be great, I guess one question would be whether or not the test test_reliable_tsa.py should be skipped if the startup-tsa-tsb.conf file is missing from the SKU?

If so then it would make sense to put that guard directly into the function get_startup_tsb_timer right rather than pushing it to the caller of the function?

Javier-Tan commented 2 weeks ago

@arista-nwolfe my mistake, I had both open and mixed them up, I'm happy to implement that safeguard to avoid this issue?

No worries, yeah that would be great, I guess one question would be whether or not the test test_reliable_tsa.py should be skipped if the startup-tsa-tsb.conf file is missing from the SKU?

If so then it would make sense to put that guard directly into the function get_startup_tsb_timer right rather than pushing it to the caller of the function?

Yea that sounds like the best way to do this, will implement a guard in the get_startup_tsb_timer so we don't rely on per test guards

Javier-Tan commented 2 weeks ago

Will be fixed by https://github.com/sonic-net/sonic-buildimage/pull/20804, will change previous PR to fail T2 devices where we can't find startup_tsa_tsb_timer value