sonic-net / sonic-mgmt

Configuration management examples for SONiC

TestAutoTechSupport::test_max_limit failure after merge PR#8181 #8367

Open XuChen-MSFT opened 1 year ago

XuChen-MSFT commented 1 year ago

Description

After merging PR https://github.com/sonic-net/sonic-mgmt/pull/8181, TestAutoTechSupport::test_max_limit fails on some other SONiC devices, as shown below:

    def validate_files_in_folder(validation_files_list, files_list, expected_files=True):
        """
        Validated files in folder
        :param validation_files_list: actual number of files in folder(list)
        :param files_list: list of files which we will check in folder
        :param expected_files: if True - check that file in folder, if False - check that file not in folder
        """
        with allure.step('Validate files in folder'):
            for stub_file in files_list:
                if expected_files:
                    err_mgs = 'Expected file: {} not found in available files list: {}'.format(stub_file,
                                                                                               validation_files_list)
                    assert stub_file in validation_files_list, err_mgs
                else:
                    err_msg = 'Unexpected file: {} found in available files list: {}'.format(stub_file,
                                                                                             validation_files_list)
>                   assert stub_file not in validation_files_list, err_msg
E                   AssertionError: Unexpected file: bash.1684423533.12278.core.gz found in available files list: [u'bash.1684423533.12278.core.gz', u'bash.1684423534.7162.core.gz', u'bash.1684423535.18639.core.gz', u'bash.1684423538.976.core.gz', u'bash.1684423667.1002.core.gz']

err_msg    = "Unexpected file: bash.1684423533.12278.core.gz found in available files list: [u'bash.1684423533.12278.core.gz', u'ba...3534.7162.core.gz', u'bash.1684423535.18639.core.gz', u'bash.1684423538.976.core.gz', u'bash.1684423667.1002.core.gz']"
expected_files = False
files_list = ['bash.1684423532.17296.core.gz', 'bash.1684423533.12278.core.gz']
stub_file  = 'bash.1684423533.12278.core.gz'
validation_files_list = [u'bash.1684423533.12278.core.gz', u'bash.1684423534.7162.core.gz', u'bash.1684423535.18639.core.gz', u'bash.1684423538.976.core.gz', u'bash.1684423667.1002.core.gz']

show_techsupport/test_auto_techsupport.py:1215: AssertionError
- generated xml file: /home/xuchen3/env-a/sonic-mgmt-int/tests/logs/show_techsupport/test_auto_techsupport.py::TestAutoTechSupport::test_max_limit.xml -
=========================== short test summary info ============================
FAILED show_techsupport/test_auto_techsupport.py::TestAutoTechSupport::test_max_limit[core]
========================== 1 failed in 667.32 seconds ==========================

Steps to reproduce the issue:

  1. run "show_techsupport/test_auto_techsupport.py::TestAutoTechSupport::test_max_limit" case

Describe the results you received:

It failed with the following error message:

           with allure.step('Check that all expected stub files exist and unexpected does not exist'):
                expected_max_usage = one_percent_in_mb * max_limit
                expected_stub_files = dummy_files_list[2:]
                not_expected_stub_files = dummy_files_list[:2]
                # import pdb; pdb.set_trace()
>               validate_expected_stub_files(self.duthost, validation_folder, expected_stub_files, expected_number_of_additional_files=2, not_expected_stub_files_list=not_expected_stub_files, expected_max_folder_size=expected_max_usage)

avail      = 19546
cleanup_list = [(<function set_auto_techsupport_global at 0x7fb80d43c1d0>, (<MultiAsicSonicHost str3-8101-01>,), {'core_limit': 5})]
dummy_file_generator = <function create_core_stub_file at 0x7fb80d43ce50>
dummy_files_list = ['bash.1684423532.17296.core.gz', 'bash.1684423533.12278.core.gz', 'bash.1684423534.7162.core.gz', 'bash.1684423535.18639.core.gz']
expected_core_file = u'bash.1684423667.1002.core.gz'
expected_file_size_in_mb = 320
expected_max_usage = 960
expected_stub_files = ['bash.1684423534.7162.core.gz', 'bash.1684423535.18639.core.gz']
feature_rate_limit_zero = None
global_rate_limit_zero = None
max_limit  = 3
not_expected_stub_files = ['bash.1684423532.17296.core.gz', 'bash.1684423533.12278.core.gz']
num_of_dummy_files = 4
one_file_size_in_percent = 1
one_percent_in_mb = 320
self       = <test_auto_techsupport.TestAutoTechSupport instance at 0x7fb80cea0320>
stub_file  = 3
test_mode  = 'core'
test_mode_folder_dict = {'core': '/var/core/', 'techsupport': '/var/dump/'}
total      = 32077
used       = 12515
used_percent = 40
validation_folder = '/var/core/'

Describe the results you expected:

pass this case

Additional information you deem important:

my initial analysis:

  1. disk info

    $ df /
    Filesystem     1K-blocks     Used Available Use% Mounted on
    root-overlay    32847824 12814896  20016544  40% /
  2. With max-core-limit == 0, the script generated 4 dummy core files and 1 real core file, as shown below:

    $ ls -lstr /var/core
    'total 1310784', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423532.17296.core.gz', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423533.12278.core.gz', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423534.7162.core.gz', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423535.18639.core.gz', 
    '-rw-r--r-- 1 root root     51177 May 18 15:25 bash.1684423538.976.core.gz'

So far, the core files use about 4% of the disk:

(335544320*4+51177)/(32847824*1024)=0.0399 
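The arithmetic above can be reproduced with a few lines of Python. This is only a sketch using the sizes from the `ls` and `df` output; the helper name `folder_utilization` is made up for illustration and is not part of the test suite.

```python
# Sizes in bytes, copied from the `ls -lstr /var/core` listing above.
core_file_sizes = [335544320, 335544320, 335544320, 335544320, 51177]

# Filesystem size in 1K blocks, copied from the `df /` output above.
total_1k_blocks = 32847824


def folder_utilization(file_sizes, total_1k_blocks):
    """Return the fraction of the filesystem occupied by the given files."""
    return float(sum(file_sizes)) / (total_1k_blocks * 1024)


utilization = folder_utilization(core_file_sizes, total_1k_blocks)
print("{:.4f}".format(utilization))
```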
  3. After max-core-limit was changed to 3 and the test continued, the core files in the /var/core folder changed as follows:
    $ ls -lstr /var/core
    'total 983152', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423533.12278.core.gz', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423534.7162.core.gz', 
    '-rw-r--r-- 1 root root 335544320 May 18 15:25 bash.1684423535.18639.core.gz', 
    '-rw-r--r-- 1 root root     51177 May 18 15:25 bash.1684423538.976.core.gz', 
    '-rw-r--r-- 1 root root     51378 May 18 15:27 bash.1684423667.1002.core.gz']

The core folder now uses about 3% of the disk:

(335544320*3+51177+51378)/(32847824*1024)=0.0299

"bash.1684423532.17296.core.gz" was removed so that the folder meets the "max-core-limit == 3" criterion. "bash.1684423533.12278.core.gz" was kept, because the criterion was already met once the first file had been removed.
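One possible explanation for why only one stub file was removed is rounding. The test locals show `total = 32077` and `one_percent_in_mb = 320`, so 1% of the disk appears to be truncated to a whole number of MB. The sketch below compares the truncated limit with an untruncated 3% limit. It assumes the cleanup on the device compares the folder size against the untruncated value, which this thread does not confirm; `stubs_that_fit` is a hypothetical helper.

```python
MB = 1024 * 1024

total_mb = 32077                  # `total` from the test locals
max_limit_percent = 3             # `max_limit` from the test locals
stub_size_mb = 320                # `expected_file_size_in_mb` from the test locals
real_core_bytes = 51177 + 51378   # the two real core files from the listing

# Limit as the test computes it: 1% truncated to whole MB, then multiplied.
test_limit_mb = (total_mb // 100) * max_limit_percent

# Limit without truncation (assumption: this is closer to what the device uses).
exact_limit_mb = total_mb * max_limit_percent / 100.0


def stubs_that_fit(limit_mb):
    """Number of 320 MB stub files that fit under limit_mb next to the real core files."""
    return int((limit_mb * MB - real_core_bytes) // (stub_size_mb * MB))


print(test_limit_mb, stubs_that_fit(test_limit_mb))
print(exact_limit_mb, stubs_that_fit(exact_limit_mb))
```

Under the truncated limit only two stub files fit, which matches what the test expects. Under the untruncated limit three fit, which matches what was observed on the device.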

  4. The script hardcodes the rule for building not_expected_stub_files and expected_stub_files, as shown below:
            with allure.step('Check that all expected stub files exist and unexpected does not exist'):
                expected_max_usage = one_percent_in_mb * max_limit
                expected_stub_files = dummy_files_list[2:]        
                not_expected_stub_files = dummy_files_list[:2]    <<<<<<  always remove first 2 corefile
                validate_expected_stub_files(self.duthost, validation_folder, expected_stub_files,

    So not_expected_stub_files is ['bash.1684423532.17296.core.gz', 'bash.1684423533.12278.core.gz'], and the check on not_expected_stub_files then fails.

It looks like the test needs to dynamically calculate which core files should be removed instead of always assuming the first two.
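A minimal sketch of such a dynamic calculation is shown below. It is not the fix from any PR. It assumes the cleanup deletes the oldest files first and stops as soon as the folder fits within the limit, and that the real core files are newer than the stubs and are never deleted. The function name `split_stub_files` and its parameters are hypothetical.

```python
def split_stub_files(stub_files, stub_size_bytes, other_bytes, limit_bytes):
    """Split stub files (ordered oldest first) into (expected, not_expected).

    Simulates an oldest-first cleanup that stops once the folder size
    is at or below limit_bytes.
    """
    remaining = list(stub_files)
    removed = []
    folder_bytes = stub_size_bytes * len(remaining) + other_bytes
    while remaining and folder_bytes > limit_bytes:
        removed.append(remaining.pop(0))   # drop the oldest stub file
        folder_bytes -= stub_size_bytes
    return remaining, removed


# Values taken from the logs above.
dummy_files_list = ['bash.1684423532.17296.core.gz', 'bash.1684423533.12278.core.gz',
                    'bash.1684423534.7162.core.gz', 'bash.1684423535.18639.core.gz']
limit_bytes = 0.03 * 32847824 * 1024   # 3% of the filesystem size reported by `df /`

expected_stub_files, not_expected_stub_files = split_stub_files(
    dummy_files_list,
    stub_size_bytes=335544320,
    other_bytes=51177 + 51378,          # the two real core files
    limit_bytes=limit_bytes,
)
print(expected_stub_files)
print(not_expected_stub_files)
```

With these numbers only the oldest stub file lands in the not-expected list, which agrees with the folder listing taken after the limit was changed to 3.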

XuChen-MSFT commented 1 year ago

@bingwang-ms PR https://github.com/sonic-net/sonic-mgmt/pull/8181 came from the Mellanox platform, can you take a look?

bingwang-ms commented 1 year ago

@ppikh Can you help take a look?

ppikh commented 1 year ago

This issue should be fixed by https://github.com/sonic-net/sonic-mgmt/pull/8086. I marked the PR for backport to 202205, but it seems it was not backported. @bingwang-ms - should I create a new PR to backport the changes into the 202205 branch?

StormLiangMS commented 1 year ago

@ppikh I added the label, Xin will help to cherry-pick, no need for the new PR unless there is a conflict.