sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[sflow] system crashed once sflow is enabled and switch has 200G+ interfaces #6793

Open Hedgehog-Guru opened 3 years ago

Hedgehog-Guru commented 3 years ago

Description

If the switch has interfaces running at 200G or above, the system crashes after sflow is enabled.

Steps to reproduce the issue:

  1. Enable sflow feature
    config feature state sflow enabled
  2. Make sure at least one interface is oper-up and has a speed of 200G or above
    config interface speed Ethernet24 200000
    show interfaces status Ethernet24
    Interface        Lanes    Speed    MTU    FEC    Alias    Vlan    Oper    Admin             Type    Asym PFC
    -----------  -----------  -------  -----  -----  -------  ------  ------  -------  ---------------  ----------
    Ethernet24  24,25,26,27     200G   9100    N/A     etp7  routed    down       up  QSFP28 or later         N/A
  3. Enable sflow
    config sflow enable 
  4. Check system health, for example by running "pgrep orchagent"
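
The steps above can be sketched as a small Python helper. This is only an illustration: the function names `reproduce` and `orchagent_running`, the `settle_seconds` delay, and the injectable `run` parameter are assumptions made for this sketch. The CLI commands are the ones listed in the steps; whether they need elevated privileges on a given image is not covered here.

```python
import subprocess
import time


def orchagent_running(run=subprocess.run):
    """Return True if `pgrep orchagent` finds at least one matching process."""
    # pgrep exits with status 0 when a process matches and non-zero otherwise.
    result = run(
        ["pgrep", "orchagent"],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def reproduce(port="Ethernet24", speed="200000", settle_seconds=5, run=subprocess.run):
    """Run the reproduction steps and report whether orchagent is still alive."""
    steps = [
        ["config", "feature", "state", "sflow", "enabled"],  # step 1
        ["config", "interface", "speed", port, speed],       # step 2
        ["config", "sflow", "enable"],                       # step 3
    ]
    for cmd in steps:
        # check=True stops the sequence if a configuration command fails.
        run(cmd, check=True)
    # Hypothetical settle delay: give the system time to react before step 4.
    time.sleep(settle_seconds)
    return orchagent_running(run)
```

On a switch, `reproduce()` returning `False` would correspond to the reported crash; passing a stub as `run` lets the sequence be exercised off-box.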

Describe the results you received:

System crashed

Describe the results you expected:

Stable run

Output of show version:

SONiC Software Version: SONiC.SONIC.202012.10-d26a4af_Internal
Distribution: Debian 10.7
Kernel: 4.19.0-9-2-amd64
Build commit: d26a4aff
Build date: Thu Feb  4 15:28:36 UTC 2021
Built by: sw-r2d2-bot@r-build-sonic-ci02

Platform: x86_64-mlnx_msn3700-r0
HwSKU: ACS-MSN3700
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1852X03965
Uptime: 17:46:35 up 12 min,  1 user,  load average: 0.07, 0.84, 0.72
[sonic_dump_qa-anconda-test10_20210216_173758.tar.gz](https://github.com/Azure/sonic-buildimage/files/5989890/sonic_dump_qa-anconda-test10_20210216_173758.tar.gz)

Additional information you deem important (e.g. issue happens only occasionally):

sonic_dump_qa-anconda-test10_20210216_173758.tar.gz

prsunny commented 3 years ago

@padmanarayana, @dgsudharsan, could you please take a look and suggest next steps?

anshuv-mfst commented 3 years ago

Issue Triage 2/17: Dell team to provide input on the issue, thanks!

liat-grozovik commented 3 years ago

@padmanarayana kind reminder

padmanarayana commented 3 years ago

@liat-grozovik : the dump is from an Internal build. Nevertheless, it is very likely that the 200G is failing because there is no entry in either https://github.com/Azure/sonic-swss/blob/288fb40d8ff4ec825645c2fbab1e79f50881a9f2/cfgmgr/sflowmgr.cpp#L13 or https://github.com/Azure/sonic-swss/blob/288fb40d8ff4ec825645c2fbab1e79f50881a9f2/cfgmgr/sflowmgr.h#L14. We'll check and get back.
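
To illustrate the suspected cause, here is a minimal Python sketch of a speed-keyed sample-rate table with no 200G entry. It is not the sonic-swss C++ code: the table name, its values, and both lookup functions are hypothetical, and how the linked code actually behaves on a missing key is not confirmed in this thread.

```python
# Hypothetical table keyed by port speed in Mb/s (as strings, like the CLI value).
# Values are illustrative only; "200000" is deliberately absent.
SAMPLE_RATE_BY_SPEED = {
    "400000": 400000,
    "100000": 100000,
    "50000": 50000,
    "40000": 40000,
    "25000": 25000,
    "10000": 10000,
    "1000": 1000,
}


def sample_rate_strict(speed):
    """Unchecked lookup: raises KeyError for any speed missing from the table."""
    return SAMPLE_RATE_BY_SPEED[speed]


def sample_rate_safe(speed, default):
    """Checked lookup: falls back to a caller-supplied default for unknown speeds."""
    return SAMPLE_RATE_BY_SPEED.get(speed, default)
```

With this table, `sample_rate_strict("200000")` raises `KeyError`, while `sample_rate_safe("200000", default=100000)` returns the fallback. Adding the missing key or guarding the lookup are the two obvious directions for a fix under this assumption.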

GarrickHe commented 3 years ago

@Hedgehog-Guru - We don't have a 200G interface. Can we provide a patch for you to build and re-test on your end?

Thanks, Garrick

liat-grozovik commented 3 years ago

Please share draft PR and Nvidia will be able to validate.


vadymhlushko-mlnx commented 3 years ago

@GarrickHe kind reminder, are there any updates?

vivekrnv commented 3 years ago

This issue can be closed.