sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC
Other
741 stars 1.44k forks source link

[featured] after changing feature state, featured adds attributes to FEATURE table, without checking if the feature is still installed #19878

Open noaOrMlnx opened 3 months ago

noaOrMlnx commented 3 months ago

Description

In featured, sync_feature_scope() function, if feature state is changed, the function will set "has_per_asic_scope" and "has_global_scope" attributes to FEATURE table in CONFIG_DB. the issue happens when we change the feature state (e.g. to "disabled"), and then uninstall the feature. uninstall the feature will remove the table from config db, and the above function will set those attributes. The outcome is when we do "show feature status" - the uninstalled feature will be an entry in the table, but there will be no state/autorestart there: (cpu-report)

Feature             State            AutoRestart     SetOwner
------------------  ---------------  --------------  ----------
bgp                 enabled          disabled
cpu-report
database            enabled          always_enabled
127.0.0.1:6379[4]> hgetall FEATURE|cpu-report
1) "has_per_asic_scope"
2) "False"
3) "has_global_scope"
4) "True"

Steps to reproduce the issue:

  1. install feature using sonic-package-manager
  2. enable it
  3. disable it
  4. uninstall feature The issue reproduced ~1/5 times

Describe the results you received:

the feature has entry in feature table after uninstalling it

Describe the results you expected:

table should be clean

Output of show version:

SONiC Software Version: SONiC.202405.3-658f752aa_Internal
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-11-2-amd64
Build commit: 658f752aa
Build date: Sun Aug  4 21:02:39 UTC 2024
Built by: sw-r2d2-bot@r-build-sonic-ci03-241

Platform: x86_64-mlnx_msn2410-r0
HwSKU: ACS-MSN2410
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1921X01546
Model Number: MSN2410-CB2FO
Hardware Revision: B4
Uptime: 11:13:43 up 20:39,  1 user,  load average: 0.98, 1.06, 1.19
Date: Sun 11 Aug 2024 11:13:43

Docker images:
REPOSITORY                                            TAG                           IMAGE ID       SIZE
docker-platform-monitor                               202405.3-658f752aa_Internal   11ae435f6f62   610MB
docker-platform-monitor                               latest                        11ae435f6f62   610MB
docker-syncd-mlnx                                     202405.3-658f752aa_Internal   b2db4c35bf1e   841MB
docker-syncd-mlnx                                     latest                        b2db4c35bf1e   841MB
docker-orchagent                                      202405.3-658f752aa_Internal   8666012a65e7   356MB
docker-orchagent                                      latest                        8666012a65e7   356MB
docker-dhcp-relay                                     latest                        9651d6a0c8f9   324MB
docker-teamd                                          202405.3-658f752aa_Internal   269688054997   343MB
docker-teamd                                          latest                        269688054997   343MB
docker-macsec                                         latest                        8c924d1a0c9e   345MB
docker-sflow                                          202405.3-658f752aa_Internal   598d86b28431   344MB
docker-sflow                                          latest                        598d86b28431   344MB
docker-fpm-frr                                        202405.3-658f752aa_Internal   04648378f3c3   374MB
docker-fpm-frr                                        latest                        04648378f3c3   374MB
docker-snmp                                           202405.3-658f752aa_Internal   5492549b5f9a   354MB
docker-snmp                                           latest                        5492549b5f9a   354MB
docker-sonic-mgmt-framework                           202405.3-658f752aa_Internal   cea7ea147ae4   401MB
docker-sonic-mgmt-framework                           latest                        cea7ea147ae4   401MB
docker-nat                                            202405.3-658f752aa_Internal   619155780dc5   345MB
docker-nat                                            latest                        619155780dc5   345MB
docker-database                                       202405.3-658f752aa_Internal   72799f5b16df   323MB
docker-database                                       latest                        72799f5b16df   323MB
docker-mux                                            202405.3-658f752aa_Internal   fc82c042c403   366MB
docker-mux                                            latest                        fc82c042c403   366MB
docker-router-advertiser                              202405.3-658f752aa_Internal   b906966c8853   314MB
docker-router-advertiser                              latest                        b906966c8853   314MB
docker-sonic-gnmi                                     202405.3-658f752aa_Internal   9e49b47c4df4   399MB
docker-sonic-gnmi                                     latest                        9e49b47c4df4   399MB
docker-lldp                                           202405.3-658f752aa_Internal   783542c6b485   360MB
docker-lldp                                           latest                        783542c6b485   360MB
docker-eventd                                         202405.3-658f752aa_Internal   087f94960749   314MB
docker-eventd                                         latest                        087f94960749   314MB

Output of show techsupport:

(paste your output here or download and attach the file here )

Additional information you deem important (e.g. issue happens only occasionally):

zjswhhh commented 3 months ago

@qiluo-msft - can you follow up with NIVIDA? Is it caused by package manager or featured?

dprital commented 2 months ago

@qiluo-msft - can you follow up with NIVIDA? Is it caused by package manager or featured?

This degradation was rasied due to PR: https://github.com/sonic-net/sonic-host-services/pull/120

abdosi commented 2 months ago

do we have UT to test this case ? Without UT it will be hard to prevent such degradation as uninstalling the feature is not typical use-case. Can you please add UT and then we can see best way to fix it.

noaOrMlnx commented 2 months ago

@abdosi We are before release, so I won't have time to add UT to a feature I didn't add. Please check the community issue description. you can reproduce the issue pretty easily.

++ @moshemos @dprital

stepanblyschak commented 1 month ago

@abdosi Recently MSFT reported that COFIG_DB shouldn't be updated at runtime - https://github.com/sonic-net/sonic-buildimage/issues/20462 and this issue is related to that, do you have an ETA for a fix ?

dprital commented 1 week ago

@abdosi @bingwang-ms , Any ETA to fix this issue ?