Open pavannaregundi opened 4 months ago
@dgsudharsan to start an offline discussion on the change needed in Sairedis.
This PR, https://github.com/sonic-net/sonic-swss/pull/3076, has the fix for this issue. Please retest with the latest master image.
@arlakshm Thanks for your comment. I am using the following master commit: https://github.com/sonic-net/sonic-buildimage/tree/a7ab698f1c7218b4ddc4db63c42918a8c3eb9eb4 and I see that the above PR is already part of it.
@pavannaregundi From the internally attached PR I see the backref was missing and hence the queue removal didn't happen in the first place. The PR 3076 in SWSS addresses a different race condition which is a statistical issue. I believe we need the yang fix that is linked to this bug.
I directly patched the changes into /usr/local/yang-models/sonic-buffer-queue.yang on the SONiC switch and tried that. It did not work either, so we are still checking it internally.
Can you try configuring create_only_config_db_buffers in DEVICE_METADATA|localhost? I think it should work with it configured. Currently, DPB doesn't remove queue/PG counters after the port is removed if it is not configured.
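For anyone trying the suggestion above, here is a minimal sketch that prints a CONFIG_DB fragment setting the flag under DEVICE_METADATA|localhost. The value "true" follows the wording later in this thread ("when create_only_config_db_buffers is true"); the helper name `build_metadata_patch` is made up for illustration and is not part of any SONiC tool.

```python
import json


def build_metadata_patch() -> dict:
    """Return a CONFIG_DB-style fragment that sets the flag.

    Assumption: the flag is stored as the string "true" under
    DEVICE_METADATA|localhost, as discussed in this thread.
    """
    return {
        "DEVICE_METADATA": {
            "localhost": {
                "create_only_config_db_buffers": "true"
            }
        }
    }


if __name__ == "__main__":
    # Print the fragment; redirect it to a file such as
    # create_only_config_db_buffers.json if needed.
    print(json.dumps(build_metadata_patch(), indent=4))
```

The printed JSON can be saved to a file and loaded into the running configuration with whatever mechanism your image provides (for example `config load`); verify the exact procedure on your platform before relying on it.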
@stephenxs Thanks. We will try this and get back.
@stephenxs Adding create_only_config_db_buffers.json is working. However, I am not sure if this is how it is supposed to work. In general, if a port is removed from ASIC DB, its FLEX_COUNTER entry should also get removed. Also, are there any other implications of using 'create_only_config_db_buffers'?
Hi
The orchagent should remove the PG and queue counters when a port is removed. I think this logic is missing in the DPB feature.
We fixed it partially, for the case where create_only_config_db_buffers is true, while we were fixing another issue.
But in general, we should expect it to be fixed by the owner of DPB, especially when the flag is not set.
When create_only_config_db_buffers is set, it only creates counters for queues/PGs that are configured in the BUFFER_PG and BUFFER_QUEUE tables.
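To make the last point concrete, here is a rough sketch of which queue indices would get counters for a port when only configured queues are considered. This is not orchagent code; it assumes BUFFER_QUEUE keys of the form `<port>|<index or range>` (for example `Ethernet0|3-4`), and the function names are hypothetical.

```python
from typing import Iterable, List


def expand_index_range(spec: str) -> List[int]:
    """Expand '3-4' to [3, 4] and '5' to [5]."""
    first, _, last = spec.partition("-")
    start = int(first)
    end = int(last) if last else start
    return list(range(start, end + 1))


def queues_with_counters(buffer_queue_keys: Iterable[str], port: str) -> List[int]:
    """Return the sorted queue indices configured for `port`.

    Assumption: each key looks like '<port>|<index or range>', and a
    comma-separated port list before the '|' is also accepted.
    """
    indices = set()
    for key in buffer_queue_keys:
        parts = key.split("|")
        if len(parts) != 2:
            continue  # skip keys that do not match the assumed format
        ports, spec = parts
        if port in ports.split(","):
            indices.update(expand_index_range(spec))
    return sorted(indices)


# Illustrative keys only, not taken from the attached dumps.
example_keys = ["Ethernet0|0-2", "Ethernet0|3-4", "Ethernet4|0-2"]
print(queues_with_counters(example_keys, "Ethernet0"))
```

With the flag set, queues on a port that have no BUFFER_QUEUE entry would simply not appear in the result, which matches the behaviour described above. The same idea applies to PGs and the BUFFER_PG table.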
Description
Errors are seen in SDK reading the Queue and Buffer counters for deleted ports after dynamic breakout CLI execution.
Steps to reproduce the issue:
Collect redis db dumps for reference
Run a DPB operation that removes more ports than it creates. The example below shows converting from 4x100G to 1x400G.
Collect redis db dumps again
Check syslogs for errors.
Describe the results you received:
From the collected logs in redis-dump,
redis_flex_after_breakout.txt: Stale entries present in FLEX_DB for queues mapped to deleted port.
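One way to spot such stale entries is to compare the object IDs in the flex counter keys against those still present in the ASIC DB keys. The sketch below assumes keys shaped like `FLEX_COUNTER_TABLE:QUEUE_STAT_COUNTER:oid:0x...` and `ASIC_STATE:SAI_OBJECT_TYPE_QUEUE:oid:0x...`; the helper names and the sample OIDs are made up for illustration and are not taken from the attached dumps.

```python
import re
from typing import Iterable, List, Set

# Matches object IDs such as oid:0x15000000000001 inside a Redis key.
OID_RE = re.compile(r"oid:0x[0-9a-fA-F]+")


def extract_oids(keys: Iterable[str]) -> Set[str]:
    """Collect every OID that appears in the given keys (lower-cased)."""
    oids = set()
    for key in keys:
        oids.update(match.lower() for match in OID_RE.findall(key))
    return oids


def stale_flex_keys(flex_keys: Iterable[str], asic_keys: Iterable[str]) -> List[str]:
    """Return flex counter keys whose OIDs no longer exist in the ASIC keys."""
    live = extract_oids(asic_keys)
    stale = []
    for key in flex_keys:
        oids = [match.lower() for match in OID_RE.findall(key)]
        if oids and all(oid not in live for oid in oids):
            stale.append(key)
    return stale


# Made-up sample keys, for illustration only.
asic_after = ["ASIC_STATE:SAI_OBJECT_TYPE_QUEUE:oid:0x15000000000001"]
flex_after = [
    "FLEX_COUNTER_TABLE:QUEUE_STAT_COUNTER:oid:0x15000000000001",
    "FLEX_COUNTER_TABLE:QUEUE_STAT_COUNTER:oid:0x15000000000002",
]
for key in stale_flex_keys(flex_after, asic_after):
    print("stale:", key)
```

The key lists would come from the after-breakout dumps of the two databases; how the keys are pulled out of the dump files depends on the dump format and is left out here.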
Describe the results you expected:
Output of show version:
port_breakout_info.txt redis_asic_after_breakout.txt redis_asic_before_breakout.txt redis_flex_after_breakout.txt redis_flex_before_breakout.txt syslog.txt
Output of show techsupport:
Additional information you deem important (e.g. issue happens only occasionally):