sonic-net / sonic-buildimage

Scripts which perform an installable binary image build for SONiC

[PDDF] PSU fan status is always NOT OK while thermalctld is enabled #8129

Open seanwu-ec opened 3 years ago

seanwu-ec commented 3 years ago

Description

The PSU fan status is always reported as Not OK while pmon's thermalctld is enabled, as shown below:

admin@sonic:~$ show platform fan
  Drawer    LED         FAN               Speed    Direction    Presence    Status          Timestamp
--------  -----  ----------  ------------------  -----------  ----------  --------  -----------------
     N/A  green   PSU1_FAN1  77.15555555555555%      exhaust     Present    Not OK  20210617 18:45:49
     N/A  green   PSU2_FAN1  76.08888888888889%      exhaust     Present    Not OK  20210617 18:45:49

Suggestion for change

For a PSU fan, PddfFan.get_target_speed() should raise NotImplementedError instead of returning 0: https://github.com/Azure/sonic-buildimage/blob/4f2bc1fbeddc49af62c8f1acb748e251d043e792/platform/pddf/platform-api-pddf-base/sonic_platform_pddf_base/pddf_fan.py#L227

Otherwise, the PSU fan fails the over_speed check every time, because its real speed is always much greater than the 0% target: https://github.com/Azure/sonic-platform-daemons/blob/2d2749ab77ea0cfb9b1a9a0a5c7eeffbde9daed8/sonic-thermalctld/scripts/thermalctld#L349
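To illustrate the reasoning, here is a minimal, self-contained sketch. It is not the actual thermalctld or PDDF code: `try_get`, `is_over_speed`, `fan_status_ok`, the two fan classes, and the 20% tolerance are all hypothetical stand-ins. It assumes the daemon treats a NotImplementedError as "value not available" and skips the over-speed comparison in that case.

```python
# Hypothetical simplification; names and values are illustrative only.
NOT_AVAILABLE = "N/A"


def try_get(callback, default=NOT_AVAILABLE):
    """Call a platform API method, mapping NotImplementedError to a default."""
    try:
        return callback()
    except NotImplementedError:
        return default


def is_over_speed(speed, target_speed, tolerance):
    """Return True if speed exceeds target_speed by more than tolerance percent."""
    if NOT_AVAILABLE in (speed, target_speed, tolerance):
        # Assumed behavior: the check is skipped when a value is unavailable.
        return False
    return speed > target_speed * (1 + tolerance / 100.0)


class PsuFanReturningZero:
    """Models the reported behavior: target speed is hard-coded to 0."""

    def get_speed(self):
        return 77  # percent, roughly the value in the output above

    def get_target_speed(self):
        return 0

    def get_speed_tolerance(self):
        return 20  # percent, assumed value


class PsuFanNotImplemented(PsuFanReturningZero):
    """Models the suggested change: target speed is not supported."""

    def get_target_speed(self):
        raise NotImplementedError


def fan_status_ok(fan):
    speed = try_get(fan.get_speed)
    target = try_get(fan.get_target_speed)
    tolerance = try_get(fan.get_speed_tolerance)
    return not is_over_speed(speed, target, tolerance)


print(fan_status_ok(PsuFanReturningZero()))   # 77 > 0 * 1.2, so flagged as over speed
print(fan_status_ok(PsuFanNotImplemented()))  # comparison skipped
```

With a target of 0, any nonzero speed exceeds the allowed range regardless of the tolerance, which matches the permanent Not OK status described above.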

Steps to reproduce the issue:

  1. Make sure all PSUs are properly seated and powered.
  2. Make sure thermalctld is running in the pmon container. (Or invoke it manually: python3 /usr/local/bin/thermalctld)
  3. Run show platform fan. The PSU fan status is shown as Not OK.

Describe the results you received:

PSU fan status is 'Not OK'

Describe the results you expected:

PSU fan status should be 'OK'

Output of show version:

SONiC Software Version: SONiC.master-8115.22792-d40be3086
Distribution: Debian 10.10
Kernel: 4.19.0-12-2-amd64
Build commit: d40be3086
Build date: Wed Jul  7 07:58:49 UTC 2021
Built by: AzDevOps@sonic-build-workers-000GRR

Platform: x86_64-accton_as9716_32d-r0
HwSKU: Accton-AS9716-32D
ASIC: broadcom
ASIC Count: 1
Serial Number: N/A
Model Number: N/A
Hardware Revision: N/A
Uptime: 18:13:11 up  1:43,  4 users,  load average: 2.18, 1.76, 1.79

Docker images:
REPOSITORY                    TAG                           IMAGE ID            SIZE
docker-platform-monitor       latest                        b8d4aae7ead7        627MB
docker-platform-monitor       master-8115.22792-d40be3086   b8d4aae7ead7        627MB
docker-macsec                 latest                        815692b903ff        427MB
docker-macsec                 master-8115.22792-d40be3086   815692b903ff        427MB
docker-teamd                  latest                        13c4073538c7        424MB
docker-teamd                  master-8115.22792-d40be3086   13c4073538c7        424MB
docker-snmp                   latest                        bd3e67e44b70        454MB
docker-snmp                   master-8115.22792-d40be3086   bd3e67e44b70        454MB
docker-database               latest                        9d22b800e462        413MB
docker-database               master-8115.22792-d40be3086   9d22b800e462        413MB
docker-lldp                   latest                        b89a34f2a4e9        453MB
docker-lldp                   master-8115.22792-d40be3086   b89a34f2a4e9        453MB
docker-orchagent              latest                        16bc98c8190f        442MB
docker-orchagent              master-8115.22792-d40be3086   16bc98c8190f        442MB
docker-nat                    latest                        9fc5997ea17c        427MB
docker-nat                    master-8115.22792-d40be3086   9fc5997ea17c        427MB
docker-sonic-mgmt-framework   latest                        704e6ec89696        570MB
docker-sonic-mgmt-framework   master-8115.22792-d40be3086   704e6ec89696        570MB
docker-sonic-telemetry        latest                        a2946d1dcd84        501MB
docker-sonic-telemetry        master-8115.22792-d40be3086   a2946d1dcd84        501MB
docker-dhcp-relay             latest                        7a3c6b47ce19        420MB
docker-dhcp-relay             master-8115.22792-d40be3086   7a3c6b47ce19        420MB
docker-fpm-frr                latest                        49377a9cfebf        442MB
docker-fpm-frr                master-8115.22792-d40be3086   49377a9cfebf        442MB
docker-sflow                  latest                        f14bcfdaa9a0        425MB
docker-sflow                  master-8115.22792-d40be3086   f14bcfdaa9a0        425MB
docker-router-advertiser      latest                        3322539bfe10        413MB
docker-router-advertiser      master-8115.22792-d40be3086   3322539bfe10        413MB
docker-syncd-brcm             latest                        6ad9b367a389        705MB
docker-syncd-brcm             master-8115.22792-d40be3086   6ad9b367a389        705MB

zhangyanzhao commented 3 years ago

@adyeung will take a look

FuzailBrcm commented 3 years ago

@seanwu-ec Thanks for raising this. Your suggestion seems correct but I need to test some more as we didn't enable thermalctld locally (or enabled it with some restrictions). I will work on it and push the fix.

seanwu-ec commented 3 years ago

Understood. I appreciate that, @FuzailBrcm. If you know of any downsides or reasons not to enable thermalctld, please let us know. We recently re-enabled it because some customers complained that show platform fan/temperature doesn't work.

FuzailBrcm commented 3 years ago

Added the fix for this issue as part of https://github.com/Azure/sonic-buildimage/pull/7834