system76 / firmware-open

System76 Open Firmware

Systems using S0ix don't reach SLP_S0 #506

Open DrymarchonShaun opened 10 months ago

DrymarchonShaun commented 10 months ago

Intel's s0ix-selftest-tool says the device isn't reaching the deepest S0ix state; the log for that is here.

Manually checking the files I assume that script reads, I find the same:

$ sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec
0
$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
0

However, the CPU is hitting C10:

$ cat /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
1115299596
$ sudo cat /sys/kernel/debug/pmc_core/package_cstate_show                      
Package C2 : 434296807
Package C3 : 300541816
Package C6 : 61166
Package C7 : 0
Package C8 : 252830
Package C9 : 0
Package C10 : 105541521
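
For anyone comparing numbers across machines, the table above can be reduced to the deepest package C-state that shows non-zero residency. This is only a rough sketch: it assumes the `Package Cn : value` layout printed above, and the function name `deepest_pkg_cstate` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the deepest package C-state with non-zero
# residency. Assumes input lines look like "Package C10 : 105541521",
# as in the package_cstate_show output above.
deepest_pkg_cstate() {
    awk '
        BEGIN { best = -1 }
        $1 == "Package" && $2 ~ /^C[0-9]+$/ && $NF ~ /^[0-9]+$/ {
            n = substr($2, 2) + 0              # numeric part of "C10" is 10
            if ($NF + 0 > 0 && n > best) best = n
        }
        END { if (best >= 0) print "C" best; else print "none" }
    '
}

# Example (needs root to read debugfs, not executed here):
#   sudo cat /sys/kernel/debug/pmc_core/package_cstate_show | deepest_pkg_cstate
```

Fed the values above, this should report C10, which matches the observation that the package reaches C10 while SLP_S0 residency stays at 0.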

Steps to reproduce

1. Suspend the system
2. Resume
3. Run sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec

Expected behavior

sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec should return a value greater than 0

Actual behavior

sudo cat /sys/kernel/debug/pmc_core/slp_s0_residency_usec returns 0
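
A slightly stricter version of the check is to read the counter before and after a suspend cycle and look at the difference. The sketch below assumes the counter is cumulative; `read_counter` and `residency_delta` are hypothetical helper names, not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical helpers for comparing a residency counter across a suspend cycle.

# Print the number stored in file $1. Fails with a message if the file
# cannot be read (debugfs files such as slp_s0_residency_usec need root).
read_counter() {
    if [ -r "$1" ]; then
        cat "$1"
    else
        echo "cannot read $1" >&2
        return 1
    fi
}

# Print how much the counter grew between two readings ($1 before, $2 after).
residency_delta() {
    echo $(( $2 - $1 ))
}

# Example (run as root, not executed here):
#   f=/sys/kernel/debug/pmc_core/slp_s0_residency_usec
#   before=$(read_counter "$f")
#   ... suspend the system, then resume it ...
#   after=$(read_counter "$f")
#   residency_delta "$before" "$after"
```

A delta of 0 after a suspend and resume would correspond to the behavior reported here.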

crawfxrd commented 10 months ago

I don't know what's required for SLP_S0#, but I expect a part of it is missing or incorrect RTD3 configs.

DrymarchonShaun commented 10 months ago

> I don't know what's required for SLP_S0#, but I expect a part of it is missing or incorrect RTD3 configs.

I assume that would be what's causing the "Pcieport is not in D3cold:" parts of the selftest-tool's output?

Checking PCI Devices D3 States:
[  309.689726] nvme 0000:2f:00.0: PCI PM: Suspend power state: D0
[  309.689730] nvme 0000:2f:00.0: PCI PM: Skipped
[  309.691814] i801_smbus 0000:00:1f.4: PCI PM: Suspend power state: D0
[  309.691817] i801_smbus 0000:00:1f.4: PCI PM: Skipped
[  309.693959] pcieport 0000:00:1d.0: PCI PM: Suspend power state: D0
[  309.693962] pcieport 0000:00:1d.0: PCI PM: Skipped
[  309.695756] snd_hda_intel 0000:00:1f.3: PCI PM: Suspend power state: D3hot
[  309.695762] i915 0000:00:02.0: PCI PM: Suspend power state: D3hot
[  309.696360] xhci_hcd 0000:00:0d.0: PCI PM: Suspend power state: D3hot
[  309.702096] r8169 0000:2e:00.0: PCI PM: Suspend power state: D3hot
[  309.705955] sdhci-pci 0000:2d:00.0: PCI PM: Suspend power state: D3hot
[  309.706697] nvme 0000:01:00.0: PCI PM: Suspend power state: D3hot
[  309.706773] mei_me 0000:00:16.0: PCI PM: Suspend power state: D3hot
[  309.706825] pcieport 0000:00:1c.0: PCI PM: Suspend power state: D0
[  309.706827] pcieport 0000:00:1c.0: PCI PM: Skipped
[  309.707008] intel-lpss 0000:00:15.0: PCI PM: Suspend power state: D3hot
[  309.707208] xhci_hcd 0000:00:14.0: PCI PM: Suspend power state: D3hot
[  309.707892] iwlwifi 0000:00:14.3: PCI PM: Suspend power state: D3hot
[  309.711623] thunderbolt 0000:00:0d.2: PCI PM: Suspend power state: D3hot
[  309.714532] pcieport 0000:00:1c.7: PCI PM: Suspend power state: D3hot
[  309.740534] pcieport 0000:00:06.0: PCI PM: Suspend power state: D3cold

Checking PCI Devices tree diagram:
-[0000:00]-+-00.0  Intel Corporation Device 4621
           +-02.0  Intel Corporation Alder Lake-P GT2 [Iris Xe Graphics]
           +-06.0-[01]----00.0  Sandisk Corp SanDisk Ultra 3D / WD Blue SN550 NVMe SSD
           +-07.0-[02-2c]--
           +-0a.0  Intel Corporation Platform Monitoring Technology
           +-0d.0  Intel Corporation Alder Lake-P Thunderbolt 4 USB Controller
           +-0d.2  Intel Corporation Alder Lake-P Thunderbolt 4 NHI #0
           +-14.0  Intel Corporation Alder Lake PCH USB 3.2 xHCI Host Controller
           +-14.2  Intel Corporation Alder Lake PCH Shared SRAM
           +-14.3  Intel Corporation Alder Lake-P PCH CNVi WiFi
           +-15.0  Intel Corporation Alder Lake PCH Serial IO I2C Controller #0
           +-15.1  Intel Corporation Alder Lake PCH Serial IO I2C Controller #1
           +-16.0  Intel Corporation Alder Lake PCH HECI Controller
           +-1c.0-[2d]----00.0  O2 Micro, Inc. SD/MMC Card Reader Controller
           +-1c.7-[2e]----00.0  Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
           +-1d.0-[2f]----00.0  Sandisk Corp SanDisk Ultra 3D / WD Blue SN570 NVMe SSD (DRAM-less)
           +-1f.0  Intel Corporation Alder Lake PCH eSPI Controller
           +-1f.3  Intel Corporation Alder Lake PCH-P High Definition Audio Controller
           +-1f.4  Intel Corporation Alder Lake PCH-P SMBus Host Controller
           \-1f.5  Intel Corporation Alder Lake-P PCH SPI Controller

The pcieport 0000:00:1d.0 ASPM enable status:
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+

Pcieport is not in D3cold:          
0000:00:1d.0

The pcieport 0000:00:1c.0 ASPM enable status:
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk-

Pcieport is not in D3cold:          
0000:00:1c.0

Pcieport is not in D3cold:     
0000:00:1c.7

Available bridge device: 0000:00:06.0 0000:00:07.0 0000:00:1c.0 0000:00:1c.7 0000:00:1d.0
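
To make the D0 entries easier to spot, a log like the "Checking PCI Devices D3 States" section above can be filtered down to just the devices whose suspend state was D0. This is a sketch that assumes the exact kernel log line format shown above; `devices_left_in_d0` is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper: list PCI devices whose logged suspend state was D0.
# Assumes kernel log lines of the form
#   "[  309.689726] nvme 0000:2f:00.0: PCI PM: Suspend power state: D0"
devices_left_in_d0() {
    awk '
        /PCI PM: Suspend power state: D0[ \t\r]*$/ {
            # Find the "dddd:bb:dd.f:" address token; the driver name is
            # the field just before it.
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^[0-9a-f]+:[0-9a-f]+:[0-9a-f]+[.][0-9a-f]:$/) {
                    addr = $i
                    sub(/:$/, "", addr)
                    print addr, $(i - 1)
                    break
                }
            }
        }
    '
}

# Example (not executed here):
#   sudo dmesg | devices_left_in_d0
```

On the log above this would list 0000:2f:00.0 (nvme), 0000:00:1f.4 (i801_smbus), 0000:00:1d.0 and 0000:00:1c.0 (pcieport). Whether each of those actually blocks SLP_S0 is not something the filter can tell.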

I'm not sure what


The PCIe bridge link power management state is:
0000:00:06.0 Link is in L0

The link power management state of PCIe bridge: 0000:00:06.0 is not expected. 
which is expected to be L1.1 or L1.2, or user would run this script again.

The L1SubCap of the failed 0000:00:06.0 is:
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+

The L1SubCtl1 of the failed 0000:00:06.0 is:
        L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-

is about, although I did notice that the way Intel formatted it makes it look like it's one of the SSDs. Checking lspci -vv, it shows 00:06.0 as:

00:06.0 PCI bridge: Intel Corporation 12th Gen Core Processor PCI Express x4 Controller #0 (rev 02) (prog-if 00 [Normal decode])
    Subsystem: CLEVO/KAPOK Computer Device 7716
    Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR+ FastB2B- DisINTx+
    Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
    Latency: 0, Cache Line Size: 64 bytes
    Interrupt: pin D routed to IRQ 122
    IOMMU group: 2
    Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
    I/O behind bridge: [disabled] [16-bit]
    Memory behind bridge: 80400000-804fffff [size=1M] [32-bit]
    Prefetchable memory behind bridge: [disabled] [64-bit]
    Secondary status: 66MHz- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- <SERR- <PERR-
    BridgeCtl: Parity- SERR+ NoISA- VGA- VGA16+ MAbort- >Reset- FastB2B-
        PriDiscTmr- SecDiscTmr- DiscTmrStat- DiscTmrSERREn-
    Capabilities: [40] Express (v2) Root Port (Slot+), MSI 00
        DevCap: MaxPayload 256 bytes, PhantFunc 0
            ExtTag- RBE+
        DevCtl: CorrErr- NonFatalErr- FatalErr- UnsupReq-
            RlxdOrd- ExtTag- PhantFunc- AuxPwr- NoSnoop-
            MaxPayload 256 bytes, MaxReadReq 128 bytes
        DevSta: CorrErr- NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
        LnkCap: Port #5, Speed 16GT/s, Width x4, ASPM L0s L1, Exit Latency L0s <4us, L1 <16us
            ClockPM- Surprise- LLActRep+ BwNot+ ASPMOptComp+
        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
            ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
        LnkSta: Speed 8GT/s, Width x4
            TrErr- Train- SlotClk+ DLActive+ BWMgmt+ ABWMgmt-
        SltCap: AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug- Surprise-
            Slot #0, PowerLimit 75W; Interlock- NoCompl+
        SltCtl: Enable: AttnBtn- PwrFlt- MRL- PresDet- CmdCplt- HPIrq- LinkChg-
            Control: AttnInd Unknown, PwrInd Unknown, Power- Interlock-
        SltSta: Status: AttnBtn- PowerFlt- MRL- CmdCplt- PresDet+ Interlock-
            Changed: MRL- PresDet+ LinkState+
        RootCap: CRSVisible-
        RootCtl: ErrCorrectable- ErrNon-Fatal- ErrFatal- PMEIntEna+ CRSVisible-
        RootSta: PME ReqID 0000, PMEStatus- PMEPending-
        DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR+
            10BitTagComp+ 10BitTagReq+ OBFF Not Supported, ExtFmt- EETLPPrefix-
            EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
            FRS- LN System CLS Not Supported, TPHComp- ExtTPHComp- ARIFwd+
            AtomicOpsCap: Routing+ 32bit+ 64bit+ 128bitCAS+
        DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR+ 10BitTagReq- OBFF Disabled, ARIFwd-
            AtomicOpsCtl: ReqEn+ EgressBlck+
        LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
        LnkCtl2: Target Link Speed: 8GT/s, EnterCompliance- SpeedDis-
            Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
            Compliance Preset/De-emphasis: -6dB de-emphasis, 0dB preshoot
        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete+ EqualizationPhase1+
            EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
            Retimer- 2Retimers- CrosslinkRes: unsupported
    Capabilities: [80] MSI: Enable+ Count=1/1 Maskable- 64bit-
        Address: fee00218  Data: 0000
    Capabilities: [90] Subsystem: CLEVO/KAPOK Computer Device 7716
    Capabilities: [a0] Power Management version 3
        Flags: PMEClk- DSI- D1- D2- AuxCurrent=0mA PME(D0+,D1-,D2-,D3hot+,D3cold+)
        Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME-
    Capabilities: [100 v1] Advanced Error Reporting
        UESta:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UEMsk:  DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt+ RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
        UESvrt: DLP+ SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF+ MalfTLP+ ECRC- UnsupReq- ACSViol-
        CESta:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr-
        CEMsk:  RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
        AERCap: First Error Pointer: 00, ECRCGenCap- ECRCGenEn- ECRCChkCap- ECRCChkEn-
            MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
        HeaderLog: 00000000 00000000 00000000 00000000
        RootCmd: CERptEn- NFERptEn- FERptEn-
        RootSta: CERcvd- MultCERcvd- UERcvd- MultUERcvd-
            FirstFatal- NonFatalMsg- FatalMsg- IntMsg 0
        ErrorSrc: ERR_COR: 0000 ERR_FATAL/NONFATAL: 0000
    Capabilities: [220 v1] Access Control Services
        ACSCap: SrcValid+ TransBlk+ ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
        ACSCtl: SrcValid+ TransBlk- ReqRedir+ CmpltRedir+ UpstreamFwd+ EgressCtrl- DirectTrans-
    Capabilities: [200 v1] L1 PM Substates
        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
             PortCommonModeRestoreTime=110us PortTPowerOnTime=500us
        L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1- ASPM_L1.2+ ASPM_L1.1-
              T_CommonMode=110us LTR1.2_Threshold=616448ns
        L1SubCtl2: T_PwrOn=500us
    Capabilities: [150 v1] Precision Time Measurement
        PTMCap: Requester:- Responder:+ Root:+
        PTMClockGranularity: 4ns
        PTMControl: Enabled:+ RootSelected:+
        PTMEffectiveGranularity: Unknown
    Capabilities: [a30 v1] Secondary PCI Express
        LnkCtl3: LnkEquIntrruptEn- PerformEqu-
        LaneErrStat: 0
    Capabilities: [a90 v1] Data Link Feature <?>
    Capabilities: [a9c v1] Physical Layer 16.0 GT/s <?>
    Capabilities: [edc v1] Lane Margining at the Receiver <?>
    Kernel driver in use: pcieport
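
The L1SubCap and L1SubCtl1 lines in that dump can also be compared mechanically. The sketch below prints the L1 substates a port advertises but does not have enabled; it assumes the `lspci -vv` field layout shown above for a single device, and `l1ss_supported_but_disabled` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print L1 substates that L1SubCap advertises ("+")
# but L1SubCtl1 leaves disabled ("-"). Reads `lspci -vv` text for one
# device on stdin; relies on L1SubCap appearing before L1SubCtl1.
l1ss_supported_but_disabled() {
    awk '
        $1 == "L1SubCap:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /L1[.][12][+]$/) {
                    name = $i
                    sub(/[+]$/, "", name)
                    cap[name] = 1
                }
        }
        $1 == "L1SubCtl1:" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /L1[.][12]-$/) {
                    name = $i
                    sub(/-$/, "", name)
                    if (name in cap) print name
                }
        }
    '
}

# Example (not executed here):
#   sudo lspci -vv -s 00:06.0 | l1ss_supported_but_disabled
```

For 00:06.0 above this would print PCI-PM_L1.1 and ASPM_L1.1. Whether having only L1.2 enabled has anything to do with the missing SLP_S0 residency is not established in this thread.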

peterpeterp commented 5 months ago

I think I have the same issue on my ThinkPad T14 Gen 4 with Manjaro. Did you fix it?

DrymarchonShaun commented 5 months ago

> I think I have the same issue on my ThinkPad T14 Gen 4 with Manjaro. Did you fix it?

I haven't checked recently to see if it's still an issue, but as far as I know it hasn't been fixed.

danielstuart14 commented 3 months ago

Exact same issue on a Lenovo V14 (i5 12th Gen, kernel 6.10). To me this screams kernel bug rather than an EC problem. S0ixSelftestTool also states that the NVMe SSD / controller is the culprit, but in my case it is a Samsung PM9B1.