pop-os / system76-driver

System76 Driver for Pop!_OS
Other
112 stars 29 forks source link

Add s0ix log file to support log dumps #263

Closed leviport closed 1 year ago

leviport commented 1 year ago

This should be useful to the support team while debugging S0ix problems.

The file it's dumping is /sys/kernel/debug/pmc_core/slp_s0_residency_usec. The file starts with 0 in it from boot, and that number increases any time S0ix is successfully reached. The number itself won't be terribly useful (it's something like the number of CPU cycles that it was suspended for, so it might be something like 1808448212), but it being non-zero means that S0ix is working.

I'm imagining this being useful for the systems that we suspect are having trouble reaching S0ix at all. To use this information effectively, I'm envisioning these logs being generated after the customer has tried suspending and left the machine suspended (or as close as it will get) for about 10 to 15 seconds. Then the support team will be able to tell whether it actually got all the way to S0 residency.

This is just a rough idea, so I'm open to suggestions. I'm sure there are ways to generate a much more useful log file.

leviport commented 1 year ago

Also I believe it just won't generate the file at all on S3 systems, but that will have to be verified...

jacobgkau commented 1 year ago

If this is added here, we'll want to also add it to support-panel: https://github.com/pop-os/support-panel

Also I believe it just won't generate the file at all on S3 systems, but that will have to be verified...

On a serw12 that uses S3, the /sys/kernel/debug/pmc_core/ directory doesn't exist, so neither does this file. (The dump_path function should just skip it if the file doesn't exist; I can check once this builds.)

crawfxrd commented 1 year ago

On a serw12 that uses S3, the /sys/kernel/debug/pmc_core/ directory doesn't exist, so neither does this file.

Yep, pmc_core is an Intel-specific interface and serw12 is an AMD system.

debugfs could also not be mounted, so /sys/kernel/debug/ may also not exist.

This should probably just be changed in the kernel. It should just be adding/modifying a printk to the kernel

Carrying a kernel patch just for this seems like a worse solution than checking debugfs.


For end users, the power LED blinking should be enough to check for S0ix. Although this is just a bare minimum, it will be the most noticeable. The second most noticeable thing would be how much battery is used while in standby.

If we need more testing/info then we can use S0ixSelftestTool. (And I should start using it during development as well.)

And beyond that, if we can reproduce it, we have access to the Intel NDA tool somewhere.

leviport commented 1 year ago

I'm leaning towards this not being necessary then, but I want to let Support weigh in. I think the info in this thread will be more helpful than adding the extra log file.

thomas-zimmerman commented 1 year ago

Support is all for more useful information. It is very helpful to know that the system is reaching deep sleep states as customer complaints over battery life in suspend is a recurring request.

With this file only showing up on S0ix systems, are there similar files for systems that use S3 sleep?

thomas-zimmerman commented 1 year ago

https://github.com/intel/S0ixSelftestTool :: I think I tried to use this before and it was not building. IIRC, it depends on tools that are not packaged on Ubuntu and requires kernel headers/features that are not part of the debian packaging.

XV-02 commented 1 year ago

The issue with using the S0ixSelftestTool is that it cannot look backwards, nor can it track across a number of cycles, or determine whether an occasional - but sufficiently frequent - failure is occurring. At a certain level, all I want is to print residency values before suspend and after resume.

The biggest issue is that, in the wild, end users are putting countless times more hours on our systems than we can or do. With traditional sleep states, the logs that we have are diagnostically useful. But the fact that s0ix is somewhat removed from that framework and less directly addressed by any of the logs we currently generate means there is a category of behaviour we are not tracking. I don't think it is unreasonable that, as the technology evolves, our base line of what we're logging should as well. Given that Intel largely articulates that the residency value incrementing is the test for s0ix (at least in the documentation I've seen) tracking that one, defined data point in our logs just as we track reaching sleep states in non-s0ix systems appears logical to me.

crawfxrd commented 1 year ago

To clarify, I'm saying we should use S0ixSelftestTool for dev+test, not the user for gathering logs.

XV-02 commented 1 year ago

(My misunderstanding. I totally agree on using the tool for that!)

leviport commented 1 year ago

I think this can go away