zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
11k stars 6.7k forks source link

Zephyr PM APIs confusion #75963

Open fabiobaltieri opened 4 months ago

fabiobaltieri commented 4 months ago

Introduction

Hi, I've been working with few chips using PM and PM_DEVICE recently and found some lack of clarity in the PM APIs and how/when they should be used.

Problem description

There's three main PM config options that control the system behavior, let's see if I got them right:

So if you are running on a system that is using PM and PM_DEVICE, the default setting is PM_DEVICE_SYSTEM_MANAGED=y which means that all of your devices implementing PM are going to be suspended and resume continuously as the CPU goes to sleep and resume. I guess this would be fine if the PM device API were meant to be used for that reason, like preparing drivers to wake up the CPU early if needed, but the problem is that this is not how they are used all the time.

If one does a quick grep for PM_DEVICE_DT_INST_DEFINE in drivers/ there's a lot of usage of this for devices external of the CPU, displays, flash memories, GNSS, LED controllers, sensors, let's ignore all the input ones that I did while (clearly) not understanding what these were for. All of these clearly don't want to be suspended every time the CPU sleeps for a few milliseconds, but that's what happens, it's the same API. You may argue it's API misuse, but then if the device callback were meant to be hard bound to the CPU going in and out of sleep, why would PM_DEVICE be decoupled from PM?

Now one can disable PM_DEVICE_SYSTEM_MANAGED, but now some of the drivers that do want that callback when the CPU suspends are not going to get it anymore, and there's no way out.

So one can use PM_DEVICE_RUNTIME instead and ensure to all the devices that are NOT bound to the CPU have runtime PM enabled and are controlled by the application, but then this disables PM_DEVICE_SYSTEM_MANAGED by default... why?!

Now, PM_DEVICE_RUNTIME, how's that supposed to be used? There's load of drivers calling pm_device_runtime_enable on themselves so that they use the runtime API automatically once enabled, but then that means that once a driver gets that call introduced it changes from being active by default to suspended by default and MUST be enabled manually by the application, which inevitably causes a regression downstream. So maybe it should not be used in the driver? And we should rely on zephyr,pm-device-runtime-auto or the application calling pm_device_runtime_enable on the device? It seems to make more sense, but then how does one handle pm_device_init_suspended (which is often paired with self pm_device_runtime_enable).

Proposed change

These APIs and use cases needs to be clear as a start, right now it seems like there's a bit of a mess in the code base, the PM callback are used with different scope.

The whole PM_DEVICE_SYSTEM_MANAGED story seems to be the immediate problem, at least it is to me right now. I think that this should not be a single option that enables/disables cpu-device hooks, maybe it should be a PM_DEVICE_FLAG_ bit? PM_DEVICE_FLAG_SYSTEM_MANAGED or something?

Looking for initial feedback on this, trying to confirm if there's something I'm missing.

fabiobaltieri commented 4 months ago

cc @ceolin @nashif @teburd @tmleman @JordanYates @gmarull @carlescufi @keith-zephyr @bjarki-andreasen

bjarki-andreasen commented 4 months ago

Since I discovered this function pm_device_driver_init, at least from a device driver perspective it is pretty straight forward, just include #include <zephyr/pm/device.h> and implement the pm_action callback implementing ON and RESUME, while hiding the SUSPEND and OFF actions behind #ifdef CONFIG_PM_DEVICE. Run pm_device_driver_init() as the last thing before leaving the drivers init function. No idea if this is documented anywhere, I just listened to a talk on YouTube :)

gmarull commented 4 months ago

A partial attempt to split PM_DEVICE from PM_DEVICE_RUNTIME was made a while ago https://github.com/zephyrproject-rtos/zephyr/pull/62160. PM_DEVICE should be ditched entirely, it's just nonsense. We should also consider introducing a "PM v2", and deprecate "PM v1".

fabiobaltieri commented 4 months ago

A partial attempt to split PM_DEVICE from PM_DEVICE_RUNTIME was made a while ago #62160. PM_DEVICE should be ditched entirely, it's just nonsense. We should also consider introducing a "PM v2", and deprecate "PM v1".

Interesting, one of the point I wrote and then dropped was asking whether DEVICE and DEVICE_RUNTIME really make sense in the way they are currently implemented. So what was the idea? PM_DEVICE for CPU bound and PM_DEVICE_RUNTIME for application controlled?

gmarull commented 4 months ago

A partial attempt to split PM_DEVICE from PM_DEVICE_RUNTIME was made a while ago #62160. PM_DEVICE should be ditched entirely, it's just nonsense. We should also consider introducing a "PM v2", and deprecate "PM v1".

Interesting, one of the point I wrote and then dropped was asking whether DEVICE and DEVICE_RUNTIME really make sense in the way they are currently implemented. So what was the idea? PM_DEVICE for CPU bound and PM_DEVICE_RUNTIME for application controlled?

PM_DEVICE -> suspend/resume devices when CPU goes to idle (ie, nonsense the way it's done) PM_DEVICE_RUNTIME -> suspend/resume devices based on consumer needs

They share code, but for no good reasons IMHO, it makes things confusing (and incompatible, how can the same callback be shared for both cases?)

bjarki-andreasen commented 4 months ago

PM_DEVICE from my understanding is just the resume/suspend hooks for devices? and PM_DEVICE_RUNTIME tracks usage, it is entirely "above" PM_DEVICE I believe. Maybe the source code paints a different story

ceolin commented 4 months ago

PM_DEVICE from my understanding is just the resume/suspend hooks for devices? and PM_DEVICE_RUNTIME tracks usage, it is entirely "above" PM_DEVICE I believe. Maybe the source code paints a different story

That is correct, PM_DEVICE is about enable the callback hook in devices. PM_DEVICE_SYSTEM_MANAGED and PM_DEVICE_RUNTIME control how device power management happens in devices.

ceolin commented 4 months ago

PM_DEVICE -> suspend/resume devices when CPU goes to idle (ie, nonsense the way it's done) PM_DEVICE_RUNTIME -> suspend/resume devices based on consumer needs

That is simply wrong. PM_DEVICE_SYSTEM_MANAGED is the option that suspend and resume devices. Also this behavior can be tuned, shallow power states can decide to not trigger power management action in devices using zephyr,pm-device-disabled

They share code, but for no good reasons IMHO, it makes things confusing (and incompatible, how can the same callback be shared for both cases?)

The callback is for responding to an action (suspend / resume / off ).. Why can't it be shared ?

ceolin commented 4 months ago

Introduction

Hi, I've been working with few chips using PM and PM_DEVICE recently and found some lack of clarity in the PM APIs and how/when they should be used.

Problem description

There's three main PM config options that control the system behavior, let's see if I got them right:

  • PM: enables CPU power states, basically a callback that says "hey CPU, I won't need you for a while, go to sleep"
  • PM_DEVICE: controls whether the device suspend/resume APIs are available and especially whether those xxx_pm_action driver callback are built or not
  • PM_DEVICE_RUNTIME: adds reference counters on the stuff above, with a different API, if the device enables the runtime features either with pm_device_runtime_enable or zephyr,pm-device-runtime-auto
  • PM_DEVICE_SYSTEM_MANAGED: enables the stuff in subsys/pm/device_system_managed.c, which basically means that now the system calls suspend and resume on all devices every time the CPU goes to sleep for too long, unless they are busy or flagged for pm runtime operation

So if you are running on a system that is using PM and PM_DEVICE, the default setting is PM_DEVICE_SYSTEM_MANAGED=y which means that all of your devices implementing PM are going to be suspended and resume continuously as the CPU goes to sleep and resume. I guess this would be fine if the PM device API were meant to be used for that reason, like preparing drivers to wake up the CPU early if needed, but the problem is that this is not how they are used all the time.

If one does a quick grep for PM_DEVICE_DT_INST_DEFINE in drivers/ there's a lot of usage of this for devices external of the CPU, displays, flash memories, GNSS, LED controllers, sensors, let's ignore all the input ones that I did while (clearly) not understanding what these were for. All of these clearly don't want to be suspended every time the CPU sleeps for a few milliseconds, but that's what happens, it's the same API. You may argue it's API misuse, but then if the device callback were meant to be hard bound to the CPU going in and out of sleep, why would PM_DEVICE be decoupled from PM?

Now one can disable PM_DEVICE_SYSTEM_MANAGED, but now some of the drivers that do want that callback when the CPU suspends are not going to get it anymore, and there's no way out.

So one can use PM_DEVICE_RUNTIME instead and ensure to all the devices that are NOT bound to the CPU have runtime PM enabled and are controlled by the application, but then this disables PM_DEVICE_SYSTEM_MANAGED by default... why?!

This option was introduced to get rid of the atrocity that was PM_DEVICE_RUNTIME_EXCLUSIVE that was enabled by default when PM_DEVICE_RUNTIME was enabled and to facilitate the migration was asked to keep the old behavior, so this option is disabled when PM_DEVICE_RUNTIME is enabled.

The idea is having direct option to select which behavior you want.

PM_DEVICE -> Enable the basic infra-struct, it is used by both behaviors.

This one we may remove or make a hidden option to avoid confusion and simplify things.

Now, PM_DEVICE_RUNTIME, how's that supposed to be used? There's load of drivers calling pm_device_runtime_enable on themselves so that they use the runtime API automatically once enabled, but then that means that once a driver gets that call introduced it changes from being active by default to suspended by default and MUST be enabled manually by the application, which inevitably causes a regression downstream. So maybe it should not be used in the driver? And we should rely on zephyr,pm-device-runtime-auto or the application calling pm_device_runtime_enable on the device? It seems to make more sense, but then how does one handle pm_device_init_suspended (which is often paired with self pm_device_runtime_enable).

Proposed change

These APIs and use cases needs to be clear as a start, right now it seems like there's a bit of a mess in the code base, the PM callback are used with different scope.

The whole PM_DEVICE_SYSTEM_MANAGED story seems to be the immediate problem, at least it is to me right now. I think that this should not be a single option that enables/disables cpu-device hooks, maybe it should be a PM_DEVICE_FLAG_ bit? PM_DEVICE_FLAG_SYSTEM_MANAGED or something?

Per device ? Per specific power states ?

Looking for initial feedback on this, trying to confirm if there's something I'm missing.

ceolin commented 4 months ago

A partial attempt to split PM_DEVICE from PM_DEVICE_RUNTIME was made a while ago #62160. PM_DEVICE should be ditched entirely, it's just nonsense. We should also consider introducing a "PM v2", and deprecate "PM v1".

There is nothing blocking the code to change. The problem was not ditching PM_DEVICE the problem was that you had tried to completely remove the support for suspend/resume devices when the system is idle (what is now controlled by PM_DEVICE_SYSTEM_MANAGED).

henrikbrixandersen commented 4 months ago

No matter how this all fits together, I think a good step would be to add more documentation on how the different options/behaviors are a) intended to be used by an application developer and b) intended to be implemented by the various driver types (on-chip/off-chip peripherals, SoCs).

ceolin commented 4 months ago

No matter how this all fits together, I think a good step would be to add more documentation on how the different options/behaviors are a) intended to be used by an application developer and b) intended to be implemented by the various driver types (on-chip/off-chip peripherals, SoCs).

Yes that is very reasonable.

gmarull commented 4 months ago

PM_DEVICE -> suspend/resume devices when CPU goes to idle (ie, nonsense the way it's done) PM_DEVICE_RUNTIME -> suspend/resume devices based on consumer needs

That is simply wrong. PM_DEVICE_SYSTEM_MANAGED is the option that suspend and resume devices. Also this behavior can be tuned, shallow power states can decide to not trigger power management action in devices using zephyr,pm-device-disabled

This continues to be a flawed design.

They share code, but for no good reasons IMHO, it makes things confusing (and incompatible, how can the same callback be shared for both cases?)

The callback is for responding to an action (suspend / resume / off ).. Why can't it be shared ?

Because they're entirely different scenarios?

... no matter what, the best proof that Device PM is not in good shape is that the majority of drivers do not implement it. I've repeatedly found people confused, without proper examples, guidance, etc.

JordanYates commented 4 months ago

Now, PM_DEVICE_RUNTIME, how's that supposed to be used? There's load of drivers calling pm_device_runtime_enable on themselves

That is an incorrect implementation, just because PM_DEVICE_RUNTIME is enabled, it doesn't mean the user wants it enabled on that particular device. Worse, there is no way to stop it from happening without modifying the driver.

And we should rely on zephyr,pm-device-runtime-auto or the application calling pm_device_runtime_enable on the device?

Yes

It seems to make more sense, but then how does one handle pm_device_init_suspended (which is often paired with self pm_device_runtime_enable).

As @bjarki-andreasen mentioned, this was the rationale behind pm_device_driver_init. It "magically" intialises the device into the lowest possible power state without the driver needing to think about it.

fabiobaltieri commented 4 months ago

That is an incorrect implementation, just because PM_DEVICE_RUNTIME is enabled, it doesn't mean the user wants it enabled on that particular device. Worse, there is no way to stop it from happening without modifying the driver.

Cool, that sounded wrong from the start, and unsurprisingly when I added that to the basic gpio input drivers I had users reaching out because their application lost the device all of a sudden.

So let's hypothetically say that I'm going drop those calls from all the input drivers, we end up with few use cases for PM_RUNTIME users:

But even under normal development, someone implements PM support for a device, applications running with PM and PM_DEVICE had everything working, now they update the rtos and all of a sudden a device starts working intermittently. Maybe they'll find PM_DEVICE_SYSTEM_MANAGED, manage to shut it off, but maybe their system relied on those calls for something else. Sucks to be you, only way to decouple the two is to enable PM runtime. Don't need usage counter? Maybe you have a suspend reason mask to control the device state (we do). Sucks even more to be you.

That is correct, PM_DEVICE is about enable the callback hook in devices. PM_DEVICE_SYSTEM_MANAGED and PM_DEVICE_RUNTIME control how device power management happens in devices.

Well that's not quite the whole story, the combination of PM, PM_DEVICE, PM_DEVICE_SYSTEM_MANAGED, how the driver is implemented and how the devices are configured in the tree control how power management happen. And some combination of those results in totally nonsense behavior. And that may change as the drivers evolve and depending on what the author believes the right way of using the API is.

Listen from what I see and read from these comments, with everyone suggesting different tricks to try and workaround the state of the API, I think it's pretty evident that the API is unclear. I think Gerard was right and the two use cases should be logically split. Don't care if they share the code or not, the implementation is not the point, but we need hooks for CPU devices (which should be ONE option, not a magic combination of three), we need an API for end devices, and the whole story should not have combinations that results in calls being done automatically by the system and random point, and especially should not be causing regression every time someone implements PM on a driver.

This should not be a thing were few people knows the tricks to make the thing work, the API needs to be defensive to misuse.

JordanYates commented 4 months ago

So let's hypothetically say that I'm going drop those calls from all the input drivers, we end up with few use cases for PM_RUNTIME users:

  • Nordic users, no PM, the device is now always active, does not take pm_runtime apis anymore, the users did not log and check the error codes, seemingly everything still works, user ends up with a very hard to troubleshoot power consumption regression
  • PM user, let's say stm32, exact same story, but now the device starts suspending and unsuspending with the CPU, user ends up with a device working intermittently AND a very hard to troubleshoot power consumption regression

Introducting minor power consumption regressions to fix incorrect implementations hasn't been a showstopper before AFAICT, why start worrying now :)

The STM32 behavior sounds like a real pain. And at the risk of sounding self-centered, its somewhat self-inflicted (not you personally, but PM in general). If PM has no way of determining which devices should be controlled when the CPU changes states, that sounds like a PM problem, nothing to do with PM_DEVICE_RUNTIME.

bjarki-andreasen commented 4 months ago

I believe PM_DEVICE and PM_DEVICE_RUNTIME in isolation are really useful. The application has full control of all devices, and devices sharing busses have a useful counter to keep the bus powered while any device is active. Power domains are also automatically managed to make sure all required domains are on for any resumed device.

Any "automatic" power management of devices which is not a direct result of pm_runtime_get() or put() from the application is a bug IMO.

All most users want is to be able to power a device before using it, and power it of when done. Only get() and put() should be of interrest to the application regarding device PM

fabiobaltieri commented 4 months ago

Introducting minor power consumption regressions to fix incorrect implementations hasn't been a showstopper before AFAICT, why start worrying now :)

I have absolutely no problem doing this, the problem is that if we clean this up we should:

  1. do it for the whole code base, get rid of every antipattern, otherwise we'll always get more, but more importantly
  2. introduce a regression and break the user ONCE, which hope may be the outcome of this conversation

If PM has no way of determining which devices should be controlled when the CPU changes states, that sounds like a PM problem, nothing to do with PM_DEVICE_RUNTIME.

Yes, and the problem is that now it relies on the runtime bit to work.

I believe PM_DEVICE and PM_DEVICE_RUNTIME in isolation are really useful.

Yes, I think that there should be only two options: one for CPU power management which enables everything you need: power states and cpu related power states handlers (why on earth you would want only half of that?!), and one for application controlled, I what we use runtime now. Except for people who uses the normal API because haven't been bitten by PM messing it up for them.

ceolin commented 4 months ago

Listen from what I see and read from these comments, with everyone suggesting different tricks to try and workaround the state of the API, I think it's pretty evident that the API is unclear. I think Gerard was right and the two use cases should be logically split. Don't care if they share the code or not, the implementation is not the point, but we need hooks for CPU devices (which should be ONE option, not a magic combination of three), we need an API for end devices, and the whole story should not have combinations that results in calls being done automatically by the system and random point, and especially should not be causing regression every time someone implements PM on a driver.

That is not the whole history, the problem was never split, the problem was completely removing device system managed power management while there are users depending on it. That is said, PM_DEVICE_SYSTEM_MANAGED was an attempt to clean up power management symbols and logic, that it seems it didn't fully achieve.

Additional APIs will have a cost and in the past there were complains. Btw, that is one of the reasons device power management managed by the system was not bound to system power management. Obviously time changes and can review it.

This should not be a thing were few people knows the tricks to make the thing work, the API needs to be defensive to misuse.

Absolutely ! If that is the case it is wrong.

Let me flush some ideas about how to split and simplify it.

ceolin commented 4 months ago

But even under normal development, someone implements PM support for a device, applications running with PM and PM_DEVICE had everything working, now they update the rtos and all of a sudden a device starts working intermittently.

Sorry what you mean here ? I didn't get the "update the rtos" part

Maybe they'll find PM_DEVICE_SYSTEM_MANAGED, manage to shut it off, but maybe their system relied on those calls for something else. Sucks to be you, only way to decouple the two is to enable PM runtime. Don't need usage counter? Maybe you have a suspend reason mask to control the device state (we do). Sucks even more to be you.

So, here what exactly is the need ? We have the following options:

These options are NOT mutually exclusive, it is possible to have both enabled, CONFIG_PM_DEVICE_SYSTEM_MANAGED and CONFIG_PM_DEVICE_RUNTIME.

From a device that implements power management, it can:

1) Rely on system to suspend / resume when it is idle 2) Enable device runtime PM and controls when it should be suspended / resumed

May a device that implements power management don't need any of these behaviors ? Calling directly pm_device_action_cb is risky and should not be used since it may be in use and lead to inconsistencies in device runtime or busy status for example.

Some initial thoughts based on this discussion:

1) Split the logic between device system managed and device runtime. Drivers to implement two actions callbacks:

They may be exactly the same in most cases but it gives the opportunity for wake up devices do something different without need to use notifiers. It also provides more flexibility for devices, since they can implement only one of these callbacks. For example, a driver can implement only the device runtime callback and it will not be suspend / resumed by the system even if it has not enabled device runtime pm.

2) Having different callbacks we can get rid of PM_DEVICE. Basically you enable CONFIG_PM and devices hooks are available.

3) The available options will be:

4) At start keep the same logic between system manged / runtime pm, if a device has runtime pm enabled the system will not call pm_device_runtime_action_cb_t

I am playing around with these ideas to see how effective they are. Early comments are welcomed :)

fabiobaltieri commented 4 months ago

@ceolin Thanks, yes the split logic you are describing resonates with what I had in mind.

typedef int (*)(const struct device *dev, enum pm_device_action action, const struct pm_state_info *state);

Great, wondering if this even needs action or could just get away with state, maybe renamed to next_state to make it clear, bit of an implementation detail really.

  1. Having different callbacks we can get rid of PM_DEVICE.

Really like the idea of removing some option, make it less likely to pick suboptimal configurations (why would you want only half power management...)

I would go as far as considering renaming the options to better describe the intent, maybe PM could be PM_SYSTEM and CONFIG_PM_DEVICE_RUNTIME, well it could be PM_DEVICE but that would be confusing as hell.

It also provides more flexibility for devices, since they can implement only one of these callbacks.

Yes that's what I'd like, the "update the rtos" bit I mentiond is, for example, if https://github.com/zephyrproject-rtos/zephyr/pull/75678 getes merged as is and PM users upgrades (ITE, NPCX, STM32...) they would either get the device autosuspending with the CPU, or the device starting as suspended. If we have split callbacks and get rid of pm_device_runtime_enable then that would not happen.

Actually at this point pm_device_runtime_enable does not make much sense, if a device has the callback, that's it. Maybe we can instead have only the devicetree property zephyr,pm-device-runtime-auto, which would be more like zephyr,pm-device-init-suspended. Now there's no risk of a driver being updated and changing behavior (if we actually drop that API), and one would only control the device PM initial status with devicetree. (@JordanYates what do you think?)

The available options will be:

  • CONFIG_PM_DEVICE_SYSTEM_MANAGED

Do we even need this at this point? If you have cpu states and system PM callbacks, is there a case where you woud NOT want them called? I think this should go and at this point it's safe to always have it enabled. Unless you are thinking about saving some code by dropping those loops.

Thanks again for going through this.

JordanYates commented 4 months ago

Maybe we can instead have only the devicetree property zephyr,pm-device-runtime-auto, which would be more like zephyr,pm-device-init-suspended. Now there's no risk of a driver being updated and changing behavior (if we actually drop that API), and one would only control the device PM initial status with devicetree.

Dropping pm_device_runtime_enable would be fine with me, I've never had a reason to use it because it implicitly locks the code to platforms that have that node. The proposed change to the devicetree property isn't great because if you're using pm_device_driver_init (you should be), the device can boot into either suspended or off, depending on the rest of the devicetree.

fabiobaltieri commented 4 months ago

@JordanYates I'm not familiar with the power domain stuff, but good point, that should be considered as well.

So what's the "correct" pattern in your experience? Unconditionally implement the pm_control callback with suspend/resume if PM_DEVICE and TURN_ON/OFF all the time and always initialize with pm_device_driver_init?

There's one driver in the tree using that right now and your name is all over it :-) I'd like if we can converge to a "right way" of using the PM API and then fix up the code base so all drivers adopt it.

JordanYates commented 4 months ago

@JordanYates I'm not familiar with the power domain stuff, but good point, that should be considered as well.

So what's the "correct" pattern in your experience? Unconditionally implement the pm_control callback with suspend/resume if PM_DEVICE and TURN_ON/OFF all the time and always initialize with pm_device_driver_init?

There's one driver in the tree using that right now and your name is all over it :-) I'd like if we can converge to a "right way" of using the PM API and then fix up the code base so all drivers adopt it.

Yes, that is the correct usage. Historically I was using limited upstream sensor drivers, so I wasn't pushing many changes, but that is changing now. I have a few more PR's pending that use the same pattern:

ceolin commented 4 months ago

There is something that be misleading without pm_device_runtime_enable. If a driver not initialize suspended or off (it is active). What is going to be its reference count ? if we start with 1 this will never get balanced, if we start with 0 the first time someone calls pm_device_runtime_get it will trigger the driver pm callback asking to resume. Not to say that is confusing having a device supporting runtime pm that is active but its reference count is 0.

Dropping this function would be ok if we assume that all devices that support runtime pm are using it and them initialize suspended.

bjarki-andreasen commented 4 months ago

@ceolin For devices which draw tons of power, say GNSS and cellular modems, The drivers I have written initialize these as suspended, as starting them is wasteful if they are not to be used immediately (which is often the case for trackers: power gnss, get fix, suspend gnss, power cellular, send fix). Aside from waste, starting multiple power hungry devices may overwhelm the power supply of a board, which may not be designed to run with multiple of these devices at the same time.

IMO, if PM_DEVICE is enabled, devices should be initialized suspended, and the application should be powering them as they become needed. So PM_DEVICE=y means app must call get() before using any device, and call put() when done. Only if PM_DEVICE=n should everything be blindy powered at boot :)

ceolin commented 4 months ago

Great, wondering if this even needs action or could just get away with state, maybe renamed to next_state to make it clear, bit of an implementation detail really.

That is not necessarily the next state, if the system is resuming the next is always ACTIVE it may be more useful to have know from which state it is resuming. But I am not entirely sure if that is needed.

  1. Having different callbacks we can get rid of PM_DEVICE.

Really like the idea of removing some option, make it less likely to pick suboptimal configurations (why would you want only half power management...)

I would go as far as considering renaming the options to better describe the intent, maybe PM could be PM_SYSTEM and CONFIG_PM_DEVICE_RUNTIME, well it could be PM_DEVICE but that would be confusing as hell.

It also provides more flexibility for devices, since they can implement only one of these callbacks.

Yes that's what I'd like, the "update the rtos" bit I mentiond is, for example, if #75678 getes merged as is and PM users upgrades (ITE, NPCX, STM32...) they would either get the device autosuspending with the CPU, or the device starting as suspended. If we have split callbacks and get rid of pm_device_runtime_enable then that would not happen.

Actually at this point pm_device_runtime_enable does not make much sense, if a device has the callback, that's it. Maybe we can instead have only the devicetree property zephyr,pm-device-runtime-auto, which would be more like zephyr,pm-device-init-suspended. Now there's no risk of a driver being updated and changing behavior (if we actually drop that API), and one would only control the device PM initial status with devicetree. (@JordanYates what do you think?)

The available options will be:

  • CONFIG_PM_DEVICE_SYSTEM_MANAGED

Do we even need this at this point? If you have cpu states and system PM callbacks, is there a case where you woud NOT want them called? I think this should go and at this point it's safe to always have it enabled. Unless you are thinking about saving some code by dropping those loops.

That is a valid question. This is a long time existent option in Zephyr I am not sure if there aren't use cases where an application uses only cpu states that don't cause execution context lost and because of this the application can just disable this behavior.

ceolin commented 4 months ago

@ceolin For devices which draw tons of power, say GNSS and cellular modems, The drivers I have written initialize these as suspended, as starting them is wasteful if they are not to be used immediately (which is often the case for trackers: power gnss, get fix, suspend gnss, power cellular, send fix). Aside from waste, starting multiple power hungry devices may overwhelm the power supply of a board, which may not be designed to run with multiple of these devices at the same time.

100% agree.

fabiobaltieri commented 4 months ago

For devices which draw tons of power, say GNSS and cellular modems, The drivers I have written initialize these as suspended, as starting them is wasteful if they are not to be used immediately

Sure, they may also put the device in a brownout reset loop on a weak battery, where you'd rather start with everything off, detect the condition and stop, blink some scary red led or something, that said, this behavior is giant hassle for samples, demos and prototyping.

I think it would be ideal to have the driver being flagged for init suspended in the devicetree, so the user has a choice. If the reworked PM APIs incorporated that for good it'd be ideal, at that point I don't see any use case for enabling that in runtime.

ceolin commented 4 months ago

I will be out on vacation next week but when I get back I will come up with a draft addressing the ideas discussed here.

fabiobaltieri commented 4 months ago

Thanks a lot Flavio, and everyone else for the inputs so far, just about to start the 4.0 cycle, could not have been better timing to figure this out.

albertofloyd commented 4 months ago

cc @scottwcpg

teburd commented 4 months ago

I’d note that perhaps with the updated sensor API power management could be done better by the driver itself. But this is something I need to explore

fabiobaltieri commented 4 months ago

@JordanYates (and everyone really) one more question: I've bumped info few drivers using PM runtime for their internal state, gpio/gpio_stm32.c, i2c/i2c_ll_stm32.c, i2c/i2c_nrfx_twim.c... there's quite a few.

Do you think that's a legitimate use of the API? It feels wrong to me, can't see why those driver needs the pm runtime stuff to control their internal state rather than just calling what they need to call directly, the internal state is then exposed with the application that can end up messing with it.

JordanYates commented 4 months ago

@JordanYates (and everyone really) one more question: I've bumped info few drivers using PM runtime for their internal state, gpio/gpio_stm32.c, i2c/i2c_ll_stm32.c, i2c/i2c_nrfx_twim.c... there's quite a few.

Do you think that's a legitimate use of the API? It feels wrong to me, can't see why those driver needs the pm runtime stuff to control their internal state rather than just calling what they need to call directly, the internal state is then exposed with the application that can end up messing with it.

This is a question that needs to be asked at the API level, not the driver level, otherwise users cannot write portable code. For example, I think we have general agreement that drivers implementing the flash API should be calling the PM functions internally, as in #73246.

Does the same behavior make sense for I2C? For the standard API maybe, but with the RTIO drivers I'm not sure there is a great location to call the PM functions automatically.

The question with "the drivers just calling what they need to call directly" is how do they handle a PM request in the middle of their internal transitions? Going through the PM API's means drivers don't need to worry about details like that.

teburd commented 4 months ago

@JordanYates (and everyone really) one more question: I've bumped info few drivers using PM runtime for their internal state, gpio/gpio_stm32.c, i2c/i2c_ll_stm32.c, i2c/i2c_nrfx_twim.c... there's quite a few. Do you think that's a legitimate use of the API? It feels wrong to me, can't see why those driver needs the pm runtime stuff to control their internal state rather than just calling what they need to call directly, the internal state is then exposed with the application that can end up messing with it.

This is a question that needs to be asked at the API level, not the driver level, otherwise users cannot write portable code. For example, I think we have general agreement that drivers implementing the flash API should be calling the PM functions internally, as in #73246.

Does the same behavior make sense for I2C? For the standard API maybe, but with the RTIO drivers I'm not sure there is a great location to call the PM functions automatically.

The question with "the drivers just calling what they need to call directly" is how do they handle a PM request in the middle of their internal transitions? Going through the PM API's means drivers don't need to worry about details like that.

With RTIO so long as there's work in the queue of I/O work to do, the driver could be on (get on first work item, put after last). Arguably more efficient even then the current implementations which would get/put after each i2c_transfer() equivalent transaction.

The same general setup could be done with sensors, so long as there's work to do the sensor is active. This is better in the case of the stream API where we know I/O work is pending on an interrupt (e.g. gpio interrupt signaling data ready). With polling (read + decode) at least we have a singular read request to get, read, put sort of setups vs the fetch + get which has no clear point to call put().

ceolin commented 4 months ago

@JordanYates (and everyone really) one more question: I've bumped info few drivers using PM runtime for their internal state, gpio/gpio_stm32.c, i2c/i2c_ll_stm32.c, i2c/i2c_nrfx_twim.c... there's quite a few. Do you think that's a legitimate use of the API? It feels wrong to me, can't see why those driver needs the pm runtime stuff to control their internal state rather than just calling what they need to call directly, the internal state is then exposed with the application that can end up messing with it.

This is a question that needs to be asked at the API level, not the driver level, otherwise users cannot write portable code. For example, I think we have general agreement that drivers implementing the flash API should be calling the PM functions internally, as in #73246.

Does the same behavior make sense for I2C? For the standard API maybe, but with the RTIO drivers I'm not sure there is a great location to call the PM functions automatically.

The question with "the drivers just calling what they need to call directly" is how do they handle a PM request in the middle of their internal transitions? Going through the PM API's means drivers don't need to worry about details like that.

There is exactly the point, specially with think on concurrent usage of a driver, e.g two different threads using it. The PM runtime API helps to keep the driver in a consistent state

ceolin commented 3 months ago

So, I am working through items listed here. Let me try to summarize the problems and possible changes:

Confusion / Complexity due many configuration (CONFIG_PM, CONFIG_PM_DEVICE, CONFIG_PM_DEVICE_SYSTEM_MANAGED,CONFIG_PM_DEVICE_RUNTIME)

The proposal is to have only two symbols, CONFIG_PM and CONFIG_PM_DEVICE_RUNTIME. These options can be used together or independently.

CONFIG_PM will cover system idle states and also what is covered now by CONFIG_PM_DEVICE and CONFIG_PM_DEVICE_SYSTEM_MANAGED.

Drivers don't have clear knowlodge if they are being asked to suspend/resume because the system is idle or due device runtime pm (pm_device_runtime_get/put)

Adding two more actions, PM_DEVICE_ACTION_RUNTIME_SUSPEND and PM_DEVICE_ACTION_RUNTIME_RESUME. These actions will be used only by device runtime pm (pm_device_runtime_get/put).

Most cases, if a driver support both methods (system managed device pm and device runtime pm) the driver will handle them the same way:

        switch (action) {
        case PM_DEVICE_ACTION_RESUME:
                __fallthrough;
        case PM_DEVICE_ACTION_RUNTIME_RESUME:
                ... 
                break;
        case PM_DEVICE_ACTION_SUSPEND:
                __fallthrough;
        case PM_DEVICE_ACTION_RUNTIME_SUSPEND:
                ...
                break;

But it allows, for the ones that need it, handle these actions differently or to support only one model.

Devices may need to take special actions depending on which power state the system is suspending. e.g if the SoC is transitioning to PM_STATE_SUSPEND_TO_RAM, a device that is used as wake up source may need to take a special action like changing pin control.

Changing the action callback to

   typedef int (*pm_device_action_cb_t)(const struct device *dev,
                     enum pm_device_action action,
                     const struct pm_state_info *soc_state);

where soc_state is the state that the system is going to when action == PM_DEVICE_ACTION_SUSPEND and it the state that the system is returning from when action == PM_DEVICE_ACTION_RESUME.

In the PM_STATE_SUSPEND_TO_RAM example, it would be something like:

   pm_device_system_action_run(dev, PM_DEVICE_ACTION_SUSPEND, PM_STATE_SUSPEND_TO_RAM);
   pm_device_system_action_run(dev, PM_DEVICE_ACTION_RESUME, PM_STATE_SUSPEND_TO_RAM);

Improper usage of pm_device_action_run

All runtime pm should be done using device runtime pm (pm_device_runtime_get/put), changing a device state manually may result in incosistencies.

We should remove or make it private. Any usage of this function out of the pm subsys must change to use device runtime pm.

@fabiobaltieri @JordanYates @bjarki-andreasen @teburd at all ^

ceolin commented 3 months ago

@mmahadevan108 @erwango ^

fabiobaltieri commented 3 months ago

@ceolin thanks for the proposal, I think it's a step forward splitting system and runtime, would go as far as suggesting renaming the current actions from PM_DEVICE_ACTION_RESUME to PM_DEVICE_ACTION_SYSTEM_RESUME (same for suspend).

Since this is going to require reworking the existing drivers, I'm wondering, would it be possible to remove the pm_device_runtime_enable API and only use zephyr,pm-device-runtime-auto?

This would remove the use case where a driver with no PM support works normally in an application with PM_RUNTIME enabled, then the driver implements the PM api and existing applications have a regression because now the driver initializes in suspend state.

(apologies for the delay in the reply, got caught into other things)

bjarki-andreasen commented 3 months ago

With the proposal to solely use pm_device_runtime_get/put, we can optimize the device drivers to just have two actions: resume and suspend (enabled/disabled).

A device can only do two things, it can be resumed, it can be suspended. That is all device drivers should implement. Why they are asked to be active or not is not relevant to them. Any further power modes or configuration is done through device driver APIs, like lowering the data rate of a sensor.

With runtime being used to control the devices, which is a necessity in a system which has busses and power domains (IE all of them), the actions PM_DEVICE_ACTION_TURN_OFF and PM_DEVICE_ACTION_TURN_ON no longer make sense since a domain can't just be put down. Power domains go off when all child devices are suspended, and can only be kept alive by an extra get().

We can shrink down PM_DEVICE (which covers current PM_DEVICE and PM_DEVICE_RUNTIME) into a single call in drivers and power domains:

int (*pm_action_callback)(const struct device *dev, bool enable);

with pm_device_runtime subsys managing get/put counts and the "devicetree" (device on power domain, power domain on other power domain etc.)

ceolin commented 3 months ago

With the proposal to solely use pm_device_runtime_get/put, we can optimize the device drivers to just have two actions: resume and suspend (enabled/disabled).

The proposal is for the devices that are actively doing power management only use device runtime pm and stop calling the action callback directly.

A device can only do two things, it can be resumed, it can be suspended. That is all device drivers should implement. Why they are asked to be active or not is not relevant to them. Any further power modes or configuration is done through device driver APIs, like lowering the data rate of a sensor.

With runtime being used to control the devices, which is a necessity in a system which has busses and power domains (IE all of them), the actions PM_DEVICE_ACTION_TURN_OFF and PM_DEVICE_ACTION_TURN_ON no longer make sense since a domain can't just be put down. Power domains go off when all child devices are suspended, and can only be kept alive by an extra get().

There is a difference between a device being suspended because it is not active and having the its power cut (because of the power domain it is was suspended). The thing is that you can have N devices under a power domain that is active because there is one device still active. When this device is suspended the domain is also suspended which means that the power supply for all N devices was cut, in this situation all N-1 devices that were already suspend may need to take additional action about it.

We can shrink down PM_DEVICE (which covers current PM_DEVICE and PM_DEVICE_RUNTIME) into a single call in drivers and power domains:

int (*pm_action_callback)(const struct device *dev, bool enable);

with pm_device_runtime subsys managing get/put counts and the "devicetree" (device on power domain, power domain on other power domain etc.)

ceolin commented 3 months ago

@ceolin thanks for the proposal, I think it's a step forward splitting system and runtime, would go as far as suggesting renaming the current actions from PM_DEVICE_ACTION_RESUME to PM_DEVICE_ACTION_SYSTEM_RESUME (same for suspend).

I can do it if it makes more clear. I had only introduced PM_DEVICE_ACTION_RUNTIME_* to distinguish from the system managed actions, but we can make it even more explicit.

Since this is going to require reworking the existing drivers, I'm wondering, would it be possible to remove the pm_device_runtime_enable API and only use zephyr,pm-device-runtime-auto?

This would remove the use case where a driver with no PM support works normally in an application with PM_RUNTIME enabled, then the driver implements the PM api and existing applications have a regression because now the driver initializes in suspend state.

Sounds good to me. Will take a look if there is any bigger implication doing it. @JordanYates ^

(apologies for the delay in the reply, got caught into other things)

No worries, thanks for coming back to this.

bjarki-andreasen commented 3 months ago

There is a difference between a device being suspended because it is not active and having the its power cut (because of the power domain it is was suspended). The thing is that you can have N devices under a power domain that is active because there is one device still active. When this device is suspended the domain is also suspended which means that the power supply for all N devices was cut, in this situation all N-1 devices that were already suspend may need to take additional action about it.

Right, with this in mind, we have 3 states, and 4 events (actions) that can be received according to this "state machine":

Is there anything missing from the above "state machine"? Maybe "skipping" directly from OFF to ACTIVE? (although this adds complexity to the drivers as they now need to implement OFF->INACTIVE and OFF->ACTIVE if it makes sense for them or not)

erwango commented 3 months ago

Right, with this in mind, we have 3 states, and 4 events (actions) that can be received according to this "state machine":

What about handling S2RAM state ?

bjarki-andreasen commented 3 months ago

Right, with this in mind, we have 3 states, and 4 events (actions) that can be received according to this "state machine":

What about handling S2RAM state ?

  • Device has been previously initialized either with boot time configuration (from ROM) or user configuration (saved in RAM) but registers are erased due to power loss which happened while device was either INACTIVE (ACTIVE also maybe ? Tbc)
  • Event is RESUME

S2RAM matches the INACTIVE state. From here it can be resumed when the system exits S2RAM. if power is lost, all state information is lost, so the device will implicitly be in OFF state and receive ON event at boot.

erwango commented 3 months ago

S2RAM matches the INACTIVE state. From here it can be resumed when the system exits S2RAM.

So you mean: up to the device to know that registers have to be reinitialized on RESUME ? I don't see it too easy without API, as registers loss can happen asynchronously. For instance:

bjarki-andreasen commented 3 months ago

Device goes from ACTIVE to INACTIVE for some reason.

Devices can't change states unprompted, they just follow whatever the application or some other higher level subsystem tells them.

  • Core lives its life and goes RUN>S2IDLE>RUN>S2RAM>RUN>S2IDLE>RUN (no information sent to device about these transitions > Maybe I'm wrong here ?)

The application or a higher level subsystem will be adjusting the device's state to adhere to whatever system state/power mode/power target the application or higher level subsystem uses.

RESUME is received by device (still in INACTIVE). How can it know it has to restore registers ?

This is not possible :) if at any time it lost power, it would have received the OFF event and have de initialized its registers, it will receive ON followed by RESUME if the application/higher level subsystem resumes it using get()

erwango commented 2 months ago

Devices can't change states unprompted, they just follow whatever the application or some other higher level subsystem tells them.

Sure. I meant "Device goes from ACTIVE to INACTIVE" ... "based on some external directive" (coming from application or some other higher level subsystem)

The application or a higher level subsystem will be adjusting the device's state to adhere to whatever system state/power mode/power target the application or higher level subsystem uses.

So, you mean that, in any case, device is informed of system state change ?

This is not possible :) if at any time it lost power, it would have received the OFF event and have de initialized its registers, it will receive ON followed by RESUME if the application/higher level subsystem resumes it using get()

To be exact here, device can lose power while remaining suspended. That's my understanding of S2RAM. Application will resume from where it was suspended and we don't expect it would reinit devices (with for instance uart run time configuration). Otherwise, why bothering to save and restore CPU context. See : The CPU context is usually the minimum set of CPU registers which content must be restored on resume to let the platform resume its execution from the point it left at the time of suspension. So device is expected to be able to restore its state (registers configuration) on its resume following a S2RAM operation on system side.

Note that ambiguity is allowed as we don't have much documentation on S2RAM feature: https://docs.zephyrproject.org/latest/search.html?q=S2RAM&check_keywords=yes&area=default

bjarki-andreasen commented 2 months ago

The application or a higher level subsystem will be adjusting the device's state to adhere to whatever system state/power mode/power target the application or higher level subsystem uses.

So, you mean that, in any case, device is informed of system state change ?

I define a system state as a high level state which dictates the power states of multiple devices, some being active, some being suspended, some being off. Any single device has no idea what any other device is doing (unless it on a bus in which case the device does keep the bus it is on resumed if needed).

With this definition, the device is not informed of the system state, only the state the system requires the device to be in.

This is not possible :) if at any time it lost power, it would have received the OFF event and have de initialized its registers, it will receive ON followed by RESUME if the application/higher level subsystem resumes it using get()

To be exact here, device can lose power while remaining suspended. That's my understanding of S2RAM.

I understand suspended to mean the device preserves its configuration, its just not active. For a UART peripheral, this means disabling the clock to it, while maintaining power to its HW registers. Apply the clock again and it resumes as if nothing happened. Loosing power however, what I describe as OFF, means no clock, and no power to registers. The UART is effectively hard reset to boot values. Applying a clock to it as this point does nothing, it has to be reinitialized from ROM/RAM by the application.

S2RAM would disable the clock to the UART, or, it could store the last runtime configuration in persistent RAM, completely power down the UART, then reinit it with its last configuration read from persistent RAM once resumed.

How exactly the kernel/system continues after S2RAM, and where previous configurations/states are stored, is platform specific/ambiguous I think.