zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0

Networking Power Management API #79384

Open broglep-work opened 1 month ago

broglep-work commented 1 month ago

Introduction

Based on the existing Device Power Management, there is a need for an API that facilitates cooperative power management between subsystems and the application. As per the documentation, a networking subsystem may decide to keep a network interface powered on if it expects network activity in the near future, while at the same time an application that streams data over a network may need to keep the network interface powered on continuously. Currently there seems to be no API that enables this kind of information flow from the application towards the subsystem without having to know details about the underlying devices behind the networking abstraction.

Problem description

For power constrained devices, apart from the CPU / MCU, the network interface devices (such as transceivers) consume a considerable amount of power. There is thus a need to manage these resources efficiently to conserve power. While there is already good support for the SoC and the devices with System Power Management and Device Power Management, there is still a missing link / API into the networking subsystem that facilitates the cooperative power management approach as outlined in Device Runtime Power Management.

Many protocols in the networking subsystem, such as DNS, NTP, CoAP and LwM2M, are stateless on the networking layer (e.g. based on UDP) while still implementing stateful behavior such as request/response patterns on top. As a result, the networking subsystem itself is not able to know when to expect network activity in the near future. In order for the networking subsystem to efficiently manage the networking devices using Device Runtime Power Management, it needs some support / hints from higher level protocols / the application.

Proposed change

Introduction of a kind of hints / reservation system that acts as an API, so that the application and other components are able to inform the networking interface that it will be needed in the near future. A possible solution could look like

int pm_network_expecting_response(const struct net_if *iface, int within_milliseconds) or

int pm_network_interface_reserve(const struct net_if *iface)
int pm_network_interface_release(const struct net_if *iface)
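
A minimal sketch of how the reserve / release pair could behave if it were reference counted. Everything here is an assumption for illustration: the `struct net_if` below is a stand-in so the code compiles without Zephyr, the `pm_reservations` field and `pm_network_interface_is_reserved()` are hypothetical, and the `const` qualifier from the proposed signatures is dropped because this sketch stores the counter in the interface itself. It is also not thread safe; a real implementation would need locking or atomics.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for Zephyr's struct net_if. */
struct net_if {
    int pm_reservations; /* assumed field: number of active reservations */
};

/* A caller announces that it needs the interface to stay usable. */
int pm_network_interface_reserve(struct net_if *iface)
{
    if (iface == NULL) {
        return -EINVAL;
    }
    iface->pm_reservations++;
    return 0;
}

/* A caller withdraws its earlier reservation. */
int pm_network_interface_release(struct net_if *iface)
{
    if (iface == NULL) {
        return -EINVAL;
    }
    if (iface->pm_reservations == 0) {
        return -EALREADY; /* unbalanced release */
    }
    iface->pm_reservations--;
    return 0;
}

/* Hypothetical query the interface driver could use as one input
 * to its own power management decision. */
bool pm_network_interface_is_reserved(const struct net_if *iface)
{
    return iface != NULL && iface->pm_reservations > 0;
}
```

With a counter, several clients (e.g. a DNS resolver and a CoAP client) can hold reservations at the same time, and the interface only becomes unreserved once the last one releases.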

The actual network interface / driver implementation would have the final decision about the device power management; it would just take that information as an additional input to make the best decision to conserve power. E.g. if a transceiver is still able to receive responses in a timely manner even in low power modes, the network interface driver could put the transceiver into low power mode regardless. On the other hand, where a transceiver is not capable of receiving a response in low power mode at all, or only with increased latency that would cause a timeout in the higher level application protocol, the network interface driver could avoid entering low power mode directly after transmit and remain in a higher power mode until the next reception.
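
The driver-side decision described above could be sketched roughly as follows. The `struct xcvr_caps` type, its fields and `driver_may_enter_low_power()` are hypothetical names invented for this illustration; a real driver would know its transceiver's characteristics in whatever form suits it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of what a transceiver can do while in low power mode. */
struct xcvr_caps {
    bool rx_in_low_power;             /* can it receive at all in low power mode? */
    uint32_t low_power_rx_latency_ms; /* added receive latency in low power mode */
};

/* Decide whether the driver may enter low power mode right after a transmit.
 * The hint (response_expected, response_timeout_ms) comes from the upper
 * layers; the final decision stays with the driver. */
bool driver_may_enter_low_power(const struct xcvr_caps *caps,
                                bool response_expected,
                                uint32_t response_timeout_ms)
{
    if (!response_expected) {
        return true; /* nothing to wait for */
    }
    if (!caps->rx_in_low_power) {
        return false; /* the response would be lost */
    }
    /* Low power is fine only if the added latency cannot cause a timeout. */
    return caps->low_power_rx_latency_ms < response_timeout_ms;
}
```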

Dependencies

For efficient network interface / device power management, protocol implementations in Zephyr would ideally make use of this new interface to help network interface drivers perform runtime device power management.

Concerns and Unresolved Questions

There might be a need for similar power management APIs for coordination between the application and subsystems other than the network subsystem. From my point of view, connectivity related subsystems have the highest demand for such an API, as connectivity is power consuming and there is a multitude of components in the networking subsystem. But one could envision a general purpose "Application Runtime Power Management" API (or similar) that facilitates cooperation between device drivers and any subsystems and applications.

Alternatives

Use Device Runtime Power Management directly in application / protocol implementations

Instead of an additional API, it would be possible to use the existing APIs, i.e. pm_device_runtime_put() and the like, in the application and protocol implementations. This however breaks the abstraction between the different layers, and the application needs to know about devices when it should not. While this would be possible for the application itself (as it should know all the devices that are in play), it would not really work for other, more generic implementations like a CoAP / NTP / LwM2M client. E.g. a CoAP client implementation would send a confirmable message and want to inform the networking subsystem that it expects a response within a certain amount of time (so that the transceiver device remains active to receive the confirmation in a timely manner). The CoAP client / application only knows about the network interface and does not really want / need to care about the underlying transceiver device. Directly using Device Runtime Power Management in the CoAP client implementation would introduce tight coupling between the implementation and the underlying device, something the network stack architecture tries to prevent.

Extend / Use Network Stack to Transport Auxiliary Information to facilitate Power Management

Information about the nature of a packet, such as whether it is a request that expects a response, could be attached to packets passed to the networking stack. The network interface driver could use this information to infer how it should do power management of the underlying network device.
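
As a rough illustration of this alternative, the auxiliary information could be a small record travelling with each packet. The `struct pkt_pm_info` type and `driver_rx_window_after_tx_ms()` are hypothetical; nothing like this is claimed to exist in Zephyr's packet structures today.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet power management metadata. */
struct pkt_pm_info {
    bool expects_response;        /* the packet is a request awaiting a reply */
    uint32_t response_timeout_ms; /* how long the sender will wait for it */
};

/* Hypothetical driver helper: how long to keep the receiver ready after
 * transmitting a packet. 0 means no hint was given, so the driver is free
 * to apply its default policy. */
uint32_t driver_rx_window_after_tx_ms(const struct pkt_pm_info *info)
{
    if (info == NULL || !info->expects_response) {
        return 0;
    }
    return info->response_timeout_ms;
}
```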

jukkar commented 1 month ago

Network interface API has net_if_suspend/resume API (see include/zephyr/net/net_if.h), how does this proposal relate to that?

broglep-work commented 1 month ago

Network interface API has net_if_suspend/resume API (see include/zephyr/net/net_if.h), how does this proposal relate to that?

There is already some power management related code available; apart from net_if_suspend/resume there is also net_if_dormant_on/off. But those kinds of APIs do not really fit the cooperative power management approach well. With them the caller, not the interface itself, makes the final decision about power management of the interface and the underlying devices, so they work best when there is only a single authority.

Imagining net_if_suspend/resume or net_if_dormant_on/off calls around each UDP packet that expects a response, in each client (NTP, DNS, CoAP, etc.), illustrates that those are not that well suited for this particular approach to power management. One issue is that the application or the different protocol clients might not know (or should not need to know) how power consuming frequent suspend/resume or dormant on/off cycles are. There was a similar discussion around a power management API in the realm of sensors that touched on this topic as well.

My proposal would be to introduce an API for cooperative power management that is either reference counted, like the already existing power management APIs, or time based, so that multiple clients can communicate when and for how long they need the resource (interface / device) and the API lends itself to cooperative usage. The final decision about putting the device in a certain power state (and the network interface in the different suspend / dormant states) should then be up to the interface driver itself, with the aid of the information provided by the application / protocol clients. Only at these lower levels is there detailed knowledge about the power characteristics of the devices in play and whether it makes sense to put a device in certain low power states or not.
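
A sketch of the time based variant, assuming that each hint simply extends a "keep awake until" deadline and that the latest deadline wins. The names `struct net_if_pm_state`, `pm_hint_expect_response()` and `pm_hint_may_sleep()` are hypothetical, and the current time is passed in explicitly so the sketch stays independent of any kernel clock API.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-interface state collecting hints from all clients. */
struct net_if_pm_state {
    int64_t keep_awake_until_ms; /* latest deadline requested so far */
};

/* A client announces that it expects a response within the given time. */
int pm_hint_expect_response(struct net_if_pm_state *st, int64_t now_ms,
                            int within_milliseconds)
{
    if (st == NULL || within_milliseconds < 0) {
        return -EINVAL;
    }
    int64_t deadline = now_ms + within_milliseconds;
    if (deadline > st->keep_awake_until_ms) {
        st->keep_awake_until_ms = deadline; /* never shorten another client's window */
    }
    return 0;
}

/* The driver may consider low power states once every window has passed. */
bool pm_hint_may_sleep(const struct net_if_pm_state *st, int64_t now_ms)
{
    return now_ms >= st->keep_awake_until_ms;
}
```

Unlike the reference counted variant, a client that forgets to clean up cannot keep the interface awake forever, because every hint expires on its own.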

This concept could be generalized and could be extended to other subsystems as well (such as sensors), but for simplicity of discussion I took network power management as an initial proposal.

jukkar commented 1 month ago

What you are envisioning here sounds reasonable. Having all the bits and pieces figured out in the API requires probably many review rounds when you have something that can be reviewed. With this in mind, just send a draft PR to get more feedback.

broglep-work commented 1 month ago

It would be great to also hear some opinions / insights from people involved in the design of the Power Management APIs before thinking about writing some code. Even though this issue targets the networking subsystem, imho it represents (when generalized) a missing piece in the overall Power Management API suite, or is a consequence thereof.

I agree that it will probably require a few design & review rounds.

jukkar commented 1 month ago

Adding relevant people from maintainers file related to PM.

boaks commented 1 month ago

Just my 2 cents:

I currently use a Nordic nRF9160 based cellular device. Such cellular devices, supporting NB-IoT or LTE-M, include several power saving functions; some are configured ahead, some depend on the current communication situation. PSM enables the modem to enter a "sleeping" mode and wake up for the next message to send. That's usually configured ahead and is entered after a quiet period (timeout). eDRX does something similar, but the sleep isn't expected to be that long and deep. It's triggered either by a quiet period timeout or by the third function, RAI. Release Assistance Indication (not the new Responsible AI) is then the only feature that depends on the assistance of the application, or in fact better on both the application and the protocol layer (e.g. if the protocol layer is responsible for retransmissions or uses multiple messages, as in blockwise transfer). For cellular modems there are two RAI modes. CP-RAI (Control Plane, 3GPP Rel. 13, NB-IoT) offers LAST MESSAGE (send and sleep) or ONE RESPONSE (send and wait for a response before sleep). The second RAI mode, AS-RAI (Access Stratum, 3GPP Rel. 14, LTE-M), also offers LAST MESSAGE as well as "NO MORE DATA".

I'm not sure whether those cellular functions are intended to be controlled by the API proposed here. If so, I guess the info LAST MESSAGE, ONE RESPONSE, NO DATA may be worth providing. One general issue may then be that a device may queue messages to send, and this RAI info may need to be attached to the messages or at least be put in the "same queue".
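
To make the queueing concern concrete, here is a small sketch in which the hint travels with each queued message. All names (`enum rai_hint`, `struct tx_queue`, `txq_put()`, `txq_get()`) are hypothetical, and the rule that a hint is dropped while more messages are still waiting is an assumption of this sketch, not something stated by a modem specification or by Zephyr.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical hint values, named after the options listed above. */
enum rai_hint { RAI_NONE, RAI_LAST_MESSAGE, RAI_ONE_RESPONSE, RAI_NO_DATA };

struct tx_msg {
    const char *payload;
    enum rai_hint hint; /* hint attached to this particular message */
};

#define TXQ_CAPACITY 4

/* Simple ring buffer standing in for a device's transmit queue. */
struct tx_queue {
    struct tx_msg items[TXQ_CAPACITY];
    size_t head;
    size_t count;
};

int txq_put(struct tx_queue *q, const char *payload, enum rai_hint hint)
{
    if (q->count == TXQ_CAPACITY) {
        return -ENOMEM;
    }
    size_t tail = (q->head + q->count) % TXQ_CAPACITY;
    q->items[tail].payload = payload;
    q->items[tail].hint = hint;
    q->count++;
    return 0;
}

/* Take the next message. If more messages are still queued, the hint is
 * downgraded to RAI_NONE, because the modem is about to be used again. */
int txq_get(struct tx_queue *q, struct tx_msg *out)
{
    if (q->count == 0) {
        return -ENOENT;
    }
    *out = q->items[q->head];
    q->head = (q->head + 1) % TXQ_CAPACITY;
    q->count--;
    if (q->count > 0) {
        out->hint = RAI_NONE;
    }
    return 0;
}
```

Keeping the hint in the same queue as the payload avoids the situation where a LAST MESSAGE indication set by one client overtakes messages that another client queued earlier.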

Just to increase the motivation: with PSM and RAI, using CoAP/DTLS 1.2 CID, a Thingy:91 runs for a year from a 1300 mAh battery while exchanging a message every hour. In the "wild", not only on paper ;-).