zephyrproject-rtos / zephyr

Primary Git Repository for the Zephyr Project. Zephyr is a new generation, scalable, optimized, secure RTOS for multiple hardware architectures.
https://docs.zephyrproject.org
Apache License 2.0
10.43k stars 6.39k forks source link

Bluetooth: Mesh: Modularizing the stack #27842

Open trond-snekvik opened 4 years ago

trond-snekvik commented 4 years ago

Introduction

Bluetooth Mesh is a complex component that consumes a significant amount of memory and CPU time. It is currently built as a semi-monolithic subsystem on top of HCI, and must be linked into the application binary, but some configurations could benefit from splitting the mesh stack into smaller components. In this RFC, I'd like to suggest some changes that would allow alternative build configurations without sacrificing the current user experience.

The Zephyr Bluetooth Mesh stack currently keeps most of its state in two places; net.c's bt_mesh structure for keys and book keeping, and the Config Server's bt_mesh_cfg_srv structure for feature management. These two objects are manipulated and passed around the stack pretty liberally. This removes the need for a lot of boilerplate accessor functions, but increases coupling to a point where separating the modules is virtually impossible.

Proposed change

I'd like to break up this monolith a bit, so that the mesh stack can be split into smaller components. My primary goal is to allow a split between the access and transport layers, so that the "core" stack can be compiled as a separate binary with a well defined, serializable interface. In the process, I'd like to reduce the overall internal coupling, so we can simplify testing of individual parts of the mesh stack and be more flexible for new configurations. To accomplish this, I want to propose a series of changes I think would benefit the overall architecture, as they strengthen the separation of concerns and removes unnecessary upward dependencies.

The biggest source of internal coupling is the Config Server's active part in manipulating bt_mesh. It enforces the specification defined rules for keys, features, addresses and states without actually owning this state. Additionally, the Config Server is currently acting as the owner of the feature configuration, the heartbeat state and the virtual address label UUIDs. Most of my proposed changes will revolve around changing the role of the Config Server from state owner to state interface, moving all state storage and spec mandated behavior inside the relevant modules.

In short, the changes are:

  1. Move the Heartbeat state to a separate module in the transport layer.
  2. Move feature configuration states into the core mesh stack, and deprecate bt_mesh_cfg_srv in favor of kconfig entries.
  3. Move virtual address labels to transport.
  4. Move key management into a separate module next to network and transport.
  5. Remove key-objects from access, and operate purely on app_idx/net_idx above transport.
  6. Replace all raw pointer access to bt_mesh from the Config Server, shell and access with public accessor APIs.

Detailed RFC

Reasoning and details for each of the listed changes:

1. Move the Heartbeat state to a heartbeat module #27908

Both in the specification and the Zephyr stack, the Heartbeat module sits in the transport layer. However, both the heartbeat state and its configuration behavior sits in the Config Server. I'd like to unite this behavior in a separate heartbeat module, with a public API for adding callbacks and checking the current state, as well as an internal API for configuration. The Config Server's hb_sub and hb_pub states would be deprecated.

2. Move the feature configuration states #27908

The Config Server owns a few states that the stack currently accesses through foundation.h:

Apart from the configuration stage, these states are only used by the core stack. Storing them in the Config Server creates an unnecessary upward dependency, that separates the core layers from its configuration, and forces applications to include a Config Server model, even when testing. I think these states should be moved into the modules that use them, making the Config Server an interface for the stack configuration, not the owner.

These states are currently stored in the user owned bt_mesh_cfg_srv structure, which provides an easy way of setting the default values and reading their current value. I think these mechanisms should be replaced by kconfig options for the states' initial value, and public accessor functions for reading the states from the application, so we can deprecate this structure. This removes any state duplication, and allows us to remove 20-50 lines of boilerplate in the application. It also prevents anyone from trying to change these states directly in the structure at runtime, which leads to unexpected behavior.

3. Move virtual address labels to transport #28062

This change follows the same logic as the feature configuration states, as it encapsulates the management of state inside the module that is actively using it. An internal API for adding and removing the virtual addresses should be added to transport.h.

4. Move key management to a separate module #28511

Whenever the Config Server deals with subnets and app_keys, it currently manipulates the bt_mesh structure directly, enforcing specification rules and coordinating various required function calls. I would like to move this behavior into the core mesh stack, again strengthening the idea that the Config Server is just an interface for the stack.

To prevent piling onto the already sizable network and transport modules, I propose moving all cryptographic key management into a separate key module. This makes the stack better prepared for future improvements to crypto management, such as trusted execution environments or dedicated crypto hardware, and separates the key management from the packet processing mechanisms of transport and network.

5. Only use key indexes in access #28511

Currently, the access layer must perform subnet and appkey lookup on outgoing messages. Some of these lookups are also repeated in the layers below, and I would argue that it's not actually access's place to determine the validity of the keys. To create a clear separation of concerns, the access layer can instead operate purely on key indexes, and rely on the transport and network layers for translating these into valid key objects. This removes all direct pointers to the bt_mesh structure from access, and isolates the key validity problem to the lower layers. On incoming messages, the access layer currently accepts a bt_mesh_net_rx structure, but only uses the bt_mesh_msg_ctx structure. This can also be simplified.

6. Replace any remaining bt_mesh pointers in access, models and shell

In addition to the aforementioned cases, the Config Server, access and shell modules access raw pointers to core memory directly in a couple of cases. These can all be replaced by simple getter APIs that only fetches the relevant states, instead of passing internal pointers.

Known cases:


In addition to these changes, the mesh settings storage can be decentralized, as suggested in #19850. It's not clear whether this is actually necessary for this set of changes, but it fits naturally.

Any input would be appreciated :)

LingaoM commented 4 years ago

18326

The friend node and low power node defined in upper transport layer, not in network layer, so split friend_cred_* from net.c to other place .

MarcianoPreciado commented 4 years ago

Some other suggestions during this API revision:

Mesh Beacon Dynamic enabling/disabling of mesh beacons is currently not canonically possible as the functions required are not accessible by an application. (headers defined in subsys/bluetooth/mesh). Not one of the following commands within my functions are accessible.

void enable_mesh_beacon(void)
{
    struct bt_mesh_cfg_srv *cfg_srv = bt_mesh_cfg_get();
    cfg_srv->beacon = BT_MESH_BEACON_ENABLED;
    bt_mesh_beacon_enable();
}

void disable_mesh_beacon(void)
{
    struct bt_mesh_cfg_srv *cfg_srv = bt_mesh_cfg_get();
    cfg_srv->beacon = BT_MESH_BEACON_DISABLED;
    bt_mesh_beacon_disable();
}

bool mesh_beacon_enabled(void)
{
    return bt_mesh_beacon_get() == BT_MESH_BEACON_ENABLED;
}

GATT Proxy The GATT proxy feature also cannot be easily dynamically activated or deactivated. It is basically defined by the Config to be on or off all the time. This is very limiting. The necessary functions are also obfuscated and loosely connected. The gatt proxy is a frequent advertiser and uses a lot of power. Would be nice to have a centralized/modular way to activated/deactivate

int enable_gatt_proxy(void)
{
#if CONFIG_BT_MESH_GATT_PROXY
    struct bt_mesh_cfg_srv *cfg_srv = bt_mesh_cfg_get();
    cfg_srv->gatt_proxy = BT_MESH_GATT_PROXY_ENABLED;
    int ret = bt_mesh_proxy_gatt_enable();
    if (ret)
        return ret;
    bt_mesh_proxy_adv_start();
#endif
    return 0;
}

int disable_gatt_proxy(void)
{
#if CONFIG_BT_MESH_GATT_PROXY
    struct bt_mesh_cfg_srv *cfg_srv = bt_mesh_cfg_get();
    cfg_srv->gatt_proxy = BT_MESH_GATT_PROXY_DISABLED;
    int ret = bt_mesh_proxy_gatt_disable();
    if (ret)
        return ret;
    bt_mesh_proxy_adv_stop();
#endif
    return 0;
}

bool gatt_proxy_enabled(void)
{
    return bt_mesh_gatt_proxy_get() == BT_MESH_GATT_PROXY_ENABLED;
}

RELAY, FRIEND, LPN I also believe there aren't enough functions available to the user for discovering if the device is a relay/friend/lpn or for dynamically setting active or inactive. One has to get the information from the configuration server, which is not accessible.

MarcianoPreciado commented 4 years ago

I'd also suggest having an option for LPNs to have to manually poll their friend nodes, rather than having the application do it in the background. This makes going into extreme low power states quite difficult. Overall, the lack of api to the LPN feature makes it very difficult to put into an application. The lack of comments also make it difficult to assess what the LPN is actually going to do once enabled.

trond-snekvik commented 4 years ago

@MarcianoPreciado Thanks for the feedback. Currently, these things are normally done through the config client model, even if you want to change your own configuration. I haven't made any specific moves towards changing this in #27908, but it's a good point. Essentially, if the config server becomes a front end for the configuration API, it should be possible to use the API directly as well. Could you perhaps take a look at that PR, and comment on the changes? For instance, should the new API in subsys/bluetooth/mesh/cfg.h be public?

The poll can actually be triggered with the bt_mesh_lpn_poll() function already, but there are not control points for the timed polling.

MarcianoPreciado commented 4 years ago

It's true that you can trigger your own poll. However, you cannot stop the application from polling on your behalf. There is also no control over how many times or how often to send friendship requests while in the enabled state. This is also a huge power waster. I'll take a look at the PR as far as cfg goes. Thanks for putting in the work to make the api better!

LingaoM commented 4 years ago

Non-connected for adv-mesh bearer and proxy connected-adv for proxy or pb-gatt are coupled in adv.c.

I've talked about refactoring the framework a long time ago. After all, it is an important thing for the overall performance of mesh.

Using a separate task to process, it still takes up a lot of ram and increases the throughput delay of message sending.

For this kind of problem, some people have feedback on slack.

LingaoM commented 3 years ago

Replay attack table is usually used in the transport layer to prevent replay attacks. However, the specification does not define it in a certain layer, but in a separate security paragraph. Therefore, move Replay attack is decoupled from the network layer by putting it into a separate module is better way.

zephyrbot commented 6 months ago

Hi @trond-snekvik, @jhedberg, @LingaoM,

This issue, marked as an RFC, was opened a while ago and did not get any traction. Please confirm the issue is correctly assigned and re-assign it otherwise.

Please take a moment to review if the issue is still relevant to the project. If it is, please provide feedback and direction on how to move forward. If it is not, has already been addressed, is a duplicate, or is no longer relevant, please close it with a short comment explaining the reason.

Thanks!