project-chip / rs-matter

Rust implementation of the Matter protocol. Status: Experimental
Apache License 2.0

In-place initialization of the `Matter` object, the buffers and subscriptions #198

Open ivmarkov opened 3 weeks ago

ivmarkov commented 3 weeks ago

(UPDATE: the section "Why is this PR still a draft?" now reflects recent progress.)

What follows below is a "mini-RFC" of sorts, explaining why we need this change, how it is implemented, what the next steps are, and so on.

### What is the problem?

Initialize the `Matter` object, as well as other large objects (like the IM handler's `PooledBuffers`), while avoiding the possibility of stack blow-ups.

### Status Quo

This problem is already solved actually!

The reason we can initialize the `Matter` structure (and the other structures, but from now on I'll talk only about `Matter`, as everything applicable to `Matter` is also valid for the other large non-`Future` structures we have, like `PooledBuffers` and `Subscriptions`) without risking stack memory blow-ups is that it has a `const` constructor. In other words, `Matter::new(...)` can be called from a `const` context.

### How does that help?
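In the simplest case the status quo looks roughly like the sketch below (constructor arguments elided; `ConstStaticCell` is the one from the `static_cell` crate):

```rust
use static_cell::ConstStaticCell;

// Both `Matter::new` and `ConstStaticCell::new` are `const`, so the complete
// object layout is computed at compile time and placed in the `data` section.
static MATTER: ConstStaticCell<Matter> = ConstStaticCell::new(Matter::new(/* ... */));

// At runtime we only ever hand out a `&'static mut Matter` - no stack copy is made.
let matter: &'static mut Matter = MATTER.take();
```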

What the above does is reserve a place for the `Matter` object in the `data` section of the executable; upon program startup, the startup code then initializes the `Matter` structure (and the other structures in the `data` section) with its value. This is possible because `Matter::new` is `const` and so is `ConstStaticCell::new`. Therefore, the object layout of `ConstStaticCell<Matter>` is generated at compile time, saved (in the form of a byte sequence) in the `data` section of the ELF, and then - upon program startup - the whole `data` section is copied from the ELF (or from flash) into RAM by the startup code using a simple `memcpy` (or a `memcpy`-like) routine.

The same trick can also be used to initialize a heap-allocated `Matter` (where an allocator is available) without going through the stack, by storing the `const`-constructed value in a `const` and copying it into the allocation:

```rust
// Constructor arguments elided; the important part is that this is a `const`
// (i.e. the complete `Matter` layout is computed at compile time).
const MATTER: Matter = Matter::new(/* ... */);

let boxed_matter = unsafe {
    let mut boxed_matter: Box<MaybeUninit<Matter>> = Box::new_uninit();

    // This is optimized by the compiler into a `memcpy`, similar to how the `data`
    // section is initialized by the startup code, because we are copying a `const`
    // value into the `*mut Matter` raw pointer.
    // Be careful to inline this code and not hide the `MATTER` const behind a
    // function though, as this optimization might not trigger then.
    boxed_matter.as_mut_ptr().write(MATTER);

    boxed_matter.assume_init()
};

// `boxed_matter` is now initialized, without any stack blow-ups ...
```


### Why are we solving (again) an already solved problem then?

Using `const` initializers comes with one disadvantage: **flash size**. Since the default value of the `Matter` object is _not_ all 0s (and - if you can trust me - it is very, very difficult to come up with a default `Matter` object state which is all 0s and/or `MaybeUninit` (*)), the `Matter` object is placed in the `data` section instead of in `bss`, which means the initial layout of the object has to be burned into flash.

Depending on the size of all buffers, the number of sessions, the number of supported fabrics and so on, the total size of the `Matter` stack is anywhere between 30KB and 70KB.

Now, this is nothing for embedded Linux scenarios, and is probably not that much for Espressif MCUs which have 4MB of flash. But it might matter for other MCUs which generally have less flash.

(*) Coming up with an "all-zeroes" default object state in Rust is very, very difficult.

Stuff which is a useful default and which is "all-zeroes":
- `false` in `bool`
- 0 in `u8`, `u32`, `u16` etc
- `None` in `Option<NonZeroU8>`, `Option<NonZeroU16>`, etc

Stuff which is a useful default and which is NOT "all-zeroes":
- `""` in `&str` (`&str` is a fat pointer and is NOT modeled - like in C++ - with `(nullptr, 0)`, but with `(alignof(&str), 0)` i.e. it ends up having a binary value of `0x0000000100000000` (on 32bit archs)
- `b""` (empty slice) in `&[u8]` - ditto as for `&str`
- `None` for `Option<Foo>` where `Option<Foo>` is not special-cased by the compiler as in e.g. `Option<NonZeroU8>`. Especially with `Option<Foo>` where `Foo` is an enum, the compiler often plays tricks where it models `None` as an extra value in `Foo`'s enum _tag_, and this extra value in the tag is almost never 0 (see the small illustration below)
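A tiny self-contained illustration of the difference, relying only on guarantees the language actually makes (the `NonZeroU8` niche, and the fact that references are never null):

```rust
use core::mem::size_of;
use core::num::NonZeroU8;

fn main() {
    // `Option<NonZeroU8>` is special-cased: `None` occupies the 0 niche, so the
    // whole `Option` is one byte and an all-zeroes pattern is a valid `None`.
    assert_eq!(size_of::<Option<NonZeroU8>>(), size_of::<u8>());

    // An empty `&str` is a (non-null pointer, 0) fat pointer. References are
    // never null, so its byte representation can never be all-zeroes.
    let s: &str = "";
    assert!(!s.as_ptr().is_null());
    assert_eq!(s.len(), 0);
}
```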

### _The_ More Important Reason why we are solving it again

Putting aside flash size concerns, we have an existing, unsolved case of stack blow-ups (and of excessive stack usage at runtime):
_We very often allocate large objects on-stack and then move them into their final destination (which is usually a `heapless::Vec` or an array inside the `Matter` object)._

Cases where we do this:
- `Fabric` (1.7KB) - first allocated on-stack then moved in the vector by `vec.push(fabric)`
- `Session` (~ 1KB currently, can be reduced) - ditto
- `ACL` - ditto
- A lot of other places, like temporary arrays of ~ 1KB inside the Secure Channel implementation

But let's concentrate on `Fabric` and `Session` for now.
In order to avoid their temporary stack allocation (and the subsequent move into the vec via `push`), we need to implement a "placement-new" of sorts for `heapless::Vec`, and initialize the `Fabric` or `Session` in-place, directly in the memory of the `heapless::Vec` instance. Something like:
```rust
vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id); 
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id); 
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx); 
        // ... and so on
    }
});
```

Two problems with the above:
- `heapless::Vec` has no `push_in_place` method, so we would have to fork it (or maintain our own `Vec` type) to get one.
- Even then, initializing each field through raw pointers requires a lot of `unsafe` boilerplate and does not compose when the fields themselves (e.g. nested `Vec`s) need in-place initialization.

### Can we do better?

Yes. With pinned-init.

(BTW: we'll put aside the _pinned_ aspect of that crate. All our objects are `Unpin` and fortunately, pinned-init supports unpinned objects just fine - contrary to its name - without exposing the user to the notion of pinning at all.)

So what does pinned-init do? Grossly oversimplifying, it allows us to turn:

```rust
vec.push_in_place(|slot: &mut MaybeUninit<Fabric>| {
    let slot_ptr = slot.as_mut_ptr();

    unsafe {
        addr_of_mut!((*slot_ptr).node_id).write(node_id);
        addr_of_mut!((*slot_ptr).fabric_id).write(fabric_id);
        addr_of_mut!((*slot_ptr).fabric_idx).write(fabric_idx);
        // ... and so on
    }
});
```

into:

```rust
vec.push_in_place(Fabric::init(node_id, fabric_id, fabric_idx));
```

where Fabric::init() is almost like Fabric::new() except with a slightly different signature and function body:

```rust
impl Fabric {
    fn init(node_id: u64, fabric_id: u64, fab_idx: NonZeroU8) -> impl Init<Self> {
        init!(Self {
            node_id,
            fabric_id,
            fab_idx,
            // Notice the `<-` syntax here and how we compose our `init` with `Vec`'s `init`!
            icac <- rs_matter::utils::vec::Vec::init(),
        })
    }
}
```

The above syntax - thanks to the `init!` macro provided by pinned-init - allows us to use safe, readable, composable syntax to in-place initialize structures recursively! Or rather, and more correctly, to express an in-place initializer/constructor that does that. Almost like in C++!

The similarity with the regular `new() -> Self` is on purpose, of course.
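And because initializers compose, the pattern scales up: a container can expose its own `init()` that in-place initializes everything it owns. A sketch of the idea (the `FabricMgr` field names below are hypothetical, just to show the shape):

```rust
use pinned_init::{init, Init};

impl FabricMgr {
    fn init() -> impl Init<Self> {
        init!(Self {
            // The whole vector of fabrics is built directly in its final memory
            // location - no `FabricMgr` (or `Vec`) temporary ever lives on the stack.
            fabrics <- rs_matter::utils::vec::Vec::init(),
            changed: false,
        })
    }
}
```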

### Can't we apply the `const fn` trick to `Fabric` and `Session`?

Yes, we can. Assuming we implement `const fn Fabric::new() -> Self` as a parameter-less initial state of the `Fabric` object, we can try the following:

```rust
const INITIAL_FABRIC: Fabric = Fabric::new();

fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;
let fabric = fabric_mgr.fabrics_vec.last_mut().unwrap();

// Now modify `fabric` members as we see fit
fabric.node_id = ...
```

BUT: the problem is `fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;`. Even though `INITIAL_FABRIC` is a `const`, we have no guarantee that the compiler will (a) inline the call to `fabrics_vec.push(...)` and (b) use a `memcpy` from the `INITIAL_FABRIC` const to initialize the new fabric at the end of the vector. It might or might not do that, depending on the optimization settings.

In contrast, pinned-init always initializes in place, even at opt-level 0.
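To see why, here is a minimal sketch (not the actual rs-matter code) of what a `push_in_place` built on pinned-init's `Init` trait could look like: the initializer is handed a raw pointer to the final slot and writes directly into it, so there is no intermediate stack value for the optimizer to (maybe) elide:

```rust
use core::mem::MaybeUninit;

use pinned_init::Init;

pub struct InPlaceVec<T, const N: usize> {
    buf: [MaybeUninit<T>; N],
    len: usize,
}

impl<T, const N: usize> InPlaceVec<T, N> {
    /// Initialize a new element directly in the vector's own storage.
    pub fn push_in_place<E>(&mut self, init: impl Init<T, E>) -> Result<(), E> {
        assert!(self.len < N, "capacity exceeded");

        let slot: *mut T = self.buf[self.len].as_mut_ptr();

        // SAFETY: `slot` is valid for writes and properly aligned. Per the `Init`
        // contract, on `Ok` the slot is fully initialized (and on `Err` it is left
        // deinitialized), so we only bump `len` on success.
        unsafe { init.__init(slot) }?;

        self.len += 1;
        Ok(())
    }
}
```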

### pinned-init background

This crate (or rather, a copy of it) is used in the RfL (Rust for Linux) project, and is therefore merged in the Linux kernel mainline. So despite the low star count on GitHub, I think the approach of the crate is very solid and credible.

### Changes delta

The changes suggested here are incremental, and most importantly - additive.

Not so ideal stuff:

### Alternatives

#### Option 1: Do nothing

We can continue relying on `const fn` for the initialization of the `Matter` object (and pay with increased flash size). For `Fabric` and `Session`, we can either do the `const` trick described above (and rely on the compiler opt settings to do their job), or we can still fork `heapless::Vec` and introduce `push_in_place`; but without the convenience of `pinned_init::init!` we'll have to use a lot of `unsafe` to in-place initialize the members of the `Vec` (`Fabric` and `Session`, and possibly others in the future).

#### Option 2: `&mut`-borrow the large `heapless::Vec` instances

(This is what I tried originally.)

We can make the big `heapless::Vec` instances used in `Matter` no longer owned by `Matter`, but rather borrowed into `Matter` via `&mut` references.

I.e. `FabricMgr` would become `FabricMgr<'v>`, because it would contain a `&'v mut heapless::Vec<Fabric>` rather than owning that vector (or array) as it does now. Consequently, `Matter<'a>` would become `Matter<'a, 'v>`, `Exchange<'a>` would become `Exchange<'a, 'v>` and so on down the line. Why the existing covariant `'a` lifetime cannot be merged with the new `'v` lifetime is explained below.
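For illustration, the shape of that change would roughly be the following (type parameters, capacities and field names simplified/hypothetical):

```rust
use core::cell::RefCell;

const MAX_FABRICS: usize = 5; // hypothetical capacity

pub struct Fabric {/* ... */}

// The manager would borrow its storage instead of owning it...
pub struct FabricMgr<'v> {
    fabrics: &'v mut heapless::Vec<Fabric, MAX_FABRICS>,
}

// ...and the extra `'v` lifetime then propagates into every type that embeds it.
pub struct Matter<'a, 'v> {
    fabric_mgr: RefCell<FabricMgr<'v>>,
    dev_det: &'a str, // stand-in for the existing `'a`-borrowed data
}
```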

Advantages:

Disadvantages:

### Why is this PR still a draft?

pinned-init currently still uses `core::alloc::AllocError`, which is not in stable Rust yet (and for `no_std` this is the only nightly feature it needs).

It seems the author is open to changing the crate so that it no longer unconditionally depends on `core::alloc::AllocError`. This error type is anyway only used in the `InPlaceInit` trait, which is a decorator of `Arc`, `Rc` and `Box` (i.e. the `alloc` module) that we don't use/need, so perhaps we can just put - in pinned-init - the definition of `InPlaceInit` (and the usage of `core::alloc::AllocError`) behind the `alloc` feature flag?

UPDATE: Problem solved. The relevant PR addressing ^^^ got merged yesterday and we are now using it via the newly released pinned-init V0.0.8.

I also need to thoroughly test the in-place initialization - ideally on an MCU. That will happen next week.

ivmarkov commented 3 weeks ago

@kedars @andy31415 Not ready for merging yet, because the current version of pinned-init it depends on requires nightly, and I still need to test the changes end-to-end. Hopefully we can lift that restriction from pinned-init soon. In the meantime, it would be great if you could look into the "mini-RFC" in the PR description.

The change is completely additive and incremental, but it is still important to understand it and form an opinion on it.

ivmarkov commented 3 weeks ago

Not so important but attached two screenshots:

ivmarkov commented 5 days ago

@kedars @andreilitvin In case you are wondering why there is no activity on this PR:

The reason is that before marking it "ready for merge", I need to prototype the follow-up PR, which would (hopefully) bear the fruits of this one.

In a way, prototyping the subsequent PR would de-risk this one and prove the usefulness of the changes it introduces.

Aside from re-working how Fabrics and ACLs are initialized and built up during commissioning (an exercise primarily aimed at memory savings - particularly stack memory), I'm also working on improving the TLV framework in two major aspects: