ivmarkov opened 3 weeks ago
@kedars @andy31415 Not ready for merging yet, because the current version of `pinned-init` it depends on requires nightly, and because I still need to test the changes end-to-end. But if you can, do look into the "mini-RFC" in the PR description. Hopefully we can lift the nightly restriction from `pinned-init` soon.
The change is completely additive and incremental, but still important to understand and form an opinion on.
Not so important, but I attached two screenshots:
- `MatterStack` from `rs-matter-stack` initialized using `const fn`: you can see 50KB in the `.data` segment
- `MatterStack` initialized with the newly introduced `init` infrastructure: same 50KB, but now in `.bss` (the data being initialized starts its life as `MaybeUninit`, and only becomes "real" once we in-place initialize it with `.uninit().init_with(MatterStack::init(...))`)! :)
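For context on why the `const fn` version ends up in `.data`, here is a standalone sketch (the `Big` type is a hypothetical stand-in for `MatterStack`): a `static` built from a `const` constructor with non-zero contents is laid out at compile time and shipped inside the executable image, then copied into RAM at startup.

```rust
// Hypothetical large object with a `const` constructor, standing in for the real type
struct Big {
    buf: [u8; 50 * 1024],
}

impl Big {
    const fn new() -> Self {
        // Non-zero contents, so the object lands in `.data` (and thus in flash),
        // rather than in the zero-initialized `.bss` section
        Self { buf: [0xAA; 50 * 1024] }
    }
}

// The layout of `BIG` is computed at compile time; startup code memcpy's it into RAM
static BIG: Big = Big::new();

fn main() {
    // No stack blow-up: `BIG` was never constructed on any stack
    assert_eq!(BIG.buf[0], 0xAA);
    assert_eq!(BIG.buf.len(), 50 * 1024);
}
```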
@kedars @andreilitvin In case you are wondering why there is no activity on this PR: before marking it "ready for merge" I need to prototype the follow-up PR, which would (hopefully) bear the fruits of this one. In a way, prototyping the subsequent PR would de-risk this one and prove the usefulness of the changes it introduces.
Aside from re-working how Fabrics and ACLs are initialized and built up during commissioning (an exercise primarily aiming at memory savings, particularly stack memory), I'm also working on improving the TLV framework in two major aspects:

- Extending `FromTLV` to support, in addition to the existing `from_tlv` method, a new `init_from_tlv` method which allows in-place initialization of large structures, a la C++. This is what the current PR is all about. For example, it means we can in-place initialize the `Fabric` object from its TLV-serialized buffer without materializing it on-stack first and then moving it into the `Vec` of fabrics.
- Shrinking `TLVElement` (the representation of partially-parsed TLV data) from 40 bytes (currently, on x64) to 16 bytes (on x64), with a similar reduction ratio for 32-bit MCU archs. This one is interesting, actually. While it might seem that I'm trying to cut the lawn with nail clippers, this is only at first sight. The problem is, we are full of TLV-derived structures (IM `ReadReq`, `SubscribeReq`, `WriteReq`, and quite a few in SC too; `Cert`, to name the worst one) which are themselves full of partially-parsed TLV data in the form of `TLVElement` or `TLVArray`. For those (which are created on the stack and moved around quite a bit, and oh well, this is unavoidable, as this is their primary use case!) we might see a 2x size reduction if we can shrink `TLVElement` and `TLVArray`. Hence why this exercise is important. `Cert`, for example, currently weighs more than 400 bytes on x64. By removing the static `Vec<T, N>` instances in it and then shrinking the `TLVElement` sizes, we can go down to something like 112 bytes on x64, which is quite an improvement.
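To make the size argument concrete, here is a standalone sketch (the `FatTlvElement`/`ThinTlvElement` types below are hypothetical stand-ins, not rs-matter's actual definitions): an element that carries cached parsing state next to its buffer slice is substantially bigger than one that keeps only the raw sub-slice and re-derives everything else on demand.

```rust
use core::mem::size_of;

// A "fat" partially-parsed TLV element: the buffer slice plus cached parse state
struct FatTlvElement<'a> {
    buf: &'a [u8], // pointer + length: 16 bytes on x64
    tag: u64,      // cached control/tag info
    offset: usize, // cached parse position
}

// A "thin" element: just the raw TLV sub-slice; tag, length and value
// would be re-parsed from the slice on demand
struct ThinTlvElement<'a> {
    buf: &'a [u8],
}

fn main() {
    // The thin variant is exactly one slice: two machine words (16 bytes on x64)
    assert_eq!(size_of::<ThinTlvElement>(), 2 * size_of::<usize>());
    // Structures embedding many such elements shrink accordingly
    assert!(size_of::<FatTlvElement>() > size_of::<ThinTlvElement>());
}
```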
(UPDATE: the section "Why is this PR still a draft?" has been updated with recent progress.)
What follows below is a "mini-RFC" of sorts, justifying why we need this change, how it is implemented, next steps and so on.
## What is the problem?
Initialize the `Matter` object, as well as other large objects (like the IM handler's `PooledBuffers`), while avoiding the possibility of stack blow-ups.

## Status Quo
This problem is already solved, actually!

The reason why we can initialize the `Matter` structure... (and other structures, but from now on I'll talk only about `Matter`, as everything applicable to `Matter` is also valid for the other large non-`Future` structures we have, like `PooledBuffers` and `Subscriptions`)... so the reason why we can initialize it without risking stack memory blow-ups is that it has a `const` constructor. In other words, `Matter::new(...)` can be called from a `const` context.

## How does that help?
We can use `static_cell`'s `ConstStaticCell` and simply take the `Matter` object out of it at runtime. What this does is reserve a place for the `Matter` object in the `.data` section of the executable; the `Matter` structure (and the other structures in the `.data` section) is then initialized with its value upon program startup. This is possible because `Matter::new` is `const`, and so is `ConstStaticCell::new`. Therefore, the object layout of `ConstStaticCell<Matter>` is generated at compile time, saved (in the form of a byte sequence) in the `.data` ELF section, and then, upon program startup, the whole `.data` section is copied from the ELF (or from flash) into RAM by the startup code using a simple `memcpy` (or a `memcpy`-like) routine.

On targets with an allocator, a similar effect can be achieved by in-place initializing into uninitialized heap memory:

```rust
let boxed_matter = unsafe {
    let boxed_matter: Box<MaybeUninit<Matter>> = Box::new_uninit();
    // ...
};

// `boxed_matter` is now initialized, without any stack blow-ups...
```

Two problems with the above:
- We need to fork `heapless::Vec`, as the upstream one does not (yet) have a `push_in_place` method
- `push_in_place` requires a lot of care by the caller, as it is full of unsafe code. And it is very, very verbose!

## Can we do better?
Yes. With `pinned-init`.

(BTW: We'll put aside the `pinned-` aspect of that crate. All our objects are `Unpin` and fortunately, `pinned-init` supports unpinned objects just fine, contrary to its name, without exposing the user to the notion of pinning at all.)

So what does `pinned-init` do? Grossly oversimplifying, it allows us to turn a by-value construction of, e.g., a `Fabric` (create it on the stack, then move it into place) into an in-place initialization, where `Fabric::init()` is almost like `Fabric::new()`, except with a slightly different signature and function body. The `init!` macro provided by `pinned-init` gives us safe, readable, composable syntax to in-place initialize structures recursively. Or rather, and more correctly, to express an in-place initializer/constructor that does it. Almost like in C++!

The similarity with regular `new() -> Self` is on purpose, of course.

## Can't we apply the `const fn` trick to `Fabric` and `Session`?

Yes, we can. Assuming we implement `const fn Fabric::new() -> Self` as a parameter-less initial state of the `Fabric` object, we can try pushing a `const`-initialized `Fabric` into the fabrics vector.

BUT: the problem is `fabric_mgr.fabrics_vec.push(INITIAL_FABRIC)?;`. Even though `INITIAL_FABRIC` is a `const`, we have no guarantee that the compiler will (a) inline the call to `fabrics_vec.push(...)` and (b) use a `memcpy` to initialize the new fabric at the end of the vector with the `INITIAL_FABRIC` const. It might or might not do that, depending on the optimization settings. In contrast, `pinned-init` always initializes in place, even at opt-level 0.

## `pinned-init` background

This crate (or rather, a copy of it) is used in the RfL (Rust for Linux) project, and is therefore merged into the Linux kernel mainline. So despite the low star rating on GitHub, I think the approach of the crate is very solid and credible.
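For a feel of what such an in-place initializer buys us, here is a hand-rolled sketch in plain Rust of what a `Fabric::init()`-style placement constructor conceptually does. The field names here are hypothetical stand-ins, and the real code expresses this safely via `pinned_init`'s `init!` macro rather than with manual `unsafe`:

```rust
use core::mem::MaybeUninit;

// Hypothetical stand-in for the real `Fabric` struct
struct Fabric {
    root_ca: [u8; 512],
    fab_idx: u8,
}

// A placement constructor: writes each field directly into the target slot,
// so a complete `Fabric` value never exists on the stack
fn init_fabric(slot: &mut MaybeUninit<Fabric>, fab_idx: u8) -> &mut Fabric {
    let ptr = slot.as_mut_ptr();
    unsafe {
        core::ptr::addr_of_mut!((*ptr).root_ca).write([0; 512]);
        core::ptr::addr_of_mut!((*ptr).fab_idx).write(fab_idx);
        // All fields are initialized now, so this is sound
        slot.assume_init_mut()
    }
}

fn main() {
    // The slot could just as well live in a `static`, a `Box`, or a `Vec` spare slot
    let mut slot = MaybeUninit::<Fabric>::uninit();
    let fabric = init_fabric(&mut slot, 1);
    assert_eq!(fabric.fab_idx, 1);
    assert_eq!(fabric.root_ca[0], 0);
}
```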
## Changes delta

The changes suggested here are incremental and, most importantly, additive:
- The `const fn` constructors are preserved, so folks can continue using `Matter::new(...)` in a `const` context
- `Matter` (and all other structures recursively downwards) just gets an extra `init` method next to its `new` constructor function, which is an almost verbatim copy of the existing `new`, yet with the large members of the concrete struct using the `<-` syntax of the `pinned_init::init!` macro, which delegates to the placement-new `init` methods of the inner structures

Not so ideal stuff:
- Forked the `RefCell` from Rust `core` into `rs_matter::utils::cell::RefCell`, as the `core` `RefCell` does not have an `init` placement-new method. However, this is temporary. We'll have to get rid of `RefCell` anyway in the future, in favor of a real mutex which is either a no-op for single-threaded scenarios (the default), or a real blocking one. This future mutex (or the beginnings of it) is available under the new `blmutex` module, which is introduced by this PR as well
- Forked `heapless::Vec` as `rs_matter::utils::vec::Vec` and extended it to (a) have an `init` in-place constructor and (b) have the `push_in_place` method discussed above. Maybe in the future, if the `pinned-init` crate gains traction in the Rust embedded ecosystem (and Embassy in particular), we might be able to merge the `Vec` changes upstream into `heapless` (and merge our `blmutex` changes upstream too). But I find both of these not such a big issue.

## Alternatives
### Option 1: Do nothing
We can continue relying on `const fn` for the initialization of the `Matter` object (and pay with increased flash size usage). For `Fabric` and `Session`, we can either do the `const` trick described above (and rely on the compiler opt settings to do their job), or we can still fork `heapless::Vec` and introduce `push_in_place`; but without the convenience of `pinned_init::init!`, we'll have to use a lot of `unsafe` to in-place initialize the members of the `Vec` (`Fabric` and `Session`, and in future possibly others).

### Option 2: `&mut`-borrow large `heapless::Vec` instances

(This is what I tried originally.)
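Roughly, the borrowed-buffers idea looks like this signature-wise (a simplified sketch with hypothetical fields; the real `FabricMgr` stores `Fabric` objects in a `heapless::Vec`):

```rust
struct Fabric {
    id: u32,
}

// The manager no longer owns its storage; it borrows caller-provided buffers.
// The `&mut` makes `'v` invariant, which is why it cannot be merged with the
// covariant `'a` lifetime, as discussed below.
struct FabricMgr<'v> {
    fabrics: &'v mut Vec<Fabric>,
}

// `Matter<'a>` consequently grows a second lifetime parameter
struct Matter<'a, 'v> {
    dev_name: &'a str,
    fabric_mgr: FabricMgr<'v>,
}

fn main() {
    // The user must now inject the storage separately...
    let mut fabric_storage: Vec<Fabric> = Vec::new();
    let matter = Matter {
        dev_name: "demo",
        fabric_mgr: FabricMgr { fabrics: &mut fabric_storage },
    };
    matter.fabric_mgr.fabrics.push(Fabric { id: 1 });
    assert_eq!(matter.fabric_mgr.fabrics.len(), 1);
}
```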
We can make the big `heapless::Vec` instances used in `Matter` no longer owned by `Matter`, but rather borrowed in `Matter` via `&mut` references. I.e., `FabricMgr` would become `FabricMgr<'v>`, because it would contain a `&'v mut heapless::Vec<Fabric>` rather than owning that vector (or array) as it does now. Consequently, `Matter<'a>` would become `Matter<'a, 'v>`, `Exchange<'a>` would become `Exchange<'a, 'v>`, and so on down the line. Why the existing covariant `'a` lifetime cannot be merged with the new `'v` lifetime is explained below.

Advantages:
- The large buffers no longer contribute to the `Matter` object "data", even if they initially contain... nothing

Disadvantages:
- `Matter` becomes even less ergonomic: now the user has to separately inject another set of buffers besides `PooledBuffers`. I.e., this change increases complexity. With the alternative suggested here, we can in fact merge `PooledBuffers` back into `Matter` at some point in the future, which would reduce complexity
- The `'v` lifetime from above, for the `&'v mut heapless::Vec` external buffers we'll be using, is invariant (as it is for a `&mut` ref). So we can't merge it with the existing, nice, covariant `'a` lifetime the `Matter` object has. No matter what we do; I even tried with object pools and whatnot, but ultimately these are either `&mut`, or `RefCell<&Pool>`, or `SomeInteriorMutabilityConstructLikeMutex<&Pool>` non-mutable references, and because cells with interior mutability always result in invariant lifetimes, like `&mut`, we always end up with `'v` being invariant and thus unmergeable with `'a`
- We could force `'a` or `'v` to be `'static`, but this way we sacrifice the flexibility of `Matter` on platforms like Embedded Linux, where none of the problems discussed here apply (stacks on Linux are 2MB by default) and where the user might want to allocate `Matter` and `BasicInfoConfig` and `DevAttDataFetcher` and the mDNS impl and even the buffers on-stack. That would be impossible with `'static`

## Why is this PR still a draft?
`pinned-init` is currently still utilizing `core::alloc::AllocError`, which is not in stable Rust yet (and for `no_std`, this is the only nightly feature it needs). It seems the author is open to changing the crate in a way where it no longer unconditionally depends on `core::alloc::AllocError`. This error type is anyway only used in the `InPlaceInit` trait, which is a decorator of `Arc`, `Rc` and `Box` (i.e. the `alloc` module) that we don't use/need; so perhaps we can just put, in `pinned-init`, the definition of `InPlaceInit` (and the usage of `core::alloc::AllocError`) behind the `alloc` feature flag?

UPDATE: Problem solved. The relevant PR addressing ^^^ got merged yesterday, and we are now using it via the newly released `pinned-init` V0.0.8.

I also need to thoroughly test the in-place initialization, ideally on an MCU. That will happen next week.