Open nia-e opened 1 year ago
Thinking about it, we can also modify the API so as to require a reference to the Screen
on widget/object creation (and not get just the active screen) and then store a PhantomData<&'a T> where T: NativeObject
such that they will automatically be invalidated on drop.
This seems to be the case for any Widget created, unfortunately my understanding of Rust FFI isn't yet good enough to help with this :-(
LVGL is a C
language libray which uses either the global malloc/free
or a custom allocator. When an LVGL resource is created memory is
allocated on the heap and a constant mutable pointer is returned
(ignoring static styles for the purposes of this discussion).
Some LVGL resources hold pointers to other LVGL resource, e.g. a widget to the associated set of styles and containers to their children. In this case, the referenced memory must have a lifecycle equal to or greater than the containing resource.
When binding to LVGL as a rust FFI library, then the binding needs to provide appropriate support for the rust user to manage these resources. This includes creation, lifecycle and then destruction.
Errors to mitigate include:
LVGL has a number of different ways in which resources are used.
The most common usage pattern in an embedded systems context is to
allocate LVGL displays, callbacks, screens, styles and widgets early
in the startup sequence of the embedded application and never release
them. This is sufficient for many embedded system UI requirements. In
this scenario, styles may be statically allocated in read-only memory
to save ram. For resources which require dynamic allocation, the lower
level lvgl_sys::
API may be used.
Where this patten is usable, it does not require free-ing of certain
LVGL resources. The common exception is animations. Pagination, tab or
carousel type requirements may be satisfied by pre-allocated
Fragments
and the FragmentManager
.
In this usage pattern, which may be required either to low ram necessitating more fine grained management of ram resource, or in scenarios where the UI layout and components may not be known at firmware build time, then LVGL resources including screens may be dynamically allocated and deallocated during the runtime of the application.
This mode is a combination of the Global and Dynamic lifespans in which some resources exist for the entire lifetime of the application and some are created and destroyed dynamically.
In LVGL, a widget is (with one exception) always parented by another Widget. Parentage is a one-to-many relationship, and when the parent widget is deallocated then the children are also deallocated in a cascade delete pattern.
For styles, there is a many-to-many relationship as a Widget may reference many styles for different scenarios, and a style may be referenced by many widgets. Fonts also have a many-to-many relationship, but are usually stored in read-only memory.
Images may be either dynamically allocated or stored in ROM.
AFAIK, LVGL is not thread safe. All mutation of LVGL resources should be done in critical regions which lock the entire LVGL domain.
When binding an FFI resource to a rust library, it is common to wrap
each external pointer in a corresponding rust struct. In the
lv_binding_rust
current implementation, this is done via the
bindings
rust library and via a code generator which uses proc
macros to generate the rust structs and trait impls.
When a rust struct is associated with an FFI pointer, then the
lifecycle of the rust struct must be tied to the lifecycle of the FFI
pointer. Usually that means that the rust struct exposes a ::new()
function which calls the FFI library to create the pointer, and a
Drop::drop()
implementation which frees the FFI function. As such,
it is the rust struct lifetime which dominates the binding. When the
rust struct is initialized so is the FFI pointer and when the lifetime
of the rust struct expires due to it leaving scope, then the FFI
pointer resource is freed.
Rust programmers should expect to be aware of the lifetime management rules of rust, and the borrow checker enforces many of them.
In the case of LVGL, there are two additional considerations. The concept of LVGL widgets being managed in a single parent multi-child tree, and the many-to-many relationship between widgets and styles.
The current implementation of Obj
(the base class for a widget) is:
//! Native LVGL objects
//!
//! Objects are individual elements of a displayed surface, similar to widgets.
//! Specifically, an object can either be a widget or a screen. Screen objects
//! are special in that they do not have a parent object but do still implement
//! `NativeObject`.
use crate::lv_core::style::Style;
use crate::{Align, LvError, LvResult};
use core::fmt::{self, Debug};
use core::ptr;
/// Represents a native LVGL object. Note: This is a _trait_ not a struct! It does not _contain_ the pointer
pub trait NativeObject {
/// Provide common way to access to the underlying native object pointer.
fn raw(&self) -> LvResult<ptr::NonNull<lvgl_sys::lv_obj_t>>;
}
/// Generic LVGL object.
///
/// This is the parent object of all widget types. It stores the native LVGL raw pointer.
pub struct Obj {
// We use a raw pointer here because we do not control this memory address, it is controlled
// by LVGL's global state.
raw: *mut lvgl_sys::lv_obj_t,
}
impl Debug for Obj {
fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
f.debug_struct("NativeObject")
.field("raw", &"!! LVGL lv_obj_t ptr !!")
.finish()
}
}
// We need to manually impl methods on Obj since widget codegen is defined in
// terms of Obj
impl Obj {
pub fn create(parent: &mut impl NativeObject) -> LvResult<Self> {
unsafe {
let ptr = lvgl_sys::lv_obj_create(parent.raw()?.as_mut());
if ptr::NonNull::new(ptr).is_some() {
//(*ptr).user_data = Box::new(UserDataObj::empty()).into_raw() as *mut _;
Ok(Self { raw: ptr })
} else {
Err(LvError::InvalidReference)
}
}
}
pub fn new() -> crate::LvResult<Self> {
let mut parent = crate::display::DefaultDisplay::get_scr_act()?;
Self::create(&mut parent)
}
}
impl NativeObject for Obj {
fn raw(&self) -> LvResult<ptr::NonNull<lvgl_sys::lv_obj_t>> {
if let Some(non_null_ptr) = ptr::NonNull::new(self.raw) {
Ok(non_null_ptr)
} else {
Err(LvError::InvalidReference)
}
}
}
Note that:
Drop::drop()
. As such, no
structs of type Obj or their corresponding LVGL resources will ever
be de-allocated. For the Global lifespan use case, this is not an
issue. Any exceptions to the Global case may be handled using the
low level bindings. As a general purpose bindings library, however,
it is highly constraining for us to assume that use case.Drop::drop()
if the Obj
goes out of scope then the rust struct will be released but not
the memory referenced by the underlying C
FFI pointer, thus
resulting in a memory leak.NativeObject for Obj
implements our constraint on de-referencing
null pointers.&mut
in this code. This is due to LVGL
performing a mutate-in-place on resources in many function calls as
it is a C
style library, which assumes mutability by default (as
opposed to rust assuming immutability by default).new()
methods, one which assumes there is
a current Screen
in scope and one which takes the parent reference
explicitlyObj
is not Send
or Sync
by virtue of containing a
raw pointer which implies !Send
and !Sync
trait restrictions. In
the current implementation, an impl of Send
trait would have
unproven (and nearly impossible to prove) safety, precluding it as a
sane option in a general purpose library. Without Send
, however,
mutation is restricted to the thread which created the rust struct
and thus async
is unusable.In order to address the issues noted above, it is proposed that an
explict destructor be implemented for LVGL resources. Implementing
such a destructor requires some careful consideration of the issues of
parental ownership and many-to-many references. In the case of the
destructor being invoked then the resource on the C
side will be
released and the Obj will become a null stub. In the correct context,
this means that the Obj will then be removed by the rust runtime.
It is proposed that the destructor be implemented as follows:
Obj
structs on the heap, using a constructor/builder
method and Arc
pointers. Arc
pointers are reference counted and
may be used with Mutex
to ensure single thread access. A possible
alternative is a single global mutex which locks all objects in the
LVGL domain.'static
until explicit drop.lv_obj_del
for the FFI pointer, set the pointer in the
rust struct to null, and remove the Obj from the mapping table. If,
however, it does have children then recurse the tree of children and
delete them working backwards from the leaf nodes. Recursing the
tree of children should be done in the LVGL C
space and will
return lv_obj_t
pointers. The global lookup table shall be used to
retrieve the correponding rust struct. This pattern moves the
cascade delete from the C
domain to the rust domain, and ensures
that there will not be rust structs with dangling references to
memory in the C
domain. Similar strategies may be required for
other resources on the rust side which hold FFI pointers.#[no_std]
for embedded
systems and thus alloc
needs to be handled internally in the
bindings libary for that case.Label
text.Side effects of this proposal include:
Hey @cydergoth, thank you so much for the detailed writeup! It definitely echoes some concerns I've had also. I mostly agree that we need to tackle dropping somehow and it's been a big headscratcher for me; I considered something tangentially related in #107 (specifically, this) which vaguely correlates.
On Obj
, I have tried the approach of holding it in a Box<_>
instead of just the pointer, but we currently don't handle dropping children of dropped objects properly and it resulted in segfaults all over the place. However, I'm not entirely certain that wrapping everything in an Arc
is a great idea simply for performance reasons; this actually sort of used to be how we did things (though only for examples, so it was up to the user to decide how and what to do). Keeping everything behind fat pointers and dynamically handling memory has performance and memory overhead, and LVGL is supposed to run on baremetal with minimal hardware. Reference counting and mutexes are not particularly taxing, but costs add up. Keeping a global mutex is probably less memory-intensive but might degrade actual performance by a lot.
I do like the idea of keeping a global list of live objects, though we are burdened by the fact that - as you pointed out - we don't have an allocator by default except for LVGL's (which has access to very little memory). I'm unsure thus if it's possible to implement it for all cases (i.e. without alloc
).
I do think there's a happy compromise possible to allow dropping and also not tax resources or performance too much. Rust's lifetimes do allow us to automatically invalidate a lot of things; changing e.g. Obj
to hold raw: Box<_>, parent: PhantomData<&'a Obj>
which is recursively invalidated on parent deletion might get us most of the way towards dynamic dropping (and we can simply have users wrap objects they want to live globally as ManuallyDrop<Obj>
).
This doesn't address threading though, so I am curious what can be done there (current approach being "nothing, just don't touch LVGL-space from multiple threads"). There's also the issue of "what do we do if something manifests an object pointer out of thin air" which can occur (see the animation PR I recently merged, where via witchcraft I need to turn a *mut lv_obj_t
into a &mut Obj
where the original Obj
is then leaked - leaking is useless now, but I specifically did it that way to be resilient against the future possibility of implementing real dropping). Here a global pointer table could be useful, though maybe we could also just pretend any witchcraft-derived Obj
is 'static
or make it ManuallyDrop
and never drop it since outside of special-case exemptions it can "only" happen if it's secretly a duplicate of an existing object; but then we must somehow ensure said object isn't dropped before this ghost one is, so we need to make sure these ghost objects only exist in their own very short scope.
Honestly I'm not sure what to do, but I worry about the cost involved in dynamic checking of anything. Though if you want to, I would be curious to see benchmarks.
So the performance of rendering should be mostly on the C
side, which would be using all native pointers. The extra overheard of Arc should only come into play when you need to make a call via the rust domain, which should be mostly during UI creation, right? After that it would just be small updates, with LVGL doing all the Rasterizing and rendering.
The mutex would need to be locked during rendering, then any events would have to wait for rendering to complete, but it would stop one thread like an event thread trying to update the LVGL model whilst another thread is deleting that object 😀
If you take a look at my repo, you'll see it uses an unsafe impl Send and async to update the UI. I don't think I could do that without async, unless I move completely to a command model and have the render thread poll a command queue for mutation requests - which whilst possible seems a little clunky as async effectively does the poll
Send
was actually implemented for Obj
until a few days ago when I intentionally removed it cause I realised it could be dangerous; maybe creating a distinct LvThreaded<T> where T: Widget
and allowing users to work with those behind mutexes etc might work?
Do you think prototyping with Arc would be valuable to determine precisely what the tradeoffs are?
Sure. Though if there is any noticeable impact on memory use or speed, we will probably need this as an explicit opt-in
Yes, it definitely needs a feature gate, as it will need alloc and maybe mutex too.
On Mon, Jun 19, 2023, 12:54 PM Nia @.***> wrote:
Sure. Though if there is any noticeable impact on memory use or speed, we will probably need this as an explicit opt-in
— Reply to this email directly, view it on GitHub https://github.com/lvgl/lv_binding_rust/issues/111#issuecomment-1597549134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPWAWVGQOVYMRTYAGPYC5TXMCG5BANCNFSM6AAAAAAXYFHNEY . You are receiving this because you were mentioned.Message ID: @.***>
Would you be interested in working on this with me? I can ask that you be added as a collaborator
I'm a little busy with my actual job rn, but I'm hoping to get a chance to look at the weekend
On Tue, Jun 20, 2023, 7:03 AM Nia @.***> wrote:
Would you be interested in working on this with me? I can ask that you be added as a collaborator
— Reply to this email directly, view it on GitHub https://github.com/lvgl/lv_binding_rust/issues/111#issuecomment-1598645105, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPWAWXUQJ55Y2OJ6TNKHWDXMGGSFANCNFSM6AAAAAAXYFHNEY . You are receiving this because you were mentioned.Message ID: @.***>
That's fair. This probably won't be getting merged "soon", but I'll open a dedicated issue for threading and we can discuss it there.
See #140. I'll get started on trying to figure things out, but it's probably going to be a while before it works. But in case you want to help out with code - @kisvegabor can you give @cydergoth write perms on branches? I can't manage access for people w/ my current perms.
@nia-e I've added Admin rights for you. Now you should be able to invite people and change permissions.
Thanks! :D
Currently we provide no way to delete a
Screen
after it's been instantiated. Having the user manually call a deletion function also feels... un-rusty and generally unwise as it would be easy to forget. Also, making the screen be deleted onDrop
carries the risk of e.g. a user callingget_scr_act()
multiple times and double-deleting a screen -> a double free -> undefined behaviour. I'm honestly kind of stumped as to what to do here; we can't store some kind ofis_active: bool
field either as every call toget_scr_act()
returns a differentScreen
and there's no way to dynamically check on drop if there are otherScreen
s pointing to the same underlying object.The other option is to have the
Display
store a copy of a pointer to the lastScreen
it returned and simply refuse to return multiple identicalScreen
s thus allowing us to delete on drop, but this does make things slightly more inconvenient (and can be trivially circumvented by changing theScreen
and then going back, unless we store a collection).Also, how do we prevent accessing child objects of a
Screen
(or any object) and thus a use-after-free post-deletion?