nia-e commented 1 year ago

Currently we provide no way to delete a Screen after it's been instantiated. Having the user manually call a deletion function also feels... un-rusty and generally unwise as it would be easy to forget. Also, making the screen be deleted on Drop carries the risk of e.g. a user calling get_scr_act() multiple times and double-deleting a screen -> a double free -> undefined behaviour. I'm honestly kind of stumped as to what to do here; we can't store some kind of is_active: bool field either as every call to get_scr_act() returns a different Screen and there's no way to dynamically check on drop if there are other Screens pointing to the same underlying object.

The other option is to have the Display store a copy of a pointer to the last Screen it returned and simply refuse to return multiple identical Screens thus allowing us to delete on drop, but this does make things slightly more inconvenient (and can be trivially circumvented by changing the Screen and then going back, unless we store a collection).

Also, how do we prevent accessing child objects of a Screen (or any object) and thus a use-after-free post-deletion?

nia-e commented 1 year ago

Thinking about it, we can also modify the API so as to require a reference to the Screen on widget/object creation (and not get just the active screen) and then store a PhantomData<&'a T> where T: NativeObject such that they will automatically be invalidated on drop.

cydergoth commented 1 year ago

This seems to be the case for any Widget created, unfortunately my understanding of Rust FFI isn't yet good enough to help with this :-(

cydergoth commented 1 year ago

Some thoughts on lifecycle management for LVGL resources in rust

The problem

LVGL is a C language libray which uses either the global malloc/free or a custom allocator. When an LVGL resource is created memory is allocated on the heap and a constant mutable pointer is returned (ignoring static styles for the purposes of this discussion).

Some LVGL resources hold pointers to other LVGL resource, e.g. a widget to the associated set of styles and containers to their children. In this case, the referenced memory must have a lifecycle equal to or greater than the containing resource.

When binding to LVGL as a rust FFI library, then the binding needs to provide appropriate support for the rust user to manage these resources. This includes creation, lifecycle and then destruction.

Errors to mitigate include:

Use before allocation. Pointers should be null or valid.
Dereference null pointer. Null pointers should not be usable in a context where they may be dereferenced
Use after free. A referenced resource must not be freed whilst existing references still point to it. References which point to a freed resource must be unusable (or null-ed) after free.
Memory leak. A referenced object should have a defined lifecycle and should not exceed that lifecycle
Multiple concurrent access from different threads

Usage patterns for resources in LVGL

LVGL has a number of different ways in which resources are used.

Global lifespan

The most common usage pattern in an embedded systems context is to allocate LVGL displays, callbacks, screens, styles and widgets early in the startup sequence of the embedded application and never release them. This is sufficient for many embedded system UI requirements. In this scenario, styles may be statically allocated in read-only memory to save ram. For resources which require dynamic allocation, the lower level lvgl_sys:: API may be used.

Where this patten is usable, it does not require free-ing of certain LVGL resources. The common exception is animations. Pagination, tab or carousel type requirements may be satisfied by pre-allocated Fragments and the FragmentManager.

Dynamic lifespan

In this usage pattern, which may be required either to low ram necessitating more fine grained management of ram resource, or in scenarios where the UI layout and components may not be known at firmware build time, then LVGL resources including screens may be dynamically allocated and deallocated during the runtime of the application.

Mixed lifespan

This mode is a combination of the Global and Dynamic lifespans in which some resources exist for the entire lifetime of the application and some are created and destroyed dynamically.

Resource ownership

In LVGL, a widget is (with one exception) always parented by another Widget. Parentage is a one-to-many relationship, and when the parent widget is deallocated then the children are also deallocated in a cascade delete pattern.

For styles, there is a many-to-many relationship as a Widget may reference many styles for different scenarios, and a style may be referenced by many widgets. Fonts also have a many-to-many relationship, but are usually stored in read-only memory.

Images may be either dynamically allocated or stored in ROM.

Thread safety

AFAIK, LVGL is not thread safe. All mutation of LVGL resources should be done in critical regions which lock the entire LVGL domain.

Potential approaches

When binding an FFI resource to a rust library, it is common to wrap each external pointer in a corresponding rust struct. In the lv_binding_rust current implementation, this is done via the bindings rust library and via a code generator which uses proc macros to generate the rust structs and trait impls.

When a rust struct is associated with an FFI pointer, then the lifecycle of the rust struct must be tied to the lifecycle of the FFI pointer. Usually that means that the rust struct exposes a ::new() function which calls the FFI library to create the pointer, and a Drop::drop() implementation which frees the FFI function. As such, it is the rust struct lifetime which dominates the binding. When the rust struct is initialized so is the FFI pointer and when the lifetime of the rust struct expires due to it leaving scope, then the FFI pointer resource is freed.

Rust programmers should expect to be aware of the lifetime management rules of rust, and the borrow checker enforces many of them.

In the case of LVGL, there are two additional considerations. The concept of LVGL widgets being managed in a single parent multi-child tree, and the many-to-many relationship between widgets and styles.

The current implementation of Obj (the base class for a widget) is:

//! Native LVGL objects
//!
//! Objects are individual elements of a displayed surface, similar to widgets.
//! Specifically, an object can either be a widget or a screen. Screen objects
//! are special in that they do not have a parent object but do still implement
//! `NativeObject`.

use crate::lv_core::style::Style;
use crate::{Align, LvError, LvResult};
use core::fmt::{self, Debug};
use core::ptr;

/// Represents a native LVGL object. Note: This is a _trait_ not a struct! It does not _contain_ the pointer
pub trait NativeObject {
    /// Provide common way to access to the underlying native object pointer.
    fn raw(&self) -> LvResult<ptr::NonNull<lvgl_sys::lv_obj_t>>;
}

/// Generic LVGL object.
///
/// This is the parent object of all widget types. It stores the native LVGL raw pointer.
pub struct Obj {
    // We use a raw pointer here because we do not control this memory address, it is controlled
    // by LVGL's global state.
    raw: *mut lvgl_sys::lv_obj_t,
}

impl Debug for Obj {
    fn fmt(&self, f: &mut fmt::Formatter) -> fmt::Result {
        f.debug_struct("NativeObject")
            .field("raw", &"!! LVGL lv_obj_t ptr !!")
            .finish()
    }
}

// We need to manually impl methods on Obj since widget codegen is defined in
// terms of Obj
impl Obj {
    pub fn create(parent: &mut impl NativeObject) -> LvResult<Self> {
        unsafe {
            let ptr = lvgl_sys::lv_obj_create(parent.raw()?.as_mut());
            if ptr::NonNull::new(ptr).is_some() {
                //(*ptr).user_data = Box::new(UserDataObj::empty()).into_raw() as *mut _;
                Ok(Self { raw: ptr })
            } else {
                Err(LvError::InvalidReference)
            }
        }
    }

    pub fn new() -> crate::LvResult<Self> {
        let mut parent = crate::display::DefaultDisplay::get_scr_act()?;
        Self::create(&mut parent)
    }
}

impl NativeObject for Obj {
    fn raw(&self) -> LvResult<ptr::NonNull<lvgl_sys::lv_obj_t>> {
        if let Some(non_null_ptr) = ptr::NonNull::new(self.raw) {
            Ok(non_null_ptr)
        } else {
            Err(LvError::InvalidReference)
        }
    }
}

Note that:

this implementation does not implement Drop::drop(). As such, no structs of type Obj or their corresponding LVGL resources will ever be de-allocated. For the Global lifespan use case, this is not an issue. Any exceptions to the Global case may be handled using the low level bindings. As a general purpose bindings library, however, it is highly constraining for us to assume that use case.
as a consequence of not implementing Drop::drop() if the Obj goes out of scope then the rust struct will be released but not the memory referenced by the underlying C FFI pointer, thus resulting in a memory leak.
NativeObject for Obj implements our constraint on de-referencing null pointers.
many references are &mut in this code. This is due to LVGL performing a mutate-in-place on resources in many function calls as it is a C style library, which assumes mutability by default (as opposed to rust assuming immutability by default).
Note that there are two new() methods, one which assumes there is a current Screen in scope and one which takes the parent reference explicitly
Note that Obj is not Send or Sync by virtue of containing a raw pointer which implies !Send and !Sync trait restrictions. In the current implementation, an impl of Send trait would have unproven (and nearly impossible to prove) safety, precluding it as a sane option in a general purpose library. Without Send, however, mutation is restricted to the thread which created the rust struct and thus async is unusable.

In order to address the issues noted above, it is proposed that an explict destructor be implemented for LVGL resources. Implementing such a destructor requires some careful consideration of the issues of parental ownership and many-to-many references. In the case of the destructor being invoked then the resource on the C side will be released and the Obj will become a null stub. In the correct context, this means that the Obj will then be removed by the rust runtime.

Implementing destructor for Obj

It is proposed that the destructor be implemented as follows:

Allocate all Obj structs on the heap, using a constructor/builder method and Arc pointers. Arc pointers are reference counted and may be used with Mutex to ensure single thread access. A possible alternative is a single global mutex which locks all objects in the LVGL domain.
Provide a global mutable mapping table of raw pointers to rust Obj structs using weak pointers. This requires 32 or 64 bits per raw pointer, and sizeof(Arc) for the corresponding rust struct Id. The raw pointer shall be the key. Note that this does not require the complexity of a hash table as a linear scan or binary search will be sufficient for the number of entries. Weak pointers prevent the pointer locking the Obj as "in use".
On creation of an LVGL widget, assign it a new unique Id and insert it into the table, along with the raw pointer value. Note that this promotes the widget lifecycle to be 'static until explicit drop.
On drop() of a widget, determine if it has children. If it does not, then call lv_obj_del for the FFI pointer, set the pointer in the rust struct to null, and remove the Obj from the mapping table. If, however, it does have children then recurse the tree of children and delete them working backwards from the leaf nodes. Recursing the tree of children should be done in the LVGL C space and will return lv_obj_t pointers. The global lookup table shall be used to retrieve the correponding rust struct. This pattern moves the cascade delete from the C domain to the rust domain, and ensures that there will not be rust structs with dangling references to memory in the C domain. Similar strategies may be required for other resources on the rust side which hold FFI pointers.
Styles must also be implemented using ARC pointers, however they are not owned by widgets as they may have a many-to-many relations with widgets. This also applies to Fonts, although fonts are usually in read-only memory and would have a no-op drop function. To ensure that styles exist for the duration of the referencing resources, we should either keep a copy of the style references in the Obj, which involves handling potentially multiple styles per widget and thus an array of Arc references, or in a global table.
Care must be taken of the need to support #[no_std] for embedded systems and thus alloc needs to be handled internally in the bindings libary for that case.
Other resources may include associated CStrings such as Label text.
These capabilities may be placed behind feature gates to disable them in contexts such as Global lifespan.

Side effects of this proposal include:

Increased memory consumption on the rust side for additional fields in Object
Increased indirection for ARC pointers
Additional complexity and memory with ARC+Mutexes, thus requiring the embedded HAL to support mutexes.
This would be a breaking change for existing code using the lv_binding_rust library.

nia-e commented 1 year ago

Hey @cydergoth, thank you so much for the detailed writeup! It definitely echoes some concerns I've had also. I mostly agree that we need to tackle dropping somehow and it's been a big headscratcher for me; I considered something tangentially related in #107 (specifically, this) which vaguely correlates.

On Obj, I have tried the approach of holding it in a Box<_> instead of just the pointer, but we currently don't handle dropping children of dropped objects properly and it resulted in segfaults all over the place. However, I'm not entirely certain that wrapping everything in an Arc is a great idea simply for performance reasons; this actually sort of used to be how we did things (though only for examples, so it was up to the user to decide how and what to do). Keeping everything behind fat pointers and dynamically handling memory has performance and memory overhead, and LVGL is supposed to run on baremetal with minimal hardware. Reference counting and mutexes are not particularly taxing, but costs add up. Keeping a global mutex is probably less memory-intensive but might degrade actual performance by a lot.

I do like the idea of keeping a global list of live objects, though we are burdened by the fact that - as you pointed out - we don't have an allocator by default except for LVGL's (which has access to very little memory). I'm unsure thus if it's possible to implement it for all cases (i.e. without alloc).

I do think there's a happy compromise possible to allow dropping and also not tax resources or performance too much. Rust's lifetimes do allow us to automatically invalidate a lot of things; changing e.g. Obj to hold raw: Box<_>, parent: PhantomData<&'a Obj> which is recursively invalidated on parent deletion might get us most of the way towards dynamic dropping (and we can simply have users wrap objects they want to live globally as ManuallyDrop<Obj>).

This doesn't address threading though, so I am curious what can be done there (current approach being "nothing, just don't touch LVGL-space from multiple threads"). There's also the issue of "what do we do if something manifests an object pointer out of thin air" which can occur (see the animation PR I recently merged, where via witchcraft I need to turn a *mut lv_obj_t into a &mut Obj where the original Obj is then leaked - leaking is useless now, but I specifically did it that way to be resilient against the future possibility of implementing real dropping). Here a global pointer table could be useful, though maybe we could also just pretend any witchcraft-derived Obj is 'static or make it ManuallyDrop and never drop it since outside of special-case exemptions it can "only" happen if it's secretly a duplicate of an existing object; but then we must somehow ensure said object isn't dropped before this ghost one is, so we need to make sure these ghost objects only exist in their own very short scope.

Honestly I'm not sure what to do, but I worry about the cost involved in dynamic checking of anything. Though if you want to, I would be curious to see benchmarks.

cydergoth commented 1 year ago

So the performance of rendering should be mostly on the C side, which would be using all native pointers. The extra overheard of Arc should only come into play when you need to make a call via the rust domain, which should be mostly during UI creation, right? After that it would just be small updates, with LVGL doing all the Rasterizing and rendering.

The mutex would need to be locked during rendering, then any events would have to wait for rendering to complete, but it would stop one thread like an event thread trying to update the LVGL model whilst another thread is deleting that object 😀

If you take a look at my repo, you'll see it uses an unsafe impl Send and async to update the UI. I don't think I could do that without async, unless I move completely to a command model and have the render thread poll a command queue for mutation requests - which whilst possible seems a little clunky as async effectively does the poll

nia-e commented 1 year ago

Send was actually implemented for Obj until a few days ago when I intentionally removed it cause I realised it could be dangerous; maybe creating a distinct LvThreaded<T> where T: Widget and allowing users to work with those behind mutexes etc might work?

cydergoth commented 1 year ago

Do you think prototyping with Arc would be valuable to determine precisely what the tradeoffs are?

nia-e commented 1 year ago

Sure. Though if there is any noticeable impact on memory use or speed, we will probably need this as an explicit opt-in

cydergoth commented 1 year ago

Yes, it definitely needs a feature gate, as it will need alloc and maybe mutex too.

On Mon, Jun 19, 2023, 12:54 PM Nia @.***> wrote:

Sure. Though if there is any noticeable impact on memory use or speed, we will probably need this as an explicit opt-in

— Reply to this email directly, view it on GitHub https://github.com/lvgl/lv_binding_rust/issues/111#issuecomment-1597549134, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPWAWVGQOVYMRTYAGPYC5TXMCG5BANCNFSM6AAAAAAXYFHNEY . You are receiving this because you were mentioned.Message ID: @.***>

nia-e commented 1 year ago

Would you be interested in working on this with me? I can ask that you be added as a collaborator

cydergoth commented 1 year ago

I'm a little busy with my actual job rn, but I'm hoping to get a chance to look at the weekend

On Tue, Jun 20, 2023, 7:03 AM Nia @.***> wrote:

Would you be interested in working on this with me? I can ask that you be added as a collaborator

— Reply to this email directly, view it on GitHub https://github.com/lvgl/lv_binding_rust/issues/111#issuecomment-1598645105, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABPWAWXUQJ55Y2OJ6TNKHWDXMGGSFANCNFSM6AAAAAAXYFHNEY . You are receiving this because you were mentioned.Message ID: @.***>

nia-e commented 1 year ago

That's fair. This probably won't be getting merged "soon", but I'll open a dedicated issue for threading and we can discuss it there.

nia-e commented 1 year ago

See #140. I'll get started on trying to figure things out, but it's probably going to be a while before it works. But in case you want to help out with code - @kisvegabor can you give @cydergoth write perms on branches? I can't manage access for people w/ my current perms.

kisvegabor commented 1 year ago

@nia-e I've added Admin rights for you. Now you should be able to invite people and change permissions.

nia-e commented 1 year ago

Thanks! :D

lvgl / lv_binding_rust

Dropping `Screen`s and associated memory leaks #111

Some thoughts on lifecycle management for LVGL resources in rust

The problem

Usage patterns for resources in LVGL

Global lifespan

Dynamic lifespan

Mixed lifespan

Resource ownership

Thread safety

Potential approaches

Implementing destructor for Obj