lvgl / lv_binding_micropython

LVGL binding for MicroPython
MIT License
237 stars 156 forks source link

Formulate a concise list of Coding Convention requirements for a supported API #234

Closed amirgon closed 1 year ago

amirgon commented 1 year ago

Related: https://github.com/lvgl/lvgl/issues/3535#issuecomment-1232489218

A short and easy to follow list about MP conventions.
This will be communicated as LVGL coding standard. Less is better so we should have as few rules as possible

kisvegabor commented 1 year ago

We can add the points to check in the PR template: image

amirgon commented 1 year ago

Here are the things that this list should cover, ordered by importance.
Importance is based on how frequent these are missed, and how severe the problem is when they are missed.

Related: https://github.com/lvgl/lvgl/issues/1763

kisvegabor commented 1 year ago

Define gc roots where needed

In my experience people usually don't know what a GC root is. We should explain the gist briefly. E.g. Do notmalloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used.

Follow callback conventions

How to summarize it in 1-2 sentences?

Follow function naming

How to summarize it in 1-2 sentences?

Use enums instead of macros

Prefer enums instead of macros. If inevitable to use defines export them with LV_EXPORT_CONST_INT(defined_value) right after the define.

How to define strings and arrays

In function arguments prefer some_type name[] declaration for array parameters instead of some_type * name

Use typed pointers instead of void pointers

Prefer typed pointers instead of void *.

embeddedt commented 1 year ago

Hmm, is it feasible to write a detector for missing GC roots (only for test purposes, not for regular applications)? It seems that the general idea would be to keep a list of all malloc allocations and verify that they are reachable from at least one of the GC roots. We have been bitten by that specific convention enough times in my opinion. :laughing:

kisvegabor commented 1 year ago

Any idea how to test it? :smile:

amirgon commented 1 year ago

The above was just a list. See my suggestion below for "Define gc roots where needed" bullet, comments are welcome!

is it feasible to write a detector for missing GC roots

It would be best to detect that with Static Analysis, where we don't run any code so we don't need test coverage.

What you suggest is Dynamic Analysis which requires running the code.
To run the code we would need an example. If we already have an example that runs the code (and also a Micropython example!) then we'll hit the problem anyway, either with Dynamic Analysis or without it.

Indeed it could be nicer to get a verbose error message about a missing gc-root instead of a segmentation fault, but we'll need to see if it's worth the effort.

btw, allocations don't have to be reachable only from gc roots - they can also be reachable from the stack, so stack would need to be scanned too. And these allocation chains are not limited to lv_malloc, and would usually go through Micropython allocated objects as well.

A different idea (possibly simpler?) is to find out what parts of the data segment is used for LVGL pointers (can probably be done by querying ELF debug symbols), then scan these pointers to see if any of them is pointing to Micropython heap (the area that is garbage-collected). Ideally this should be tested right before Micropython garbage collection, but we can also test this sooner.


Define gc Roots Where Needed

Background

When LVGL runs in Micropython, all dynamic memory allocations (lv_malloc) are handled by Micropython's memory manager which is garbage-collected (GC).
To prevent GC from collecting memory prematurely, all dynamic allocated RAM must be reachable by GC.
GC is aware of most allocations, except from pointers on the Data Segment:

Such pointers need to be defined in a special way to make them reachable by GC

Identify The Problem

Problem happens when either global, static global or static local pointer variable is allocated dynamically (with lv_malloc)

Solve The Problem

Example

https://github.com/lvgl/lvgl/commit/adced46eccfa0437f84aa51aedca4895cc3c679c

kisvegabor commented 1 year ago

Can we use the MP binding generator to somehow detect functions/patterns not following the convention? E.g. searching for specific text in lv_mpy.c like "error" if the the binding generation failed.

amirgon commented 1 year ago

Can we use the MP binding generator to somehow detect functions/patterns not following the convention? E.g. searching for specific text in lv_mpy.c like "error" if the the binding generation failed.

In a limited way - we can probably detect some cases.
Today when the binding script doesn't find user_data in a callback - it doesn't fail. It simply doesn't generate the binding for that callback. I could easily change the script to fail with an error in such cases.

For example, you can find this in the generated bindings ports/unix/build-dev/lvgl/lv_mpy.c:

/*
 * Function NOT generated:
 * Missing 'user_data' as a field of the first parameter of the callback function 'lv_anim_t_exec_cb_callback'
 * lv_anim_exec_xcb_t exec_cb
 */

However, if user_data appears as an argument but is not passed correctly from the callback registration function to the callbacks themselves - we will not detect that problem.

It's hard to detect other things.
The script can't know when it makes sense to use void * instead of Type * for example. The script can't guess if a function should be a member of a class or a struct when it's not named correctly, perhaps it is just a global function under lvgl module. etc.

kisvegabor commented 1 year ago

I think showing at least some potential user_data issues would be an important step. But I wonder how can we filter out the the known "Function NOT generated" reports.

amirgon commented 1 year ago

I think showing at least some potential user_data issues would be an important step. But I wonder how can we filter out the the known "Function NOT generated" reports.

I agree, but let's first formulate the Coding Convention list, and deal with this later.

I've added above an example for "Define gc Roots Where Needed". Could you comment on this?

embeddedt commented 1 year ago

@amirgon I think the example is great!

If we formulate a list of conventions in the docs then I think we just need to add a link in the current checklist.

kisvegabor commented 1 year ago

For GC roots I can imagine it like:

Do not malloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used. Lear more [here](link to docs with the longer explanation).

I think GC will be good this way.

The 2 other important points are callback and naming conventions. I've grouped the related notes from https://github.com/lvgl/lvgl/issues/1763 . Is there anything else? If the list is ready we can simplify and shorten them.

amirgon commented 1 year ago

For GC roots I can imagine it like:

Do not malloc into a static or global variables. Instead declare the variable in LV_ITERATE_ROOTS list in lv_gc.h and mark the variable with GC_ROOT(variable) when it's used. Lear more [here](link to docs with the longer explanation).

I think GC will be good this way.

Sounds good.

The 2 other important points are callback and naming conventions. I've grouped the related notes from lvgl/lvgl#1763 . Is there anything else? If the list is ready we can simplify and shorten them.

The list covers it I think.
But something when wrong with the copy-paste from lvgl/lvgl#1763, for example lv__create should have been lv_<widgetName>_create, etc.

kisvegabor commented 1 year ago

I've tried to summarize the function name and callback conventions. What do you think?

amirgon commented 1 year ago
  • When registering callbacks in a struct, that struct should contain a void * user_data field. A pointer to the struct should be the first argument of the callback registration function. The callback should receive either a pointer to the struct as its first argument OR the same user_data which was used at registration as its last argument. Callbacks not following these conventions should contain the string xcb.

I think that callbacks, like gc, deserve some background and examples.
Most of the issues we had with Micropython bindings compatibility were related to callbacks. People just don't get the point of user_data and even when they add it they don't understand why it's needed and often use it the wrong way.
I also think it's better to clearly show the two options for callbacks, one with user_data as struct member and the other with user_data as function argument, and provide an example for each.
It's also important to point out that the same user_data that is provided in the callback registration must be passed over to the callback when it's called.

  • structs should be used via an API and not modified directly via their elements.

This is not accurate. The bindings knows how to handle struct members and allows the user to read and write them.

The goal here was to minimize the API and reduce the binding code size, by making structs private and exposing only their public members through getter/setter API functions.
But if a struct is fully public and the user is supposed to access all its members anyway, there is no point in adding an API to access it. In fact, this would be wasteful and make the API bigger for no reason.

kisvegabor commented 1 year ago

I think that callbacks, like gc, deserve some background and examples.

We can add link to more details but it's important have clear rules that we can easily check without any boilerplate.

I also think it's better to clearly show the two options for callbacks, one with user_data as struct member and the other with user_data as function argument, and provide an example for each. It's also important to point out that the same user_data that is provided in the callback registration must be passed over to the callback when it's called.

It's mentioned, but do you think it's not clear enough? _"The callback should receive either a pointer to the struct as its first argument OR the same userdata which was used at registration as its last argument."

structs should be used via an API and not modified directly via their elements.

This is not accurate. The bindings knows how to handle struct members and allows the user to read and write them.

It's a general LVGL rule too. The only place where we have used direct struct access was the drivers but I've already changed it in arch/drivers. I've also hidden the huge lv_disp_t and lv_indev_t types in lv_disp_priv.h and lv_indev_priv.h.

amirgon commented 1 year ago

It's mentioned, but do you think it's not clear enough? _"The callback should receive either a pointer to the struct as its first argument OR the same userdata which was used at registration as its last argument."

It's a long and a bit convoluted sentence... I think that two separate "bullets" that present each option might be clearer.

It's a general LVGL rule too. The only place where we have used direct struct access was the drivers but I've already changed it in arch/drivers. I've also hidden the huge lv_disp_t and lv_indev_t types in lv_disp_priv.h and lv_indev_priv.h.

Private structs and headers are excellent! They will reduce the binding code, which is already huge.

To make it work with Micropython bindings, just need to make sure that:

kisvegabor commented 1 year ago

It's a long and a bit convoluted sentence... I think that two separate "bullets" that present each option might be clearer.

What about this?

But it seems it's not accurate as lv_obj event callback are registered in lv_obj_t but the callbacks receive lv_event_t * e as their only argument. So it's neither a "a pointer to the struct as its first argument " nor _"the same user_data which was used at registration as the last argument"_ . Although the same user_data is passed to lv_event_t which was used during registration. So the callback convention part maybe should be:

If it's correct _"When callbacks can be registered for a struct that struct must have a void * user_data field."_ is not mandatory as the user_data can be stored anywhere anyhow. The key is that the user_data used during registration should be passed to the callback in struct or directly as an argument.

To make it work with Micropython bindings, just need to make sure that: [...]

Yep, it works like this.

amirgon commented 1 year ago
  • When callbacks can be registered for a struct that struct must have a void * user_data field.

I'm not sure it's clear what "callbacks can be registered for a struct" means.
Because callbacks don't necessarily relate to structs. Even when they do, the relation can differ from case to case.

  • In callback registration functions

    • the first argument should be a pointer to the struct for which the callback is registered
    • and the last parameter should be void * user_data.

This is not true. You don't need both first argument to be a pointer to the struct and the last parameter to be void * user_data.

You either need to pass the struct (which contains user_data), or pass user_data as last argument, regardless of struct.

But it seems it's not accurate as lv_obj event callback are registered in lv_obj_t but the callbacks receive lv_event_t * e as their only argument.

When events are registered in lv_obj_add_event_cb, the last argument is user_data.
When they are called, the same user_data is passed in lv_event_t.
This is possible only because user_data is copied correctly. This has special handling in LVGL on event_send_core, without that - it wouldn't work.

Callbacks should receive the same user_data which was used at registration either

  • in a struct as the callbacks first argument
  • or as the callback's last argument.

Correct but not complete, it doesn't say how user_data is passed during registration.

I think that to make it accessible and simple for the users, we can simply present the two options:

Or

It's true that we can mix these two options, as long as user_data is passed from registration to callback, but maybe we can omit that from the short-version of the rules for the sake of simplicity. On the longer explanation we can give the lv_event_t example for mixing the options.

Some short and simple examples could be very useful to clarify everything, I think.

The key is that the user_data used during registration should be passed to the callback in struct or directly as an argument.

Yes, I agree that this is the key. It's good to mention this, but it's not enough.
We must also explain how a "registration" should look like and how a "callback" should look like.
Because the binding script expects a specific "template" of registration and callback prototypes, where arguments must be at a certain position (struct must come first, user_data last), and user_data must be named exactly so, etc.

To make it work with Micropython bindings, just need to make sure that: [...]

Yep, it works like this.

Great!! :+1:

kisvegabor commented 1 year ago

Okay, so it seems it can be quite simple in the end. Together with the other points and conventions:

Be sure the following conventions are followed:

Regarding the examples: We have discussed that it would be great if contributors added C and MP examples too. However imagine that if LVGL had more bindings, e.g. C++, MP, Lua, JS, Rust, etc. We can't expect to add examples for all these. Instead we periodically (monthly?) open sponsored issues to updated the examples. We can have script which creates a list of missing examples. If the conventions are followed there can't be a big issue. What do you think?

amirgon commented 1 year ago
  • Question: Where? In their type or in their name?

In the callback typedef. For example:

typedef void (*lv_anim_exec_xcb_t)(void *, int32_t);

Worth mentioning that xcb callbacks would not be available in other languages. (at least not in Micropython, but probably also not on other auto-generated bindings that need to keep track of callback context without libffi)

However imagine that if LVGL had more bindings, e.g. C++, MP, Lua, JS, Rust, etc. We can't expect to add examples for all these. Instead we periodically (monthly?) open sponsored issues to updated the examples.

In reality, the only active binding we have today is Micropython, right?
Until we have bindings to multiple languages, how about asking/suggesting the contributor to provide Micropython example along with the C example? An example code is a very good way to verify that the conventions are really followed correctly.

kisvegabor commented 1 year ago

In the callback typedef. For example:

What if there is no dedicated type? E.g. void (*my_cb)(void)

In reality, the only active binding we have today is Micropython, right?

The whole picture looks like this:

So strictly speaking really MP is the only binding which has its own examples based on the C examples. However I hope we will have some more active binding soon. A Lua and a Rust binding would be great as well.

Until we have bindings to multiple languages, how about asking/suggesting the contributor to provide Micropython example along with the C example?

Okay, I'll promote adding MP examples until we won't have other bindings.

amirgon commented 1 year ago

What if there is no dedicated type? E.g. void (*my_cb)(void)

I think it could be a good idea to always require a typedef, for clarity.
But we could probably support void (*my_xcb)(void) as well without typedef, if needed.

Okay, I'll promote adding MP examples until we won't have other bindings.

Great!! 👍

kisvegabor commented 1 year ago

I think it could be a good idea to always require a typedef, for clarity.

Okay :slightly_smiling_face:

Please confirm if the list is good now. If so I'll update the PR template. (I added some missing points from the beginning of the conversation)

After that if you have interest and time the MicroPython docs could be extended with some longer descriptions about the conventions.

amirgon commented 1 year ago

Please confirm if the list is good now. If so I'll update the PR template. (I added some missing points from the beginning of the conversation)

Yes, the list is good :+1:
Please add it to the PR template.

After that if you have interest and time the MicroPython docs could be extended with some longer descriptions about the conventions.

Where should they go in the docs? Could you add a placeholder for it?

I would like to explain the most complicated topics in more details:

I can try to summarize these topics and add to the docs.
I think it would be useful to add links from the PR template to these topics on the docs.

kisvegabor commented 1 year ago

Please add it to the PR template.

Done: https://github.com/lvgl/lvgl/blob/master/.github/pull_request_template.md Looks a little bit long, but at least the content is clear and simple now. Let's see how it work in the practice and if needed we can restructure it to look friendlier.

Where should they go in the docs? Could you add a placeholder for it?

Just add a new section after "How can I use it?" like "Code conventions for the MicroPython binding"

I would like to explain the most complicated topics in more details:

:+1:

I think it would be useful to add links from the PR template to these topics on the docs.

Sure, once when it's ready I'll add the links.

amirgon commented 1 year ago

Sure, once when it's ready I'll add the links.

@kisvegabor did you add them?
I've looked at pull_request_template.md and can't find them. Maybe I should look someplace else?

kisvegabor commented 1 year ago

Thanks for the ping. I've just added it: https://github.com/lvgl/lvgl/commit/8dc7fc918a000826ad8a1be7d93da4a7bd60c4ba

amirgon commented 1 year ago

Thanks for the ping. I've just added it: lvgl/lvgl@8dc7fc9

Great! thanks.

Closing this as completed.

kisvegabor commented 1 year ago

I think it's a great step to avoid a lot of issues in the future.