[LARGE] add the ability to parse matter idl files and start the ability to code generate based on them

andy31415 commented 8 months ago

The current data models in data_model are not very consistent. Looking to have a single code generation that makes it consistent.

For now I added:

a nom-based parser for matter IDL files
macro codegen that generates cluster ID and command IDs
integrated this one into on-off cluster. Expectation is that we add more and more things (like attributes and any other aditional macros/impls) and make all clusters consistent.

Integration is done via rs-matter-macros, however I wonder if the rs-matter-macros should be generally exported somewhere else to allow for unit testing (most code should be in a non-proc-macro2 module to allow tests).

IDL parser code coverage should be decent, however I have no experience in data model development for clusters, so the macro bits and integration is still lacking. I re-export some bits that I should probably not.

Feedback would be appreciated.

ivmarkov commented 8 months ago

The current data models in data_model are not very consistent. Looking to have a single code generation that makes it consistent.

If you can give a few examples of inconsistencies. Not taking offence with that statement at all - just trying to understand where you see those.

Also, for me, the IDL generation does not solve a consistency problem, but rather - the annoying problem that unless we have the IDL generation, users (or us) will have to write tons of boilerplate code (what I call the clusters' metadata definitions) for all the clusters which are defined in the Matter Cluster Library spec.

Are we on the same page here?

andy31415 commented 8 months ago

The current data models in data_model are not very consistent. Looking to have a single code generation that makes it consistent.

If you can give a few examples of inconsistencies. Not taking offence with that statement at all - just trying to understand where you see those.

Also, for me, the IDL generation does not solve a consistency problem, but rather - the annoying problem that unless we have the IDL generation, users (or us) will have to write tons of boilerplate code (what I call the clusters' metadata definitions) for all the clusters which are defined in the Matter Cluster Library spec.

Are we on the same page here?

Hi, my complaint about consistency was that when I was comparing cluster_media_playback.rs, cluster_on_off.rs and cluster_basic_information.rs the content was ever so slightly different so I could not find the one format to generate. I started from the top and immediately hit differences in attribute declarations like:

#[derive(FromPrimitive)]
pub enum Attributes {

#[derive(FromRepr, EnumDiscriminants)]
#[repr(u16)]
pub enum Attributes {

#[derive(Clone, Copy, Debug, FromRepr)]
#[repr(u16)]
pub enum Attributes {

Final goal is to not have humans type out repetitive code and more easily import data from the cpp repo or spec one. we also get some more consistent behaviour as a sideffect (and requirement). We can tweak that if we really need things to differ, however I imagine we do not.

ivmarkov commented 8 months ago

cluster_media_playback.rs was not written or reviewed by me. The differences between other clusters (if any) should be brought to a minimum, I agree. Perhaps once we have the initial IDL support merged in, and we add generation of attributes (which I think is the biggest pain point) as well as generation of all other stuff (structures/collections), then we can also touch a bit the existing code. Or re-base it on top of the auto-generated import_idl code that we can use ourselves.

ivmarkov commented 8 months ago

@kedars

My personal take on this PR is that - once @andreilitvin confirms that the set of changes we negotiated so far are ready - we should merge it.

The PR is unobtrusive to the existing code-base, and once merged, the IDL generator can be further improved, extended and - possibly - even utilized for our own rs-matter system clusters with following PRs.

I'm personally excited of the prospects it opens.

kedars commented 8 months ago

@ivmarkov I agree, I think we can merge this and iterate on the merged entity faster, it is a step in the right direction.

@andy31415 I like the idea of moving the existing stuff in rs-matter-macros to rs-matter-macros-impl, it gives us testability and faster ways to iterate as well.

I have some questions on the idl_import macro and the best way to utilise this:

The way typical Derive macros expose functionality is where you don't necessarily need to know the implementation details within, you just use them. The way we are envisioning the idl_import in this PR, it is expected to generate a fair amount of code for each cluster leaving the cluster specific implementations outside. As a developer that is writing, say, the Handler routine, you'd want to know what Commands does the definition expose, or the Attributes. All these details are masked by the macro (the way any macro should). I wonder if there are ways to expose these to the developers in a more intuitive way. This might be especially important for new developers.
A similar question is also about how easy it is for the ongoing evolution of the cluster implementation itself. As we introduced newer features, the data_model representation evolved to its current form. Having a macro will certainly make evolution faster as there is a single place that can change all the implementations. Possibly we can have a way to emit the generated code into a file. This way, the next time we have to make changes to the data model, we could hand-edit the generated copy till we are satisfied and then move that into the macro which actually gets committed.

Not suggesting we address these within this PR itself. We can go ahead and merge if @andreilitvin you are at a stable enough point.

ivmarkov commented 8 months ago

The way typical Derive macros expose functionality is where you don't necessarily need to know the implementation details within, you just use them. The way we are envisioning the idl_import in this PR, it is expected to generate a fair amount of code for each cluster leaving the cluster specific implementations outside. As a developer that is writing, say, the Handler routine, you'd want to know what Commands does the definition expose, or the Attributes. All these details are masked by the macro (the way any macro should). I wonder if there are ways to expose these to the developers in a more intuitive way. This might be especially important for new developers.

To some extent this might be alleviated if we expose "user-friendly" cluster-specific handler traits which are adaptable to the generic Handler (or - rather - AsyncHandler now that async-fn-in-trait is supported with stable Rust) trait the framework expects.

Consider this code that the macro might be generating:

pub trait OnOffHandler {
    // Attributes
    async fn get_on_off(&self) -> Result<bool, Error>;
    async fn get_global_scene_control(&self) -> Result<bool, Error>;

    // Commands
    async fn off(&self) -> Result<(), Error>;
    async fn on(&self) -> Result<(), Error>;
    async fn toggle(&self) -> Result<(), Error>;
    async fn off_with_effect(&self, off_with_effect_request: &OffWithEffectRequest<'_>) -> Result<(), Error>;
    async fn on_with_recall_global_scene(&self) -> Result<(), Error>;
    async fn on_with_timed_off(&self) -> Result<(), Error>;
}

// Adapts `OnOffHandler` to the generic `AsyncHandler` trait
pub struct AsyncHandlerImpl<T>(T);

impl<T> AsyncHanlderImpl<T> {
    pub const fn new(handler: T) -> Self {
        Self(handler)
    }
}

impl<T> AsyncHandler for AsyncHandlerImpl<T>
where 
    T: OnOffHandler,
{
    async fn read(&self, ...) {
        todo!() // Call the relevant get_XXX attribute method on T
    }

    async fn write(&self, ...) {
        todo!() // Call the relevant set_XXX attribute method on T
    }

    async fn invoke(&self, ...) {
        todo!() // Call the relevant command
    }
}

In a way, the OnOffHandler would self-document the shape of the commands and attributes the cluster expects.

Furthermore, we would still generate e.g. the on_off::Attributes enum as well as the on_off::Commands enum, as well as a bunch of other stuff, like the Cluster<'_> const itself. While those (and the OnOffHandler trait) won't be easily browseable by the user as a source code, they would still be around for IDE code completion.

Combined with auto-generated documentation (say, we can put a comment

// IDL: readonly attribute optional boolean globalSceneControl = 16384

on top of the auto-generated on_off::Attributes::GLOBAL_SCENE_CONTROL enum element), that might be good enough?

andy31415 commented 8 months ago

The way typical Derive macros expose functionality is where you don't necessarily need to know the implementation details within, you just use them. The way we are envisioning the idl_import in this PR, it is expected to generate a fair amount of code for each cluster leaving the cluster specific implementations outside. As a developer that is writing, say, the Handler routine, you'd want to know what Commands does the definition expose, or the Attributes. All these details are masked by the macro (the way any macro should). I wonder if there are ways to expose these to the developers in a more intuitive way. This might be especially important for new developers.

I could not find a good way to simply #derive things without effectively duplicating most (probably all) of the IDL contents. We can always move to that if we chose since the macros generate code, so we can extract it if we decide to not use the macros..

The generated code will be somewhat large: every structure (including even/request/response) will be generated and every attribute/command will have a separate item too. At first I imagine cargo-expand can be used to view what gets generated and unit tests also show generated code.

I think we have to decide where our source of truth is - this PR proposes .idl because those can be copied over from the CPP sdk and then it is guaranteed to match the sdk view of the spec. Later we could use data model XML which are based on spec text. We are not precluded from using the parser without macros and generating code and checking that in - I think if that becomes more approachable in the future we can go with that to.

A similar question is also about how easy it is for the ongoing evolution of the cluster implementation itself. As we introduced newer features, the data_model representation evolved to its current form. Having a macro will certainly make evolution faster as there is a single place that can change all the implementations. Possibly we can have a way to emit the generated code into a file. This way, the next time we have to make changes to the data model, we could hand-edit the generated copy till we are satisfied and then move that into the macro which actually gets committed.

I think we can extract what the macro would generate manually (cargo expand or similar), hand edit until we are happy and then put it back in the macro. I am unsure about the best flow, however iteration for now is reasonably friendly as the macro takes about 10ms.

Not suggesting we address these within this PR itself. We can go ahead and merge if @andreilitvin you are at a stable enough point.

I think it is as stable as we should for a first merge. It is a 8K .matter file and 1.5K test code, but the rest (about 2K lines) is still quite large and I would rather not place even more functionality on it.

kedars commented 8 months ago

I could not find a good way to simply #derive things without effectively duplicating most (probably all) of the IDL contents. We can always move to that if we chose since the macros generate code, so we can extract it if we decide to not use the macros..

To be clear, I wasn't proposing we go the #derive way, just highlighting the difference in consumption of the generated code.

Going ahead with the merge now.

project-chip / rs-matter

[LARGE] add the ability to parse matter idl files and start the ability to code generate based on them #120