Add primitive types to molecule for `syntax = 2`

eval-exec commented 1 year ago

This PR want to add uint{8,16,32,64},int{8,16,32,64} and bool as primitive types for syntax = 2.

This PR want to close #57 , and commit changes are based on syntax version feature #64

Implementation detail:

add a new crate: molecule-std, it's build.rs will generate a primitive_types.rs file from primitive_types.mol
molecule-std provide a with-primmitive-types feature (default is disabled ). When with-primitive-types feature is disabled, molecule-std act as same as molecule, because molecule-std has same module tree structure as molecule, when with-primitive-types feature is enabled, it will load primitive_types.rs, then user will import Uint64/Int32 from molecule-std

Todo:

[x] Implement codegen to expand primitive types when syntax = 2
[ ] Add tests

yangby-cryptape commented 1 year ago

I could not agree with your design.

First, there is a cyclic dependency between molecule and molecule-codegen:

https://github.com/nervosnetwork/molecule/blob/499b5970ab395532c8a7c7f402a5ef5455158c83/tools/codegen/Cargo.toml#L19

IMHO, you should
- add a new crate, e.g.: molecule-common (or molecule-std, molecule-ext), and put the new types into this new crate;
- re-export all things from the crate molecule to the new crate.
Second, generated code could be ignored.

I think no reviewers will check the generated code every time when review changes.

So, how to identify the generated code you commit are the code which generated with the molecule-codegen? Add a CI check?

IMHO, you could generate the code in compile time via build.rs.

eval-exec commented 1 year ago

@yangby-cryptape

molecule-codegen depend on molecule, but molecule doesn't depend on molecule-codegen, where is the cyclic?
Thank you, I would like to push another commit, try solve this with build.rs

yangby-cryptape commented 1 year ago

1. `molecule-codegen` depend on `molecule`, but `molecule` doesn't depend on `molecule-codegen`, where is the cyclic?

You put generated code into the molecule.

How to generate the code without molecule-codegen?

eval-exec commented 1 year ago

@yangby-cryptape

I need to add a field expand_primitive bool(default false) to Compiler.

when expand_primitive is true, generator will expand uint64 to Uint64 related stuff, this would be used by build.rs to generate primitive types.
when expand_primitive is false, generator will skip expand uint64, this would be used by binary executable moleculec and moleculec-rust.

eval-exec commented 1 year ago

@yangby-cryptape I have pushed a commit. Now I follow your suggestion and use build.rs to generate the code of primitive types.

eval-exec commented 1 year ago

The codegen crate depend on molecule, because it need molecule::{pack_number, Number, NUMBER_SIZE}.

I suggest copy molecule::{pack_number, Number, NUMBER_SIZE} to codegen, so the codegen crate will not depend on molecule anymore. Then we can put build.rs into molecule. Do you think it's a good idea? @yangby-cryptape

yangby-cryptape commented 1 year ago

Answer Your Question

The codegen crate depend on molecule, because it need molecule::{pack_number, Number, NUMBER_SIZE}.

If I remembered correctly, almost all generated code require molecule::{pack_number, Number, NUMBER_SIZE}.

I suggest copy molecule::{pack_number, Number, NUMBER_SIZE} to codegen,

Duplicated code are bad.

Not only me, almost all books I read have same opinion. They call them "bad smell".

I don't think putting all things into a single crate is reasonable.

I wonder why you resist creating a new crate?

In Further

In fact, even I don't think you have to update code in codegen or compiler; maybe a few, but almost none.

Write a schema file for primitive types.
molecule-codegen reads the schema file to load primitive types, reuses the import feature.

This also could be a feature, so users could choose to disable (or override) them.
molecule-std reads the schema file and calls codegen to create those primitive types, and re-exports the basic types and functions from molecule.
User could just use molecule-std directly.

IMHO, you create a schema file and then hard-code the types in codegen, these also are duplicated code: one thing, write twice.

eval-exec commented 1 year ago

I don't think putting all things into a single crate is reasonable. I wonder why you resist creating a new crate?

I don't resist creating a new crate. But I resist re-exports all the basic types from molecule to molecule-std, I want to put all generated primitive types to molecule, because the codegen/src/generator generate many statements which use molecule:: as return type and import statement prefix, like this: https://github.com/nervosnetwork/molecule/blob/e8727bdead01d2f3af67d6dbfea3d7ea914bbed8/tools/codegen/src/generator/languages/rust/builder/implementation.rs#L15-L27 . If I re-exports all types and functions from molecule to molecule-std, then I must change many occurence of molecule:: to molecule_std:: in generator.

molecule-codegen reads the schema file to load primitive types, reuses the import feature. This also could be a feature, so users could choose to disable (or override) them.

Where is the import feature? Is this what you mean "import feature"?

User could just use molecule-std directly.

User only use molecule-std and will not need molecule anymore? Because we will re-exports all types and functions from molecule to molecule-std.

IMHO, you create a schema file and then hard-code the types in codegen, these also are duplicated code: one thing, write twice.

I hard-code these: https://github.com/nervosnetwork/molecule/blob/6feff5737612aaf25d3fc0a145474d280807a8ba/tools/codegen/src/ast/verified/mod.rs#L160-L200 I haven't thought of a good way yet.

How about this: create a new crate molecule-number, move molecule::{pack_number, Number, NUMBER_SIZE} from crate molecule to crate molecule-number, then codegen only need molecule-number and don't need molecule anymore. Further, we can put build.rs in molecule. Do you this it's a good idea? @yangby-cryptape

yangby-cryptape commented 1 year ago

But I resist re-exports all the basic types from molecule to molecule-std

I had same opinion.

then I must change many occurence of molecule:: to molecule_std:: in generator.

It's very simple with sed, one command is enough.

But all programmers are lazy, so we could just do as the follow:

// Re-export
crate molecule { ... }
crate molecule_std {
    mod prelude {
        pub use molecule::*;
        pub trait A { ... }
        ... ...
    }
    pub use molecule::io;
    ... ...
}

Then, generate the following code in the top of every generated rust file:

#[cfg(feature = "with-primitive-types")]
use molecule_std as molecule;

It's better than sed, users even could disable the primitive types with a feature.

import feature ... hard-code ...

Yes, you can use the "import" feature.

For the codegen, you don't have to hard-code for primitive types, just treat them as types defined in another schema file.

Use build.rs to generate code in compile time.

crate molecule_std {
    // Re-export
    ... ...
    // Add primitive types
    mod primitives {
        include!(concat!(env!("OUT_DIR"), "... ...."));
    }
}

In codegen, treat primitive types as types defined in another schema file and just imported to current schema.

// From above section.
#[cfg(feature = "with-primitive-types")]
use molecule_std as molecule;

// Import primitive types as types defined in another schema file.
use molecule::{ ... ... }; // without `primitives::*`
#[cfg(feature = "with-primitive-types")]
use molecule::primitives::*;

p.s. The above are pseudo-code, the key word crate means define a crate.

p.s.s. My discord hangs on the plugins updating every time since yesterday, so continue our discussion here.

yangby-cryptape commented 1 year ago

An Off-of-Topic Topic

IMHO, I think we should split crates on the levels of necessity.

For example:

byte is required; we couldn't create these crates without byte.
All primitive types are enhancement part, they are just plugins or extensions.

They could be removed. Things still work.

A metaphor:

1 is the basic unit.
All other integers are 1 plus several 1s.

Even they are not defined in the book, you could create them in your solution when you do an exam.

That's why I think they should be in different crates.

Postscript

Yes, all CKB developers have to use all of these primitive types.

But, when I talk about these in above, I don't treat molecule as just a part of CKB:

we just add features which CKB required.

I treat it as a common open-source project (even it's not good enough, even it sucks, I still want to do my best):

users could choose how to encode or decode these primitive types,
and they could override the names we defined for our primitive types.

eval-exec commented 1 year ago

I'm so confused right now, I didn't understand what you mean. Please let me try to make everything clear.

Now I want to provide users with a molecule-std crate with the with-primitive-types feature, so that users can only use molecule-std in their own projects, and use the with-primitive-types feature to control whether to use primitive types.

Then, the build.rs of molecule-std will use codegen to generate primitive_types.mol into primitive_types.rs , this will happen inside molecule-std. And primitive_types.rs only needs structures and traits in the molecule.

We also need to let codegen support whether to skip the generation of primitive types or not.

Situation 1

When build.rs of molecule-std needs to generate primitive_types.rs, because primitive_types.rs only needs molecule, and primitive_types.rs cannot import molecule_std since primitive_types.rs is a mod in molecule_std. So I write this in codegen: (this will used by molecule-std's build.rs to generate primitive_types.rs) https://github.com/nervosnetwork/molecule/blob/3fb81c222084c53da35016164271258a597686f3/tools/codegen/src/generator/languages/rust/mod.rs#L43-L48

Situation 2

When a user create a custom molecule schema file dog.mol:

struct Dog {
    name: uint64,
}

He want to skip the generation of uint64 and compile dog.mol to dog.rs, he need to let codegen's generator generate a file which depend on molecule-std. So I write this in codegen: (this will be used by users who use molecule-std in their projects.)

https://github.com/nervosnetwork/molecule/blob/3fb81c222084c53da35016164271258a597686f3/tools/codegen/src/generator/languages/rust/mod.rs#L49-L55

Then he will got a file: dog.rs which imports section is

   use molecule_std as molecule;

   use molecule::prelude::*;

He use dog.rs in his own project, and his project depend on molecule-std with with-primitive-types feature enabled, then dog.rs will load these: https://github.com/nervosnetwork/molecule/blob/3fb81c222084c53da35016164271258a597686f3/std/rust/src/lib.rs#L1-L15 , because he has enabled the with-primitives-types feature, then molecule-std will load primitive_types.rs.

if we let users control primitive types in users' crates, the feature should in generated code.Generate all code, and users decide to enable which features.

I think this PR has made this, we compile all primtive types in primitive_types.mol to primitive_types.rs when user use molecule-std, And this PR give users with-primitive-types feature for molecule-std to enable primitive types or not.

if we control primitive types in codegen

I'm thinking on how to do this. But I think this is inconvenient for users.

@yangby-cryptape What do you expect this PR to be like?

yangby-cryptape commented 1 year ago

I said I gave you an incorrect suggestion, because when I read this piece of code, I didn't expand the whole block (the context) which this piece code in.

I told you how to use features here, but it not the point. In fact, the point is that you should not use features there.

In brief:

Do NOT add features for codegen itself, use an option.
Only add features to the generated code.

There are two usages for codegen:

Generate code in build.rs.
Generate code with compiler.

If you still be confused, try thinking about the compiler: users have to re-compile the compiler when they want to change the generated code (with or without the primitive types) if codegen uses features.

IMHO, this is the correct way: your previous idea.

yangby-cryptape commented 1 year ago

LGTM! I haven't found any obvious issues.

So, Could you mark this PR to "ready for review" state?

It seems that if it is a draft, I have to approve it for running tests every time after you push updates. But I couldn't do that on time every time.

If all tests are passed, I will approve it.

TODOs before released a new version, after this PR merged:

This PR requires two types of tests:

With primitive types.
Without primitive types.

You can reference the tests for no-std and std features.

eval-exec commented 1 year ago

Update: This PR should be rebased on #64 (syntax version feature)

eval-exec commented 1 year ago

Todo:

[x] Implement codegen to expand primitive types when syntax = 2
[ ] Add tests

nervosnetwork / molecule