zth opened this issue 1 year ago
I'm all for codegen.
Just to be sure I understand: bsb today supports generators on files; basically, this would allow generators on inline strings?

> The generator runs and outputs UserResolvers__sql.res

Maybe it should output UserResolvers__sql__Query.res to distinguish between multiple calls inside the same file.
> Just to be sure I understand: bsb today supports generators on files; basically, this would allow generators on inline strings?

Exactly. What decides what triggers generation is the source code, not just static build configuration in the consumer project. In addition, each generator is installable and controls what files it generates, and the compiler replaces the source code with a reference to the generated module.
The main idea is to make this feel as seamless as possible. You shouldn't have to think about generation etc.; it should feel like a language feature for embedded code.
> Maybe it should output UserResolvers__sql__Query.res to distinguish between multiple calls inside the same file.
Yes, this is something we'll need to solve in a good way. I have some prior art in the EdgeDB experimentation I did.
Thanks for the super detailed RFC @zth. Very keen to see this become more integrated in the ecosystem 🎉. I need to ponder this a bit more. Some initial thoughts:

- Nested operations, e.g. %decco(%sql("select ... ;")). Perhaps not even recursively, but by setting an explicit order of operations through the config or something. This is something I think we need to have some restrictions on / a clear design for from the get-go.
- When we find a sql("some-query"), we call it with query, filename, and simply replace the sql ourselves. This will require some parsing on our end, especially when we have nested operations like the above, but it will allow us to chain operations, make changes in-memory, and only write once. We could still cache by AST nodes (which would be nicer, because a change in whitespace or formatting would otherwise still trigger a re-generation).

Anywho, just some thoughts. Again, exciting stuff 🙌
Thank you for your reply @rolandpeelen!
A quick reply, mostly in response to points 1 and 2:
The simplest and likely most performant way is to scan each file as text, with a simple regexp or similar for %gen.<generatorName>(<payload>). We'd need to account for multiple generators in multiple places in a single file, etc. So there's a bit of complexity to figure out, but I'm fairly positive we can get away with just scanning the text. That's generally performant, but it does of course introduce some overhead.
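As a rough sketch of the scanning idea (written in ReScript for illustration only; the real build system would likely do this in Rust, and the exact regexp and function shape here are made up):

```rescript
// Scan a file's source text for %gen.<generatorName>( occurrences and
// collect the generator names that need to run for this file.
let genCallRe = %re("/%gen\\.([A-Za-z_][A-Za-z0-9_]*)\\(/g")

let findGeneratorNames = (source: string): array<string> => {
  let names = []
  let finished = ref(false)
  while !finished.contents {
    switch genCallRe->Js.Re.exec_(source) {
    | Some(result) =>
      // The first capture group is the generator name, e.g. "sql".
      switch Js.Re.captures(result)
      ->Belt.Array.get(1)
      ->Belt.Option.flatMap(Js.Nullable.toOption) {
      | Some(name) => names->Js.Array2.push(name)->ignore
      | None => ()
      }
    | None => finished := true
    }
  }
  names
}
```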
As a benchmark, the Relay compiler reads all target files in the source project (.res in ReScript, .js in JavaScript, etc.) as text and looks for GraphQL tags to extract, and the Relay compiler is very performant.
It's also worth separating dev (watch) mode from running a single build. In a single build this would obviously need to be sequential, but in dev mode this could all run fully in parallel with the regular compiler process, meaning the file contents wouldn't need to be scanned in the main compiler flow. Which in turn means it would stay out of the way of the regular compilation process.
The effect would be generators causing recompiles whenever they write or change files, but that should be a pretty minor thing. Especially if you combine it with committing generated files.
What makes me confident we can make this work is that I've already used this type of workflow for several years with the RescriptRelay compiler, and it works really well.
Happy to discuss more! Performance is really the key here.
I guess it could be useful to also compare it to the current PPX approach, although they're different in what they're trying to solve.
PPX: no PPXes means no preprocessing, which is fast. For each PPX there's blocking preprocessing regardless of whether a file has anything for that PPX or not: a deep copy of the AST for each PPX, waiting for each PPX to finish (they must run in sequence), and so on.
Generators: no generators means no preprocessing, which is fast. For each generator there's the (potentially slight) performance penalty of scanning the text (not a deep copy of the AST) for generator calls. All generators are found in one pass.
Some thoughts after a chat with @jfrolich
We can use the PPX to not only replace, but also extract and append to a file, then subsequently use that file for the generation in between parsing and compiling. That way we won't have to scan all the files manually, just files with some suffix that we've marked somewhere.
The extraction process could also include some parse data, like the type that it's working on (for instance when trying to generate a decoder or something like that), or some arguments. Basically a very simple AST.
@zth -- Shall we move this content / condense it into a wiki?
> @zth -- Shall we move this content / condense it into a wiki?
Sounds good!
This is a brain dump of an idea I've had for a long time around how we can make codegen a first class citizen that's easy to use and orchestrate in ReScript. I'm posting this here in the Rewatch repo because exploring it involves changes to the build system, and Rewatch looks like a great place to try that type of change.
Summary
Proposing first class support for code generation in the ReScript build system and compiler. This would make it easy to embed other languages directly in your code: SQL, EdgeQL, GraphQL, Markdown, CSS, anything really. Generators can be written in any language, and the build system takes care of everything from deciding when to trigger the generators most efficiently to managing the files each generator produces (regenerating, deleting, and so on).
Here's a quick pseudo example of how this idea could work for embedding other languages, implementing a type safe SQL code generator:
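A sketch of what this could look like, assuming a %gen.sql generator along the lines described below (the query, the generated API, and the function names are illustrative, not a final design):

```rescript
// UserResolvers.res
module Query = %gen.sql(`
  select id, first_name, last_name
  from users
  where id = $1
`)

// Hypothetical usage of the module the generator produces: a typed row
// type plus a typed function for executing the query.
let getUser = async (client, ~id) => {
  let rows = await Query.run(client, ~id)
  rows->Belt.Array.get(0)
}
```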
Let's break down at a high level how this pseudo example could work.
1. The build system reads UserResolvers.res before it compiles it, and sees that it has %gen.sql. It looks for a generator registered under the sql name.
2. It calls the sql generator with some data, including the file name, the string inside of %gen.sql(), and a few other things that can help with codegen. The generator in this example will leverage information from a connected SQL database to type the query fed to it, and generate a simple function to execute the query. Since the generator is responsible for emitting an actual .res file rather than rewriting an AST, it can be written in any language, as long as we can call it and feed it data via stdin.
3. The generator emits UserResolvers__sql.res (a sketch of what such a file could contain follows after this list). The build system knows this and now handles UserResolvers__sql.res as a dependency, meaning it knows when to clean up the generated file, and so on.
4. The compiler rewrites the module Query = %gen.sql part into module Query = UserResolvers__sql. A very simple heuristics-based swap from the embedded code definition to the module its generator generates, powered by rules around how to name files emitted by generators.

Generation will be easily cacheable, since regeneration of the files is separate from the compiler running. This means that the build system and the generator in tandem decide when to regenerate code, which in turn means you pay the cost of code generation only when the source code for the generation itself changes.
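For illustration, the generated UserResolvers__sql.res could look roughly like this. The exact shape is entirely up to the generator, and DbClient here is a made-up stand-in for whatever database binding the generator's runtime would actually target:

```rescript
// UserResolvers__sql.res (generated)

// Stand-in for a real database binding, just to keep the sketch self-contained.
module DbClient = {
  type t
  @send external query: (t, string, array<int>) => promise<array<'row>> = "query"
}

// Row type derived from the database schema for this specific query.
type row = {
  id: int,
  first_name: string,
  last_name: string,
}

let queryText = "select id, first_name, last_name from users where id = $1"

// Typed wrapper for executing the query.
let run = async (client, ~id: int) => {
  let rows: array<row> = await DbClient.query(client, queryText, [id])
  rows
}
```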
There's of course a lot of subtlety and detail to how to make this work well, be performant, and so on. But the gist is the above. I'll detail with more examples later.
Goals
The idea behind this is that codegen is a fairly simple tool that's effective in many use cases, but it's too inaccessible right now. In order to do codegen today, you need to either write a PPX or, for standalone codegen, have:
With the approach to codegen outlined above, you'll instead need:
...and that's it. The ReScript compiler and build system handles the rest.
Concerns
Performance
Performance is king. We need to be very mindful to keep build performance as fast as possible. This includes intelligent caching etc., but also setting up good starter projects for building performant generators.
We can of course ask users to write generators in performant languages like Rust and OCaml. But, one strength of this proposal is that you should be able to write generators in JS and ReScript directly. This has several benefits:
In order to make the JS route as performant as possible, we can for example recommend using https://bun.sh/, a JS runtime with fast startup, and include tips on how to keep Bun startup performance fast.
As for the design of the generators themselves, they can hopefully be designed in a way so that they can:
Tooling (LSP, syntax highlighting, etc)
Embedding languages in other languages is a pretty common practice. For example, we already have both graphql-ppx and RescriptRelay embedding GraphQL in ReScript. So for tooling, it's a matter of adjusting whatever tooling already exists to be able to understand embedded code in ReScript.
Error reporting
In an ideal world, code generators can emit build errors that the build system picks up, and by extension reports to the user via the editor tooling. This would be the absolute best solution, if codegen errors are picked up and treated like any compiler error.
Future and ideas
Here are some loose ideas and thoughts:
- Generation could be triggered either by inline code (%gen.sql, as in the example above) or by fully separate files (.gql, .sql, etc).
- The compiler could parse the contents of %gen.<generator>, and pass a representation of that AST to generators.

Use case examples
Not sure we actually want to encourage all of these, but just to show capabilities.
Embedding EdgeDB
I did an experiment a while back for embedding EdgeDB inside of ReScript: https://twitter.com/___zth___/status/1666907067192320000
That experiment would fit great with this approach: the embedded EdgeQL could run through a generator registered as %gen.edgedb.

Embedding GraphQL
The same goes for GraphQL. For those who don't want to use a PPX-based solution, it'd be easy to build a generator (something similar to https://the-guild.dev/graphql/codegen perhaps) that just emits ReScript types and helpers.
Type providers: OpenAPI clients
F# has a concept of "type providers": https://learn.microsoft.com/en-us/dotnet/fsharp/tutorials/type-providers/ We could do something similar with this approach.
Imagine you have a URL to an open API specification. We'll take GitHub's as example: https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/ghes-3.9/ghes-3.9.json
Now, imagine there's an existing tool for turning an OpenAPI spec into a ready-to-use ReScript client. We could write a generator that hooks that tool up:
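A sketch of what that could look like, assuming a hypothetical %gen.openAPI generator (the generated client's module and function names are made up):

```rescript
// GitHubApi.res
module GitHub = %gen.openAPI(
  "https://raw.githubusercontent.com/github/rest-api-description/main/descriptions/ghes-3.9/ghes-3.9.json"
)

// Hypothetical usage of the generated, fully typed client.
let getRepo = async (~owner, ~repo) => {
  await GitHub.Repos.get(~owner, ~repo)
}
```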
Roll your own simple CSS modules
You could use this to roll your own simple CSS modules.
Imagine a code generator registered for %gen.cssModules.
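A sketch of what that embedding could look like (the file name and the CSS itself are made up for this example):

```rescript
// Button.res
module Styles = %gen.cssModules(`
  .button {
    background: hotpink;
    border-radius: 4px;
  }

  .buttonDisabled {
    opacity: 0.5;
  }
`)
```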
The code generator is called with the CSS string above, and relevant metadata. It reads the CSS using standard CSS tooling and, just like CSS modules, hashes each class name based on the file name it's defined in plus the local class name. It then outputs two files:
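For illustration, the two emitted files could look something like this (the hash and file names are made up):

```rescript
// Button__cssModules.res (generated)
// The generator also emits Button__cssModules.css, containing the same rules
// under the hashed class names, e.g. `.button_Button_x7f2a { background: hotpink; ... }`.
let button = "button_Button_x7f2a"
let buttonDisabled = "buttonDisabled_Button_x7f2a"
```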
And here's the original file after it's transformed by the internal compiler PPX for the codegen:
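Continuing the same sketch, the source the compiler actually sees would be something like:

```rescript
// Button.res, as the compiler sees it after the internal PPX has swapped
// the %gen.cssModules call for a reference to the generated module.
module Styles = Button__cssModules
```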
There, we've reinvented a small version of CSS modules, but fully integrated into the ReScript compiler.
Next step: a PoC
There's a lot to explore and talk about if there's interest in this route. A good next step would be to pick one simple generator and build a PoC of how integrating it into the build system could look. @jfrolich, we talked about this briefly.
If there's interest from you in exploring this further, we could set up a simple spec of what needs to happen where. What do you say?