Open zth opened 5 months ago
One idea for the case where there are additional inputs that should control whether something is regenerated or not (like with GraphQL where ideally both the actual GraphQL text input, and the source schema should control whether things are regenerated) - let people define additional input(s) that the build system can take into account when writing the hash:
{
  "embeds": {
    "generators": [
      "pgtyped-rescript/embed",
      {"embed": "rescript-graphql-generator/embed", "additionalInputs": "./schema.graphql"}
    ],
    "artifactFolder": "./src/__generated__"
  }
}
The build system can then track and hash that file as well, and use the hash of that file in addition to the source hash when comparing whether things need to be regenerated or not.
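A minimal sketch of how such a combined hash could be computed, assuming SHA-256 and a hypothetical helper name (combinedHash); neither is prescribed by this proposal. Each piece is hashed on its own first, so the boundaries between the embed content and the additional inputs stay unambiguous:

```typescript
import { createHash } from "node:crypto";

// Hypothetical helper: hash a single piece of text to a hex digest.
// SHA-256 is an assumption, any stable hash would do.
function digest(text: string): string {
  return createHash("sha256").update(text).digest("hex");
}

// Combine the embed's source text with the contents of every additional
// input (for example the text of ./schema.graphql) into one hash.
function combinedHash(embedContent: string, additionalInputContents: string[]): string {
  const hash = createHash("sha256");
  hash.update(digest(embedContent));
  for (const input of additionalInputContents) {
    hash.update(digest(input));
  }
  return hash.digest("hex");
}
```

With this shape, a change to either the embedded GraphQL text or the schema file produces a different hash, which would trigger regeneration.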
@zth -- I've enabled wikis for the project so we can move these sorts of 'permanent' issues (that we want to keep around for documentation) there. Would you like me to move it over? I think you can do that as well, as you're an author 👌
This is a WIP discussion for implementing generators support in the style of https://github.com/zth/rescript-embed-lang natively in rewatch and the compiler itself.
Relevant compiler PR: https://github.com/rescript-lang/rescript-compiler/pull/6823. That PR does the following in the compiler:
bsc outputs an .embeds file together with the .ast file, if the file processed has embeds. It'll also print 1 to stdout if it found embeds. More about .embeds and its format later.
Generators and embeds are used a bit interchangeably in the text below. Generators are the programs that generate code from some source input. Embeds are that source input, embedded into the ReScript source itself.
Configuring generators in the consuming project
We need a way to configure what generators to use, so the build system knows what to run for each embed. This should be done in rescript.json for consistency.
Suggestion: Like PPXes, point to a path
In this alternative, you point to a path. That path should lead to some sort of configuration file that the build system can read once to figure out which generator this is and how to run it. Example:
rescript.json in the consuming project.
Example embed.json in the pgtyped-rescript package:
We'll go more into how to build generators later, but the build system would expect to be able to send some configuration as an arg to that command and have it generate from that config.
Note that the command could be any type of binary. It's bun here, but it could be node, or a Rust/OCaml/whatever binary. Doesn't matter. It's up to the user to have what's needed installed on their system to be able to run the generation.
This leaves us room to add more configuration if wanted, as well as give good DX with minimal manual work.
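As a rough sketch of the generator side of this contract, assuming the argument is the JSON payload shown in the hands-on examples further down (tag, content, loc). The type and function names are hypothetical, and how a generator hands its output back to the build system is not settled in this discussion, so that part is left out:

```typescript
// Shape of the payload as it appears in the example invocation in this
// document. The type names themselves are hypothetical.
type Pos = { line: number; col: number };
type EmbedInput = { tag: string; content: string; loc: { start: Pos; end: Pos } };

// Parse and minimally validate the single JSON argument the build system
// would pass to the generator command.
function parseEmbedInput(arg: string): EmbedInput {
  const parsed = JSON.parse(arg) as Partial<EmbedInput>;
  if (
    typeof parsed.tag !== "string" ||
    typeof parsed.content !== "string" ||
    parsed.loc === undefined
  ) {
    throw new Error("Invalid embed input: expected tag, content and loc");
  }
  return { tag: parsed.tag, content: parsed.content, loc: parsed.loc };
}
```

A generator run via bun or node would typically call this on its first command-line argument (for example process.argv[2]) before doing any generation.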
So, to recap what the build system would do:
embeds in rescript.json
.json if it's not already in the file path
It now knows what generator this is, how to run it, and what tags to run it for.
Configuring where to emit the generated content
I think we should force the user to configure a central place where to emit generated files, like ./src/__generated__. This will simplify a lot, and scale well up to the point where there are so many files in the same folder that you start to get perf issues. At which point we can solve that in a number of ways.
A proposed config could look like this:
We need to check that that folder is inside of a configured ReScript source folder etc, but that should be fine.
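One way that check could look, as a sketch only. It assumes the configured source folders are available as a plain list of directory paths relative to the project root (the real sources config can be richer than that), and the function name is hypothetical:

```typescript
import * as path from "node:path";

// Returns true when artifactFolder resolves to one of the source folders
// or to a directory nested inside one of them.
function isInsideSourceFolder(
  projectRoot: string,
  artifactFolder: string,
  sourceDirs: string[]
): boolean {
  const artifact = path.resolve(projectRoot, artifactFolder);
  return sourceDirs.some((dir) => {
    const source = path.resolve(projectRoot, dir);
    const rel = path.relative(source, artifact);
    // A relative path that climbs out with ".." (or is absolute, as can
    // happen across drives on Windows) means the folder is outside.
    const escapes = rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel);
    return !escapes;
  });
}
```

Whether nested folders should count as "inside" depends on how subdirectories are configured for the source folder; this sketch simply treats any nested folder as inside.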
Questions and things to figure out
Overview of potential setup in build system
Here's an overview of how the build system could handle running generators.
This is how it looks at a high level:
Finding embeds
You can embed other languages or any string content into tags inside of ReScript. Example:
If there's a generator configured for sql.one, bsc will spit out a .embeds file next to .ast when it's asked to produce the .ast file. It looks roughly like this (format very much subject to change, we'll make it whatever makes most sense and is easiest/most efficient to read from the build system):
If bsc found embeds and printed a .embeds file, it'll output 1 to stdout.
Running generators
Now, if we found embeds we'll want to run the appropriate generator for that file, if the embedded content has changed.
So, we load the .embeds file, go through each of the embeds, and check whether they've already been generated. If they've been generated, we check if the generated content was generated from the same input, via a comment with a hash of the source content at the top of the generated file. If the generated file wasn't generated from the same source, or if it hasn't been generated yet, we run the appropriate generator and write the generated file.
Here's a number of hands-on examples:
First time a generation runs
bsc extracts 2 embeds from SomeFile.res and prints 1 to stdout to signify that SomeFile.embeds file generated by bsc, and figures out that 2 files are to be generated: src/__generated__/SomeFile__sql_one__M1.res and src/__generated__/SomeFile__sql_many__M1.res. Notice the file format <sourceModuleName>__<tagName.replace(".", "_")>__M<indexOfTagInFile>. If multiple embeds of the same tag exist in the same file (multiple %sql.one for example), the M part is incremented, like src/__generated__/SomeFile__sql_one__M2.res for the next embed.
/command/to/run/generator '{"tag":"sql.one","content":"select * from users where id = :id!","loc":{"start":{"line":1,"col":23},"end":{"line":1,"col":60}}}'. This can all be done in parallel, since the generators should be idempotent (at least to start with).
src/__generated__/SomeFile__sql_one__M1.res
src/__generated__/SomeFile__sql_one__*.res and src/__generated__/SomeFile__sql_many__*.res and then remove any of them that aren't in use any more. This also needs to be updated in the build state.
When generated content hasn't changed
The same setup as the first example, up until point 3, where instead:
src/__generated__/SomeFile__sql_one__M1.res and src/__generated__/SomeFile__sql_many__M1.res
@sourceHash
.embeds file.
When generated content has changed
The same setup as above, but from point 5:
Cleaning up
We'll need to continuously ensure that we clean up:
.embeds files when there aren't any embeds anymore (as noticed by bsc not writing 1 to stdout)
When errors in generation happen
We can flesh this out more, but ideally, when errors in generation happen, we can propagate those to the build system and have the build system both fail and write them to .compiler.log so that they end up in the editor tooling.
The one thing to take care of here is to translate the error locations so that the generator can return errors relative to the content it received, whereas the error itself is presented by the build system and in the editor tooling offset to the correct location in the source file.
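A small sketch of that translation, under stated assumptions: the generator reports positions with 1-based lines and 0-based columns relative to the start of the content it received, and the embed's loc.start marks where that content begins in the source file. Both conventions and all names here are assumptions for illustration, not something this discussion has decided:

```typescript
type Pos = { line: number; col: number };
type Loc = { start: Pos; end: Pos };

// Translate one generator-relative position into a source-file position.
function translatePos(embedStart: Pos, rel: Pos): Pos {
  if (rel.line === 1) {
    // On the first line of the embed, columns are offset by where the
    // embedded content starts in the source file.
    return { line: embedStart.line, col: embedStart.col + rel.col };
  }
  // On later lines only the line number shifts; columns are already
  // relative to the start of the line.
  return { line: embedStart.line + rel.line - 1, col: rel.col };
}

// Translate a whole error range reported by the generator.
function translateLoc(embedLoc: Loc, rel: Loc): Loc {
  return {
    start: translatePos(embedLoc.start, rel.start),
    end: translatePos(embedLoc.start, rel.end),
  };
}
```

The build system would apply this before writing the error to .compiler.log, so the generator never needs to know where in the file its content came from.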
Regenerating content?
The idea is that you can simply remove the generated file, at which point it'll be regenerated the next time the build system processes the file with the source content.
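To tie this together with the naming scheme and the source-hash comment described earlier, here is a sketch of how the build system could decide what to (re)generate. The hash algorithm (SHA-256), the exact comment format (`// @sourceHash <hex>` on the first line), and all function names are assumptions for illustration:

```typescript
import { createHash } from "node:crypto";

type Embed = { tag: string; content: string };

// File names following <sourceModuleName>__<tagName.replace(".", "_")>__M<indexOfTagInFile>,
// with the M index counted per tag within the file, starting at 1.
function generatedFileNames(sourceModuleName: string, embeds: Embed[]): string[] {
  const countPerTag = new Map<string, number>();
  return embeds.map((embed) => {
    const index = (countPerTag.get(embed.tag) ?? 0) + 1;
    countPerTag.set(embed.tag, index);
    // Mirrors the expression in the text; note it replaces only the first dot.
    return `${sourceModuleName}__${embed.tag.replace(".", "_")}__M${index}.res`;
  });
}

// Assumed hash of the embedded source content.
function sourceHash(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// existingFile is the current text of the generated file, or undefined if
// the file does not exist (for example because the user deleted it).
function needsRegeneration(embed: Embed, existingFile: string | undefined): boolean {
  if (existingFile === undefined) return true;
  const firstLine = existingFile.split("\n", 1)[0];
  return firstLine !== `// @sourceHash ${sourceHash(embed.content)}`;
}
```

Because a missing file always counts as stale, deleting a generated file is enough to force regeneration on the next build, which matches the idea above.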
Questions and thoughts