randomPoison commented 5 years ago

The current method for code generation involves generating the binding code for all schema-defined types in a single Rust source file. While this approach is fairly simple, it has the major drawback of making it difficult (or impossible) to distribute schema files and schema-generated code as part of a crate. For example, the SDK can't provide any useful method or trait implementations for the improbable::Position component, since the component effectively doesn't exist until the end-user generates it.

An alternative approach would be for each crate to distribute the generated code associated with any schema types it defines. This would allow the crate to also provide useful inherent/trait impls for such types. The SDK would distribute the standard schema library.

Unanswered Questions

[x] How do crates distribute the schema files themselves (i.e. is it even possible to distribute non-Rust assets in a crate)?
- You can distribute arbitrary additional files in a crate.
[x] How can we have the schema compiler find schema files pulled in from third-party crates?
- cargo_metadata provides a way to list all crates being built as part of a project. We can search all dependencies for schema files to include.

randomPoison commented 5 years ago

One part of this might be to allow for code generation to happen as part of the build.rs script. Right now, users have to manually execute a command in order to perform code generation, which means if they first open their project without doing so, they'll end up with a bunch of compiler errors. If code generation happened as part of the build script, it would ensure that users could always run cargo build on their worker project without needing to remember any other manual steps.
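A rough sketch of what the build-script side could look like, using only the standard library. `generate_bindings`, the `schema/` directory, and the `generated.rs` file name are placeholders for illustration, not part of the SDK; the `cargo:rerun-if-changed` directive and the `OUT_DIR`/`CARGO_MANIFEST_DIR` variables are standard Cargo build-script features.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Placeholder for the real code generator (hypothetical). The actual generator
/// would work from the schema compiler's output rather than raw schema text.
fn generate_bindings(schema_source: &str) -> String {
    format!("// generated from {} bytes of schema\n", schema_source.len())
}

/// Writes `generated.rs` into `out_dir` from every `.schema` file directly inside
/// `schema_dir`, and returns the directives the build script should print so
/// that Cargo re-runs it whenever a schema file changes.
pub fn run_codegen(schema_dir: &Path, out_dir: &Path) -> io::Result<Vec<String>> {
    let mut directives = vec![format!("cargo:rerun-if-changed={}", schema_dir.display())];
    let mut generated = String::new();

    let mut paths: Vec<_> = fs::read_dir(schema_dir)?
        .collect::<Result<Vec<_>, _>>()?
        .into_iter()
        .map(|entry| entry.path())
        .filter(|path| path.extension().map_or(false, |ext| ext == "schema"))
        .collect();
    paths.sort(); // keep the generated file stable across runs

    for path in paths {
        generated.push_str(&generate_bindings(&fs::read_to_string(&path)?));
        directives.push(format!("cargo:rerun-if-changed={}", path.display()));
    }

    fs::write(out_dir.join("generated.rs"), generated)?;
    Ok(directives)
}

// In build.rs, `main` would look roughly like this:
//
//     let manifest_dir = std::env::var("CARGO_MANIFEST_DIR").unwrap();
//     let out_dir = std::env::var("OUT_DIR").unwrap();
//     let schema_dir = Path::new(&manifest_dir).join("schema");
//     for line in run_codegen(&schema_dir, Path::new(&out_dir)).unwrap() {
//         println!("{}", line);
//     }
```

The worker crate could then pull the result in with `include!(concat!(env!("OUT_DIR"), "/generated.rs"));`, so a plain cargo build would always see up-to-date bindings.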

jamiebrynes7 commented 5 years ago

For a local workflow, having the generated code in its own crate might be worth looking at.

This is a common pattern for C# workers: you have one csproj that contains only generated code (codegen runs when it is built) and another that contains your worker code and depends on the first. This solves a few problems:

- You get IDE features working out of the box for worker code.
- You avoid recompiling generated code whenever you change worker code.

However, it does incur some overhead:

- Whenever you change the schema, you need to remember to rebuild the codegen crate.

How this translates into shipping schema with a crate, I'm not sure. Say a user publishes a crate with some game functionality, "spatialos-overwatch-movement". It could have a dependency on "spatialos-overwatch-movement-gen-code", which contains the generated Rust code.

The obvious limitation is that you cannot reuse the schema defined in the crate. You could ship schema with the generated code crate, i.e.:

spatialos-overwatch-movement-gen-code
---
 |- schema/
 |- src/
 |- Cargo.toml

But I'm not sure how a user could access this schema.

randomPoison commented 5 years ago

Giving it some more thought, I think we should be able to distribute schema files via crates.io:

When publishing a crate, you can include arbitrary extra files in addition to Rust source files.
Cargo provides a way to get information about all crates being built as part of a project, including dependencies, via the cargo metadata command. There's also a cargo_metadata crate that makes it easier to access that data programmatically. We can use this information to search the dependency tree for additional schema files to include in schema compilation.
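As a minimal sketch of the search step, assuming each package keeps its schemas in a top-level `schema/` directory (that layout is an assumption, not something the SDK defines): the function below takes the root directory of every package in the build, which could be derived from the manifest paths that cargo metadata reports, and collects the `.schema` files it finds. It deliberately uses only the standard library, so the cargo metadata call itself is left out.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collects every `.schema` file under `dir`.
fn collect_schema_files(dir: &Path, found: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            collect_schema_files(&path, found)?;
        } else if path.extension().map_or(false, |ext| ext == "schema") {
            found.push(path);
        }
    }
    Ok(())
}

/// Returns all schema files shipped by the given packages.
///
/// `package_roots` is the directory containing each package's Cargo.toml.
/// Assumption: schemas live in `<package root>/schema/`.
pub fn find_dependency_schemas(package_roots: &[PathBuf]) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for root in package_roots {
        let schema_dir = root.join("schema");
        // Most dependencies will not ship any schema; skip them silently.
        if schema_dir.is_dir() {
            collect_schema_files(&schema_dir, &mut found)?;
        }
    }
    found.sort(); // deterministic order regardless of directory iteration order
    Ok(found)
}
```

The resulting list is what would be handed to the schema compiler alongside the local crate's own schema files.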

So, in your example, the user could ship a spatialos-overwatch-schema crate, or they could even just bundle the schemas and the worker code into the spatialos-overwatch-movement crate. I'm not sure what's the best practice there, and I think there are additional questions to be answered about distributing entire workers via crates.io, though I think it should be possible from a technical perspective.

randomPoison commented 5 years ago

#72 brings up an interesting problem for this approach to code generation: We'll need to handle schema files that reference external types. This needs to be addressed in two ways:

For schema compilation, we need to find all the schema files in the crate's dependencies and make sure they're included in schema compilation. I had already recognized the need to do this for the final binary in order to generate the final schema descriptor used for the deployment. However, every dependency crate will also need to do schema compilation for all of its dependencies in order to do code generation. These intermediate compilations will only need to spit out the AST bundle, though, since that's all that's needed for code generation.
The code generator will need to know the names of the dependency crates and where in the crate's module hierarchy to find the generated code. This is partially covered already by the process for finding schema files in dependency crates, but we might have to add some information to the Spatial.toml file so that downstream crates can find the generated code for a given crate.
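To make the second point concrete, here is one possible shape for the lookup the code generator would need: a table from schema package to the Rust module where the owning crate exposes its generated code. Everything here is hypothetical, including the `ExternalTypes` name, the example module paths, and the assumption that nested schema names map onto nested Rust paths; in practice the table would be filled from whatever metadata each dependency publishes.

```rust
use std::collections::HashMap;

/// Maps a schema package (e.g. "improbable") to the Rust module path where the
/// crate that owns it exposes the generated code (hypothetical structure).
#[derive(Default)]
pub struct ExternalTypes {
    package_to_module: HashMap<String, String>,
}

impl ExternalTypes {
    /// Records that types in `schema_package` are generated into `rust_module`.
    pub fn register(&mut self, schema_package: &str, rust_module: &str) {
        self.package_to_module
            .insert(schema_package.to_string(), rust_module.to_string());
    }

    /// Resolves a fully qualified schema type such as "improbable.Position" to
    /// a Rust path, preferring the longest registered package prefix.
    pub fn resolve(&self, qualified_type: &str) -> Option<String> {
        let mut split = qualified_type.len();
        // For "a.b.C", try "a.b" as the package first, then "a".
        while let Some(dot) = qualified_type[..split].rfind('.') {
            let package = &qualified_type[..dot];
            let rest = &qualified_type[dot + 1..];
            if let Some(module) = self.package_to_module.get(package) {
                return Some(format!("{}::{}", module, rest.replace('.', "::")));
            }
            split = dot;
        }
        None // no dependency claims this package
    }
}
```

A `None` result would be the point at which the generator reports that a schema file references a type whose generated code cannot be located.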

randomPoison commented 5 years ago

Another thing to consider: How do we allow for custom code generation (e.g. for GDKs to have custom code generation) if code generation is baked into each crate that provides schemas? In the current version where code generation is centralized in the final consumer's crate, it's much easier to swap out the code generator implementation.

It might be possible to do some magic with environment variables and build parameters to have the root crate inject codegen information into its dependencies, but would that be a good idea? It seems a bit icky to modify the code in a dependency based on who's consuming it. On the other hand, that's exactly what feature flags do (albeit in a more controlled way).
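For illustration only, the environment-variable route might look something like the following in a dependency's build script. The variable name `SPATIALOS_CODEGEN_OVERRIDE` and the `Generator` type are invented for this sketch; nothing in the SDK defines them.

```rust
/// Which code generator a build script should run (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum Generator {
    /// The generator bundled with the SDK.
    Default,
    /// A custom generator identified by whatever the root crate injected.
    Custom(String),
}

/// Hypothetical variable name that the root crate's build would set.
pub const CODEGEN_OVERRIDE_VAR: &str = "SPATIALOS_CODEGEN_OVERRIDE";

/// Picks the generator from the (optional) value of the override variable.
/// Unset, empty, or whitespace-only values fall back to the default.
pub fn select_generator(override_value: Option<&str>) -> Generator {
    match override_value.map(str::trim) {
        Some(value) if !value.is_empty() => Generator::Custom(value.to_string()),
        _ => Generator::Default,
    }
}
```

A build script would call `select_generator(std::env::var(CODEGEN_OVERRIDE_VAR).ok().as_deref())` and also print `cargo:rerun-if-env-changed=SPATIALOS_CODEGEN_OVERRIDE` so that Cargo rebuilds when the override changes. That rebuild-on-change behaviour is exactly the "dependency changes based on its consumer" effect questioned above.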

randomPoison commented 5 years ago

#117 may provide the answer to the previous question: If we provide some mechanism (maybe WASM-based) to safely inject custom code generation logic, upstream crates can also potentially load the plugin when generating code for schema types.

Downstream crates changing how upstream crates compile is its own potentially hairy thing to deal with, but it at least sounds like a viable direction for investigation!

jamiebrynes7 / spatialos-sdk-rs

Only generate schema bindings for the local crate #56
