Proposed RFC Feature Create JSON Schema for Prefabs

OBWANDO commented 3 years ago

Summary:

Currently there is a Prefab system for objects, but no documentation or schema has been published.

What is the relevance of this feature?

Proper documentation is necessary to allow the feature to expand and mature. There are many needs for the schema to be published to allow automation of DCC and dynamically generated content. As the Prefab system expands, having a properly defined and documented schema will allow developers to implement new features in a consistent manner.

Feature design description:

The schema should follow industry standards as defined here: https://json-schema.org/ and possibly be explored further with Docson.
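As a rough illustration of what a published schema could enable, the sketch below writes a tiny JSON Schema fragment as a Python dict and checks documents against it with a hand-rolled validator that understands only `type`, `required`, and `properties`. The property names (`ContainerEntity`, `Entities`, `Name`) are assumptions for illustration, not the actual prefab format, and a real setup would use a complete JSON Schema validator rather than this subset.

```python
import json

# Hypothetical, heavily simplified schema; field names are illustrative only.
PREFAB_SCHEMA = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "required": ["ContainerEntity", "Entities"],
    "properties": {
        "ContainerEntity": {
            "type": "object",
            "required": ["Name"],
            "properties": {"Name": {"type": "string"}},
        },
        "Entities": {"type": "object"},
    },
}

# Only the JSON Schema type names this sketch needs.
_TYPES = {"object": dict, "string": str, "array": list}


def validate(doc, schema, path="$"):
    """Return a list of error strings; supports only type/required/properties."""
    errors = []
    expected = schema.get("type")
    if expected and not isinstance(doc, _TYPES[expected]):
        return [f"{path}: expected {expected}"]
    if isinstance(doc, dict):
        for key in schema.get("required", []):
            if key not in doc:
                errors.append(f"{path}: missing required property '{key}'")
        for key, sub in schema.get("properties", {}).items():
            if key in doc:
                errors.extend(validate(doc[key], sub, f"{path}.{key}"))
    return errors


good = json.loads('{"ContainerEntity": {"Name": "Shader_Ball"}, "Entities": {}}')
bad = json.loads('{"ContainerEntity": {"Name": 42}}')
print(validate(good, PREFAB_SCHEMA))
print(validate(bad, PREFAB_SCHEMA))
```

The same schema document could in principle feed both validation and generated documentation, which is the consistency benefit this RFC is after.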

What are the advantages of the feature?

Clear, concise documentation of the schema will ensure the feature stays consistent and extensible.

What are the disadvantages of the feature?

None

How will this be implemented or integrated into the O3DE environment?

There is no real integration; however, the schema could potentially be published automatically through code generation.

How will users learn this feature?

They will learn through the developer documentation site.

jeremyong commented 3 years ago

Having some sort of schema would be great. Two things do come to mind however. First, the spec stipulates some requirements with “MUST” semantics in RFC parlance. Would all such requirements necessarily be observed? For example, requiring all prefab json files to contain a $schema property with the URI of the schema itself. Second, how would versioning and validation work to incorporate the schema in the engine?

A broader question is whether the same approach is viable for other systems (e.g. render pass json file) and if some unification is possible.

OBWANDO commented 3 years ago

Good point. Having the same schema system used for all JSON-related features would be ideal. We could start with this thread and build upon a larger thread once we have a basic template that could be expanded upon in place.

It would be good for us to assemble some of the "Musts" in this thread to begin with.

jeremyong commented 3 years ago

FYI I had misremembered the $schema and $id rules as living on the described object and not just the schema itself, so these requirements aren't as important imo. I'll scan the spec soon to understand it better before commenting further.

Infurael commented 3 years ago

The Json Serialization and Object Stream serialization both use the Serialize Context as their reflection/metadata system. The "Serialize Context Tools" application has an option to export the content to a JSON document, which can be processed further, for instance with a Python script and Jinja2. This can provide an intermediate solution for generating documentation for Prefabs and other documents generated through the Json Serialization. Note, though, that the Json Serialization supports custom serializers, so information from those types may not show up fully in the Serialize Context. (@OBWANDO Ping me if you want an example of documents generated through this approach; they're on our internal network only though.)
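To make the export-then-render idea concrete, here is a minimal sketch. The layout of the exported JSON is not shown in this thread, so the `classes`/`fields` structure and the `ExampleComponent` type below are purely hypothetical, and plain string formatting stands in for a Jinja2 template to keep the example dependency-free.

```python
import json

# Hypothetical shape for an exported reflection dump; the real export produced
# by the "Serialize Context Tools" application may look quite different.
EXPORT = json.loads("""
{
  "classes": [
    {
      "name": "ExampleComponent",
      "description": "Illustrative class, not a real O3DE type.",
      "fields": [
        {"name": "Visible", "type": "bool"},
        {"name": "Scale", "type": "float"}
      ]
    }
  ]
}
""")


def render_class(cls):
    """Render one reflected class as a plain-text documentation section."""
    lines = [cls["name"], "=" * len(cls["name"]), cls.get("description", ""), ""]
    for field in cls.get("fields", []):
        lines.append(f"- {field['name']} ({field['type']})")
    return "\n".join(lines)


def render_docs(export):
    """Render every class in the export, sorted by name for stable output."""
    classes = sorted(export.get("classes", []), key=lambda c: c["name"])
    return "\n\n".join(render_class(c) for c in classes)


docs = render_docs(EXPORT)
print(docs)
```

Swapping `render_class` for a real template engine would change the output format without touching the parsing side.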

A project to replace the Reflection system is in the early stages of design at Amazon. The goal is to allow definition of data to happen outside of code, with an option to generate C++ classes through AzCodeGen. This would feed into various systems such as the RPE and Json Serialization, with one of the ultimate goals being to unlock support for Prefab features that are currently difficult or impossible to achieve. Formats are being explored, but JSON-Schema is not a top contender, partly because JSON-Schema is not an accepted and finalized format yet and support for it is sparse. This does not, however, rule out exporting the formats that are being explored to JSON-Schema. The goal, however, is to be able to generate documentation, and the new format for reflection will keep support for this in mind.

jeremyong commented 3 years ago

If the direction is to leverage code gen more, then the schema is less attractive for sure (at least, a hand-authored one). Not having used this much before, aside from documentation generators, are there other tools/viewers/etc. that leverage json-schema?

Infurael commented 3 years ago

The current approach to reflection, the Serialize Context, is exclusively code based. The idea behind the new Reflection system is to make it schema based. Among others this should provide paths for adding data to O3DE without requiring code compilation. Such a schema can be used in several ways: one is to generate C++ code through AzCodeGen, but it could also be used to generate documentation or conversions to other schemas. I am a little hesitant to go into too many details because the design is in the early stages and in my experience the final result can differ greatly from the initial ideas.

This page https://json-schema.org/implementations.html has a list of libraries and tools that support JSON-Schema.

HogJonny-AMZN commented 3 years ago

glTF: https://github.com/KhronosGroup/glTF/blob/master/specification/2.0/schema/glTF.schema.json
"$schema": "http://json-schema.org/draft-04/schema"

All glTF object properties (see glTFProperty.schema.json) have an optional extensions object property that can contain new extension-specific properties. This allows extensions to extend any part of glTF, including geometry, materials, animations, etc. Extensions can also introduce new parameter semantics, reserved IDs, and new formats containing glTF.

What is the actual outcome-driven goal and value? To me, it is a lot more than just standardization, consistency and documentation. The big goal IMO is efficiency and hooks to scale production.

Currently a single O3DE prefab serialized to our JSON format that just defines a single entity, with a mesh component and a model ref, plus a material component with a few material asset refs, is 700+ lines because everything is serialized, including the editor state for the prefab. It takes 138 lines to describe an Entity+MeshComponent+Model asset alone.

I think I would want to define the structure and nature of a thing in one file, and keep the state and other extraneous data concerned with that thing in another (or optionally not at all).

From the standpoint of scripting or a data pipeline that couples well with a Python ecosystem of tools, our format is difficult to deal with and high-friction: unless you search the code base (component IDs) and use the more technical aspects of Asset Processor (asset IDs), there isn't really a way to understand the data just by looking at it.

Is any amount of the prefab data optional or lazy, and could be omitted in a prefab construct generated upstream out of system? (I haven't actually tried that out yet, but I recall talking to @mbalfour-amzn about these things early on)

My interpretation of the ask (biased by things several customers tried or wanted to do with slices) is that we have some kind of schema-based pre-flight / upstream prefab representation that is more lightweight, easier to grasp, well documented, etc.

The structured JSON of the Atom .materialType and .material files (which are schema inspired, and I think could be easily adapted to json-schema) is a good example of the benefits that come along:

easy to debug and troubleshoot
easy to edit or generate
easy to parse/query

I think some related goals would be to improve macro iteration cycles by improving upstream world and other data generation (scaffold and synthesize worlds procedurally), enhance in-system use of the data (friendlier to less technical Python users), and facilitate out-of-system use and multiple round-trip cycles.

I think we could improve some of these aspects without a schema, like having a way at configure or build time to generate a Python-friendly DB that provides component ID lookup by name, etc., and could be used with pure Python data pipelines (outside the editor). But it's still much easier to understand an explicit name. If someone did have that DB, though, a prefab API could wrap it.

An RPC into the Asset Processor to retrieve an ID by asset path would also be nice. But it would still be easiest for a tech artist to just provide the known relative path (they are already accustomed to working with source asset paths upstream). But again, this might be a piece of a pure Python prefab API (external to the editor).
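A minimal sketch of the name-to-ID lookup DB idea, using Python's built-in `sqlite3`. The component names and IDs below are placeholders rather than real O3DE type IDs, and how the table would actually be populated at configure or build time is left open.

```python
import sqlite3

# Hypothetical rows: the names and IDs below are placeholders, not real O3DE
# component type IDs. A build step would populate this from reflected data.
COMPONENTS = [
    ("ExampleMeshComponent", "{00000000-0000-0000-0000-000000000001}"),
    ("ExampleTransformComponent", "{00000000-0000-0000-0000-000000000002}"),
]


def build_db(rows):
    """Create an in-memory lookup table mapping component names to type IDs."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE components (name TEXT PRIMARY KEY, type_id TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO components VALUES (?, ?)", rows)
    conn.commit()
    return conn


def component_id(conn, name):
    """Return the type ID for a component name, or None if it is unknown."""
    row = conn.execute(
        "SELECT type_id FROM components WHERE name = ?", (name,)
    ).fetchone()
    return row[0] if row else None


db = build_db(COMPONENTS)
print(component_id(db, "ExampleMeshComponent"))
print(component_id(db, "NoSuchComponent"))
```

A pure Python prefab API could wrap `component_id` so that pipeline authors only ever type the readable name.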

HogJonny-AMZN commented 3 years ago

These 36 or so lines are all I would need mentally in a preflight schema to describe a simple mesh object prefab. Pretty much everything else could be determined or derived; having to understand it, specify it, and fill in the rest (the other 100 lines) is friction.

{
    "ContainerEntity": {
        "Id": "ContainerEntity",
        "Name": "Shader_Ball"
    },
    "Entities": {
        "Entity": {
            "Name": "Shader Ball",
            "Components": {
                "Component": {
                    "$type": "TransformComponent",
                    "Parent Entity": "ContainerEntity",
                    "Cached World Transform": {
                        "Rotation": [
                            0.0008726645028218627,
                            0.0,
                            0.9999996423721314,
                            0.0
                        ]
                    }
                },
                "Component": {
                    "$type": "AZ::Render::EditorMeshComponent",
                    "Controller": {
                        "Configuration": {
                            "ModelAsset": {
                                "assetHint": "objects/shaderball/shaderball_default_1m.azmodel"
                            }
                        }
                    }
                }
            }
        }
    }
}

HogJonny-AMZN commented 3 years ago

Among others this should provide paths for adding data to O3DE without requiring code compilation.

wonderful goal overall.

mbalfour-amzn commented 3 years ago

What is the actual outcome-driven goal and value? To me, it is a lot more than just standardization, consistency and documentation. The big goal IMO is efficiency and hooks to scale production.

IMO, this RFC does already directly state the goal and value - "As the Prefab system expands, having a properly defined and documented schema will allow developers to implement new features in a consistent manner." @HogJonny-AMZN - Your big goals are also good ones to shoot for, but not specifically what this RFC appears to be about. Those seem like separate RFCs. Dissecting your asks a bit, I think you're looking for the following:

A simple way to initially generate a prefab. I think you're saying that it's acceptable for it to become complicated during or after generation though, just not in what's initially written? This is an important clarification because it affects the complexity of every system that uses a prefab if it needs to remain simple after initial generation.
Helper tools for prefab generation to query for the difficult pieces of data (asset IDs, etc). These could be used as a part of a generation process to take the "easily-described" version of a prefab and turn it into a final usable version.

Maybe you should write up an RFC for those? Those sound like great ideas.

BTW, I assume you'd also want documentation of the schema. It would be hard to know about the field "Cached World Transform" in the example above without documentation. ;) So, coming back to the RFC itself, since one of the goals of the prefab design is to have a more human-readable and human-editable format, some form of documentation is pretty much a requirement to meet that goal. Otherwise, there's no easy way for humans to know what's even valid to write in the file.

Formats are being explored, but JSON-Schema is not a top contender, partly because JSON-Schema is not an accepted and finalized format yet and support for it is sparse.

@AMZN-koppersr The one counterpoint I'd raise here is that if VS Code has support for validation and auto-completion from a JSON schema, then I think there would be significant value in having a schema available. I don't know for a fact that VS Code does support that, but the json-schema web page claims it does.

In general, if there is a documentation format that supports validation and auto-completion in editing tools, I would be much more in support of that format. If not, then I would rather see a documentation format that's easy for humans to read and search. In the context of json-schema, if it supports validation and auto-completion then it sounds great to me, but if it doesn't it would be (IMO) a miserable format to have to manually read as documentation vs something like auto-generated wiki pages.

mbalfour-amzn commented 3 years ago

Is any amount of the prefab data optional or lazy, and could be omitted in a prefab construct generated upstream out of system? (I haven't actually tried that out yet, but I recall talking to @mbalfour-amzn about these things early on)

Yep, there is a bunch of data that can be omitted when first writing a prefab. Anything that's omitted just gets a default value. However, I'd like to draw a distinction between what's initially written in a prefab vs what's saved in a prefab. As soon as you load a prefab into the Editor and save it back out, it will potentially get more Editor state data written into it, data might be saved out in different formats, etc. For example, Color accepts a lot of different formats as input when read in, but it's only going to write out one format, which may be different than the input format.

I'm bringing this up because generation or authoring of a prefab will be a different experience than iterating on an existing prefab.
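The distinction between what is written and what is saved can be sketched roughly as follows. The default values and the accepted color formats here are hypothetical stand-ins, not the set the engine actually supports; the point is only that omitted keys fall back to defaults on load and that several input forms collapse into one canonical output form.

```python
import copy

# Hypothetical defaults; the real engine derives defaults from each component.
DEFAULTS = {"Name": "Entity", "Visible": True, "Color": [1.0, 1.0, 1.0, 1.0]}


def with_defaults(partial, defaults):
    """Recursively fill in keys missing from `partial` using `defaults`."""
    result = copy.deepcopy(defaults)
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(result.get(key), dict):
            result[key] = with_defaults(value, result[key])
        else:
            result[key] = copy.deepcopy(value)
    return result


def normalize_color(value):
    """Accept '#RRGGBB', [r, g, b], or [r, g, b, a]; always return RGBA floats."""
    if isinstance(value, str):
        hex_digits = value.lstrip("#")
        if len(hex_digits) != 6:
            raise ValueError(f"unsupported color string: {value!r}")
        value = [int(hex_digits[i:i + 2], 16) / 255.0 for i in (0, 2, 4)]
    channels = [float(c) for c in value]
    if len(channels) == 3:
        channels.append(1.0)  # assume opaque when alpha is omitted
    if len(channels) != 4:
        raise ValueError(f"expected 3 or 4 channels, got {len(channels)}")
    return channels


# What a generator might write: short, with a convenient color notation.
authored = {"Name": "Shader Ball", "Color": "#FF0000"}

# What comes back after a load/save round trip: defaults filled in, one format.
loaded = with_defaults(authored, DEFAULTS)
loaded["Color"] = normalize_color(loaded["Color"])
print(loaded)
```

A tool that diffs generated prefabs against saved ones would need to account for exactly this kind of normalization.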

Infurael commented 3 years ago

Formats are being explored, but JSON-Schema is not a top contender, partly because JSON-Schema is not an accepted and finalized format yet and support for it is sparse.

@AMZN-koppersr The one counterpoint I'd raise here is that if VS Code has support for validation and auto-completion from a JSON schema, then I think there would be significant value in having a schema available. I don't know for a fact that VS Code does support that, but the json-schema web page claims it does.

In general, if there is a documentation format that supports validation and auto-completion in editing tools, I would be much more in support of that format. If not, then I would rather see a documentation format that's easy for humans to read and search. In the context of json-schema, if it supports validation and auto-completion then it sounds great to me, but if it doesn't it would be (IMO) a miserable format to have to manually read as documentation vs something like auto-generated wiki pages.

@mbalfour-amzn The schema itself is likely going to be a custom one to accommodate not just Prefabs but other systems like Networking as well (possibly, still exploring this). The language is leaning towards XML because support is more mature; for example, VS supports XSD, which allows verifying that objects adhere to the schema and provides auto-complete functionality. XML is also the language that AzCodeGen is gravitating towards the most, even though JSON is also supported. None of these choices should prevent building a tool that converts our schema to JSON-Schema if needed. Again though, early days.

Infurael commented 3 years ago

Currently a single O3DE prefab serialized to our JSON format that just defines a single entity, with a mesh component and a model ref, plus a material component with a few material asset refs, is 700+ lines because everything is serialized, including the editor state for the prefab. It takes 138 lines to describe an Entity+MeshComponent+Model asset alone.

@HogJonny-AMZN A cleanup pass is still needed for Prefabs. Much of what is currently being stored in Prefabs is inherited from Slices, although a bit of cleanup has already happened, for instance on the TransformComponent. The cleanup isn't limited to removing redundant information; it also includes making the remaining data more human-friendly. This is likely to be part of the work that will be done for overrides, as that will require touching the components themselves. In the meantime, owners of components can already see how their data is stored in a Prefab and start their own cleanup by removing redundant data and making the existing data more user-friendly. While this won't fix everything, it is a baby step towards the simple format you've posted.

jeremyong commented 3 years ago

One additional data point: there are some benefits to using XML, in particular, docstrings, comments, string literals, etc.

When I supported a code-gen based injection tool in the past, I initially used JSON but changed to XML because of the need to support special characters and for simplicity when dealing with whitespace. While XML is more difficult to write, it's still fairly easy to read (and more importantly, diff). The question is: should we be optimizing for the reader or the writer in this case? Not knowing enough about the system or the roadmap, I'll stop commenting here.

Infurael commented 3 years ago

@jeremyong Longer term, a generalized DOM would be good so data can be retrieved from multiple formats such as JSON, XML, YAML, etc. Each of these formats has its pros and cons, so giving users the ability to pick the format they feel works best would be better than sticking with a single format. It's all about providing users with the tools to build the optimal workflow.
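As a very rough sketch of what a generalized DOM could mean in practice, the snippet below loads the same data from JSON and from XML into one nested-dict shape using only the Python standard library. The `load_dom` helper and the element-to-dict mapping are assumptions made for illustration; they ignore XML attributes, repeated sibling tags, and typed values, all of which a real design would have to address.

```python
import json
import xml.etree.ElementTree as ET


def dom_from_json(text):
    """Parse JSON text into plain dicts, lists, and scalars."""
    return json.loads(text)


def _element_to_value(elem):
    """Map an XML element to a dict of its children, or its text if it is a leaf."""
    children = list(elem)
    if not children:
        return (elem.text or "").strip()
    return {child.tag: _element_to_value(child) for child in children}


def dom_from_xml(text):
    """Parse XML text into the same nested-dict shape used for JSON."""
    root = ET.fromstring(text)
    return {root.tag: _element_to_value(root)}


# Hypothetical registry keyed by file extension.
LOADERS = {".json": dom_from_json, ".xml": dom_from_xml}


def load_dom(text, extension):
    """Pick a loader by file extension so callers never see the source format."""
    return LOADERS[extension](text)


json_dom = load_dom('{"Entity": {"Name": "Shader Ball"}}', ".json")
xml_dom = load_dom("<Entity><Name>Shader Ball</Name></Entity>", ".xml")
print(json_dom == xml_dom)
```

Code written against the dict shape would not need to know which format the user chose to author in.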

o3de / sig-core