tomvandig / eccg

Entity Component Composition Graph

Comments TK #1

Open aothms opened 7 months ago

aothms commented 7 months ago

I'm very happy to see this written down in this way. Here are some discussion points:

referencing virtual entities

I think this is a really interesting idea, but I'm not convinced it's necessary. I also have some doubts about how well this would work with shared ownership, where you don't necessarily own the full path, or where intermediate nodes in that path get deleted.

What this provides is:

graph TD;

    A-.->B;
    C-.->B;

    A.B-->Property1
    C.B-->Property2

    A-->Location;
    B-->Geometry;

    style A fill:#63c408
    style B fill:#63c408
    style C fill:#63c408
    style A.B fill:#63c408,stroke-dasharray: 5 5,stroke-width:4px
    style C.B fill:#63c408,stroke-dasharray: 5 5,stroke-width:4px

    style Geometry fill:#08aec4
    style Location fill:#08aec4
    style Property1 fill:#08aec4
    style Property2 fill:#08aec4

But in my mind it's equivalent to

graph TD;

    A-.->B;
    C-.->B;

    A-->Property1
    C-->Property2

    A-->Location;
    B-->Geometry;

    style A fill:#63c408
    style B fill:#63c408
    style C fill:#63c408

    style Geometry fill:#08aec4
    style Location fill:#08aec4
    style Property1 fill:#08aec4
    style Property2 fill:#08aec4

The property is then not associated with B (in the context of A and C respectively) but with either A or C directly. But don't we say that's semantically equivalent due to the composing behaviour of the typing relationship?
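
To make the equivalence question concrete, here is a minimal sketch. It assumes components are stored in a dict keyed by entity path and that composing an instance merges the components of its type, of the virtual entity and of the instance itself, in that order. The `compose` function, `TYPE_OF` and the storage layout are hypothetical and not part of eccg.

```python
# Hypothetical sketch: typing edges A -> B and C -> B (the dashed edges in the diagrams).
TYPE_OF = {"A": "B", "C": "B"}

def compose(entity, components):
    """Flatten an instance: components of its type, of the virtual path, then its own."""
    result = {}
    type_name = TYPE_OF.get(entity)
    if type_name is not None:
        result.update(components.get(type_name, {}))                # inherited from the type
        result.update(components.get(f"{entity}.{type_name}", {}))  # virtual entity, e.g. "A.B"
    result.update(components.get(entity, {}))                       # directly on the instance
    return result

# Variant 1: properties attached to the virtual entities A.B and C.B.
virtual = {
    "A": {"Location": "loc-A"},
    "B": {"Geometry": "geom-B"},
    "A.B": {"Property1": 1},
    "C.B": {"Property2": 2},
}

# Variant 2: the same properties attached to A and C directly.
direct = {
    "A": {"Location": "loc-A", "Property1": 1},
    "B": {"Geometry": "geom-B"},
    "C": {"Property2": 2},
}

same_for_a = compose("A", virtual) == compose("A", direct)
same_for_c = compose("C", virtual) == compose("C", direct)
```

Under these assumptions both variants flatten to the same result for A and for C. The only thing the flattened view cannot show is where each component originated.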

which entities are "products"

In IFC4 (and in an ECS without typing) it's trivial to query the model for all products (things to display in the 3D view).

Considering a diagram like this:

graph TD;

    Window1:::E-.->WindowType;
    Window2:::E-.->WindowType;

    WindowType:::E-->Assembly:::C;
    Assembly-->Frame:::E;
    Assembly-->Glazing:::E;

    Frame-->Placement1:::C;
    Frame-->Assembly2:::C;
    Assembly2-->Bar1:::E;
    Assembly2-->Bar2:::E;
    Bar1-.->BarType:::E;
    Bar2-.->BarType:::E;
    Bar1-->Placement6:::C;
    Bar2-->Placement7:::C;
    BarType-->Geometry4:::C;
    Glazing-->Placement2:::C;
    Glazing-->Geometry2:::C;

    Wall:::E-->Placement3:::C;
    Wall-->Geometry3:::C;

    Wall-->Contains:::C;
    Contains-->Window1:::E;
    Contains-->Window2:::E;

    Window1-->Placement4:::C;
    Window2-->Placement5:::C;

    classDef E fill:#63c408
    classDef C fill:#08aec4

How do we know that the elements to display are Wall, Window1.Glazing, Window2.Glazing, Window1.Frame.Bar1, Window1.Frame.Bar2, Window2.Frame.Bar1 and Window2.Frame.Bar2?

In my implementation I did:

But this wouldn't capture all possibilities to define a product, would it?

~If you compose/flatten before evaluating how can you still construct an id path to uniquely identify instanced geometry?~

edit: this is not really a thing. The combination of assembly + typing means that subcomponent ids alone cannot uniquely identify an instantiated object, but evaluating the implications of typing before or after querying doesn't affect this.

evaluation of the broken placement tree

As identified in the discussions, we agreed on defining relative positions as the primary information carrier on placements. But in the case of typing we can't do that, because the type can't relate back to its instance placements.

What I did in my implementations is:

This requires some effort in documenting.

Can typing still be a component relationship?

Entity -> TypingComponent -> Entity instead of Entity -> Entity?

Because: How would you even express Entity -> Entity relationships, since components are the only data being exchanged.

Although I think I also understand that making it more "special" might help in understanding and implementation.

3 levels of schema

What I like a lot about the formalization of the composition described here is that we can provide three levels of schema, which greatly unifies what we currently do in IFC4!

The last one requires some elaboration, but basically it completely eliminates the need for a separate technology that we used mvdXML for in the past.

Because if we have a schema on the composed graph we can use the kind of structural typing + data variants from e.g. JSON Schema to say things like:

(This might require OpenAPI's schema instead of basic JSON Schema to have a discriminator field, i.e. to map a type onto an object based on the value of a field, classification in this case.)

These layered schemas don't need to have the same severity. The last layer can only be validated post composition anyway, so it doesn't really prevent import, but having these kinds of usage patterns as first-class entities in a proper schema language is potentially really powerful.
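
As a rough illustration of such a post-composition check, the sketch below uses the classification of a composed entity as a discriminator that selects which components are required. It is plain Python rather than JSON Schema or OpenAPI, and the rule table, the `validate` function and the component names (including `FireRating`) are hypothetical.

```python
# Hypothetical rules: the classification value selects which components are required.
REQUIRED_BY_CLASSIFICATION = {
    "Wall": {"Geometry", "Placement", "FireRating"},
    "Window": {"Geometry", "Placement"},
}

def validate(composed_entity):
    """Return the names of missing components (an empty set means the entity conforms)."""
    classification = composed_entity.get("classification")
    required = REQUIRED_BY_CLASSIFICATION.get(classification, set())
    return required - set(composed_entity["components"])

# A composed wall that lacks one of the components its classification asks for.
wall = {
    "classification": "Wall",
    "components": {"Geometry": "wall-mesh", "Placement": "wall-placement"},
}
window = {
    "classification": "Window",
    "components": {"Geometry": "window-mesh", "Placement": "window-placement"},
}

missing_on_wall = validate(wall)
missing_on_window = validate(window)
```

Because the check runs on the composed entity, it can report missing components as warnings without blocking import of the individual components.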

tomvandig commented 7 months ago

Hi TK!

Thanks for the detailed feedback! I'll try to respond to all your questions with my take on it. Please keep in mind that for many of your questions I am myself still figuring out if and why this new system helps, so bear with me if I contradict myself or something is incorrect.

The end result of this approach should be that we can do typing as in your example, but with a more natural query style from the client, and some extra features that simplify instancing. Whether that is achieved is something I hope to find out in these discussions.

referencing virtual entities

Yes! These would be semantically equivalent to the reader (or almost equivalent, the reader can tell that the component origin is different). The point is not to do this arbitrarily (as in the example I made) but to allow for special behavior: overrides and specializations. This feature should NOT be overused, I hope I can make clear when to use this feature by my other answers below.

which entities are "products"

The point of this system is to eliminate the reference chasing on the client that appears when we treat typing as a relationship. One point of clarification that I have not written down yet is that upon querying you will not just get your queried components flattened into your entity, but you will still be able to reconstruct the graph that produced your query results. I.e. you would know that a geometry component came from your WindowType.Glazing, because it will tell you so when it's returned, even though the Geometry itself is composed on Glazing directly.

The goal of the more natural query style is for the type to classify its subparts not as a type but as a concrete Window or Wall. When querying which entities are "products" we can query which entities have a classification that derives from product, and follow the compose graph upwards for those entities to find the terminal entities, constructing the entity path in the process.

e.g. in the example above querying for classification Glazing would return the following entity paths: [Glazing, WindowType.Glazing, Window1.WindowType.Glazing, Window2.WindowType.Glazing]

We need to be careful of a couple things when looking at that example:

  1. The composition terminates on the individual entity level, and so the assembly with the wall is (same as in your example) NOT part of the composition graph. This allows us to find the terminals we care about: Window1 and Window2.
  2. Glazing must not just be in an assembly with WindowType as in the example; it must both be composed and be in an assembly (or optionally the assembly relationship can be implied by the composition), otherwise it would contradict 1.
  3. Some of the elements do not need to be shown. For instance, WindowType.Glazing should be classified as a type to make it clear it's not itself a useful entity, and Glazing itself could be shown but will not be properly placed. This ties back into 1 and 2 and makes it clear that we care about the "terminals" of the compose graph, i.e. the nodes without parents.

Having the above system in place, we can query for Glazing, receive Window1.WindowType.Glazing, and have the tools to instantiate that specific Glazing using the entity path as an identifier without traversing the compose graph on the client.
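
The upward walk described above can be sketched as follows. This is only an illustration under assumptions: the compose graph is stored as a dict from an entity to the entities it is composed into, it is acyclic, and entity names contain no dots. `COMPOSED_INTO`, `entity_paths` and `terminal_paths` are hypothetical names.

```python
# Hypothetical compose graph: maps an entity to the entities it is composed into.
COMPOSED_INTO = {
    "Glazing": ["WindowType"],
    "WindowType": ["Window1", "Window2"],
}

def entity_paths(entity):
    """Return every entity path that ends in `entity`, shortest first."""
    paths = [[entity]]
    frontier = [[entity]]
    while frontier:
        next_frontier = []
        for path in frontier:
            # Extend each path upwards with every parent of its first element.
            for parent in COMPOSED_INTO.get(path[0], []):
                next_frontier.append([parent] + path)
        paths.extend(next_frontier)
        frontier = next_frontier
    return [".".join(path) for path in paths]

def terminal_paths(entity):
    """Keep only the paths whose first entity has no parent in the compose graph."""
    return [p for p in entity_paths(entity) if p.split(".")[0] not in COMPOSED_INTO]

all_paths = entity_paths("Glazing")
terminals = terminal_paths("Glazing")
```

With this graph, `all_paths` holds the four entity paths listed in the example, and `terminals` keeps only the two that start at Window1 and Window2.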

This is a lot of typing, but I hope this clarifies my thinking a bit. To be sure we would need an implementation to judge it properly.

evaluation of the broken placement tree

Using virtual entities, the type can relate back to its instance! That's one of the places where the virtual entities having components gives us some nice extra power. In short, what we would do would be Compose(Wall1.WallType, Placement) where Placement would be a relative placement relating back to Wall1. WallType itself is unaffected, only Wall1.WallType is affected. This is how Window1.WindowType.Glazing can be evaluated by itself in the example above, and still end up with the right placement.
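
A minimal sketch of that idea, assuming components are stored per entity path and that a lookup falls back from the most specific path to shorter suffixes of it. The `store`, `compose` and `lookup` names and the fallback rule are assumptions for illustration, not a description of an existing implementation.

```python
# Hypothetical store: (entity path, component name) -> component data.
store = {}

def compose(entity_path, component_name, data):
    """Attach a component to an entity or to a virtual entity such as 'Wall1.WallType'."""
    store[(entity_path, component_name)] = data

def lookup(entity_path, component_name):
    """Resolve a component for a path, trying the full path first and then shorter suffixes."""
    parts = entity_path.split(".")
    for start in range(len(parts)):
        key = (".".join(parts[start:]), component_name)
        if key in store:
            return store[key]
    return None

# The type carries the shared geometry.
compose("WallType", "Geometry", "wall-geometry")
# Only the virtual entity Wall1.WallType gets a placement relating back to Wall1.
compose("Wall1.WallType", "Placement", {"relative_to": "Wall1", "offset": (0.0, 0.0, 0.0)})

placement_in_wall1 = lookup("Wall1.WallType", "Placement")
placement_on_type = lookup("WallType", "Placement")   # the type itself is unaffected
geometry_in_wall1 = lookup("Wall1.WallType", "Geometry")
```

In this sketch the placement is only visible when the type is evaluated through Wall1, while the geometry is still shared by every path that ends in WallType.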

Whether this is a good idea remains to be seen, but it looks nice on paper.

Can typing still be a component relationship?

Yes and no. The point of this is to avoid using the word Type in the first place when we really just mean data composition. It's absolutely possible to also define components with relationships to other entities, and this is necessary either way to support many other types of relationships that are not types (assemblies/groups). The point though is to avoid typing through relationships also, if that typing is meant to provide some kind of data hierarchy for an individual entity.

In terms of communicating the compositions of entities, indeed we would also need to communicate those on top of the components, that's an extra step added by using this system.

3 levels of schema

I like this idea! As always I have a hard time coming up with concrete examples, but I like this a lot. I think this ties in to what @gschleusner1972 calls Archetypes as well, where you want an entity to have a set of components so you can reason about it in specific ways during automation.

daviddekoning commented 7 months ago

Hi @aothms and @tomvandig,

I hope you don't mind me jumping into this thread. I've been exploring these ideas as well and thought I'd share some thoughts.

I recently spent a few hours getting into the details of how USD operates and it looks like the ideas here are moving very close to USD's model (i.e. the concept of composition is very central to USD). I am not proposing the adoption of USD - I have looked at it several times in the past, and this week the whole system finally clicked and it seems quite relevant to these discussions.

Referencing Virtual Entities

In USD, there is a clear demarcation between the composed object graph and the un-composed data that makes it up. To draw a parallel, when you are querying, reading or rendering a USD scene, you are always working with the 'virtual' objects, which they call Prims. The USD library presents a composed Scene that contains a graph of Prims. The prims have metadata, attributes and properties, and can contain other prims. They are presented as traditional objects.

However, you cannot author a prim or save a scene to a file. All writing is done on a Layer, which contains PrimSpecs. The strength of USD as a format is how it can compose many different layers (all in different files) into a single scene that it presents to a user, all without necessarily reading each of those layers into memory.

Of course, you can save a scene into a single file, but this is basically just a flattening operation: the scene gets saved to a single layer and each Prim gets saved to a single PrimSpec. You lose all the information about how the objects were composed.

In this sense, Scenes and Prims are virtual (or composed) objects, and all querying of USD data is done via these virtual things. Unlike what is proposed here, you cannot explicitly call a 'compose' method. Composition happens whenever you open or update a layer. You always write to a layer, and read from a composed scene.

USD's strong distinction between the role of composed objects and un-composed objects leads to clarity on what to use when:

Which entities are products / displayable?

I found this discussion a helpful reference: https://www.openusd.org/dev/user_guides/render_user_guide.html

My simple understanding is that geometrical objects (GPrims) are first-class concepts in USD (no surprise there!), so the USD library will traverse the scene graph and collect all the GPrims. GPrims can have an unlimited number of transforms applied to them (Xforms, equivalent to Placements in this discussion).

The USD library traverses the scene graph, collecting all the transformable Prims (UsdGeomXformable) and GPrims. Each GPrim is transformed by the product of all its parent transforms and placed in the scene.
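
The traversal described here can be approximated in a few lines of plain Python. This is not the USD API: the nested-dict scene, the `collect` function and the column-vector matrix convention are choices made for this sketch only.

```python
def matmul(a, b):
    """Multiply two 4x4 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]

def translation(x, y, z):
    """Build a 4x4 translation matrix (column-vector convention)."""
    return [[1, 0, 0, x], [0, 1, 0, y], [0, 0, 1, z], [0, 0, 0, 1]]

IDENTITY = translation(0, 0, 0)

# Hypothetical scene: each node has a local transform, optional geometry and children.
scene = {
    "name": "Wall", "xform": translation(10, 0, 0), "geometry": "wall-mesh",
    "children": [
        {"name": "Window", "xform": translation(2, 0, 1), "geometry": "window-mesh", "children": []},
    ],
}

def collect(node, parent_world=IDENTITY, path=""):
    """Depth-first traversal returning (path, geometry, world matrix) for each geometric node."""
    world = matmul(parent_world, node["xform"])   # accumulate the parent transforms
    full_path = f"{path}/{node['name']}"
    found = []
    if node.get("geometry") is not None:
        found.append((full_path, node["geometry"], world))
    for child in node["children"]:
        found.extend(collect(child, world, full_path))
    return found

placed = collect(scene)
```

Here the window ends up translated by (12, 0, 1): its own transform applied after that of the wall it is a child of.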

Evaluation of the broken placement tree

It seems that placing something with a constraint relative to a reference object is done in USD by making it a child of the object and adding a transform. The geometry of the constrained object is first transformed as the reference object is, then gets the additional transformation applied.

What makes this so easy in USD is that there is only a placement tree - there is no semantic tree. IFC4's hierarchy uses terminology that suggests a semantic tree, but we sometimes place objects relative to objects in the hierarchy that are not semantic ancestors. Some options are:

Can typing still be a component relationship?

I agree that we should avoid using the word Type as much as possible. Type means different things in object-oriented programming vs Revit, and in an ECS we can talk about the type of a component, the type of an entity, etc... it is easily the most confusing word in all these discussions.

USD instead uses the word schema.

A schema is the closest we get to an object-oriented type in USD. A schema defines a set of properties and attributes that must be present for a composed object to be treated as a member of a schema.

So for instance, you can ask the USD library for all the Prims that meet a particular schema definition, and then interact with them as if they were objects that look like the schema. It is a way to achieve this:

where you want an entity to have a set of components so you can reason about it in specific ways during automation

The openBIM equivalent is the Information Delivery Spec. For example, a client can define a set of schemas in their IDS, and then the composed data that they receive from their consultants can be compared against the schemas, even if different parts of the data come from different designers.

(USD also has a Kind, which refers to the role of a Prim in the scene hierarchy. It can be model, group, assembly, component or subcomponent. The kind of a prim is used mostly to filter down the scene hierarchy to make it more accessible to humans. (e.g. don't display the prim hierarchy below components). This doesn't really relate to anything on the IFC side so far.)

3 levels of schema

I agree as well!! I proposed a three level schema a while back, which was a little different. I had collapsed level 2 and 3 into my level 2, and broken the exchange layer into two: a low-level format for components (untyped), and a high-level container and web-native exchange.

Finding the right layers to break things down into is key to creating a clear and flexible system.

USD vs. Entities and components

I like @tomvandig's comments about how there are certain properties of the video game ECS approach that are designed to meet needs that we don't have.

If we compare USD's composed Prims to entities and the (un-composed) PrimSpecs to components, it becomes apparent that USD is an Entity-Component data model, but with a slight twist. Rather than collecting components and then creating a composed entity by bundling them, they collect a bunch of partial entities and create a composed entity by overlaying them. (In other words, Entities are defined by a bundle of components, Prims are defined by a stack of PrimSpecs.) The end result is the same.

In the pure ECS case, if an acoustic engineer wanted to publish an STC rating for a wall, they would publish a component that references the wall entity ID and contains an STC rating property. In a USD world, they would publish a layer with a PrimSpec that had the same id as the wall, and define the property on the PrimSpec. In both cases, after composition, the acoustic rating of the wall would resolve to the component or PrimSpec published by the acoustic engineer.
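
The two styles can be put side by side in a small sketch. Everything here is illustrative: the data layout, the function names and the STC value of 52 are made up, and the "later layer overlays earlier layer" ordering is a simplification that does not reproduce USD's actual composition rules.

```python
# ECS style: each component references the wall's entity id.
components = [
    {"entity": "wall-1", "type": "Geometry", "data": {"mesh": "wall-mesh"}, "author": "architect"},
    {"entity": "wall-1", "type": "Acoustics", "data": {"stc_rating": 52}, "author": "acoustic engineer"},
]

def compose_entity(entity_id, components):
    """Bundle the data of all components that reference the entity into one flat view."""
    view = {}
    for component in components:
        if component["entity"] == entity_id:
            view.update(component["data"])
    return view

# Layer style: each layer holds partial entities keyed by the same id.
layers = [  # weakest first in this sketch; later layers overlay earlier ones
    {"wall-1": {"mesh": "wall-mesh"}},   # published by the architect
    {"wall-1": {"stc_rating": 52}},      # published by the acoustic engineer
]

def compose_prim(entity_id, layers):
    """Overlay the partial entities with the given id, layer by layer."""
    view = {}
    for layer in layers:
        view.update(layer.get(entity_id, {}))
    return view

ecs_view = compose_entity("wall-1", components)
layer_view = compose_prim("wall-1", layers)
```

Both functions produce the same composed wall; the difference lies in whether the published unit is a component pointing at an entity or a partial entity sharing its id.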

On the other hand, USD has some things that we don't need or don't want:

We would also want a web API (part of OpenCDE?) so that we can compose layers / containers / data sets across different cloud environments, which appears to be beyond USD's scope.

aothms commented 7 months ago

Hi @daviddekoning

I hope you don't mind me jumping into this thread.

Not at all! Very valuable additions I think.

USD's strong distinction between the role of composed objects and un-composed objects leads to clarity

I think this is a really interesting and helpful perspective. I wouldn't be opposed to approaching it like this.

But:

Renderable content for USD is anything that is considered imageable. Imageable content is often geometry-based, such as meshes and curves, but can also be content such as lights, volumes, and physics joints. Imageable content also isn’t necessarily content that has a specific position in 3D space. For example, a Scope prim is considered imageable but has no independent position (but can contain a group of other prims that might have positions and bounds).

Thanks for this reference. It's good to see how they approach it.

It seems that placing something with a constraint relative to a reference object is done in USD by making it a child of the object and adding a transform. The geometry of the constrained object is first transformed as the reference object is, then gets the additional transformation applied.

Yes we also discussed this, whether it is sufficient to encode the transform as relative but without an explicit relationship back to what it is relative to. The issue we foresaw with this is partial exchanges. If that link is not explicit, then how to make sure that with a partial exchange you obtain all relevant parent nodes by simple graph traversal? I must say though, that I could live with that. We could embed it as additional intelligence in the partial exchange routines if it simplifies the conceptual model, which I think it does.

USD instead uses the word schema... A schema is the closest we get to an object-oriented type in USD.

Well, proving your point I guess, that's also not my interpretation of the word "type" as I was using it. For me the typing is the mechanism to enable component reuse, not a blueprint or schema. But yes, this just reinforces the idea that we should agree on a clear vocabulary.

I proposed a three level schema a while back, which was a little different

I think that makes a lot of sense, we might end up with four then, or however we want to phrase that:

tomvandig commented 7 months ago

Hi @daviddekoning

Referencing Virtual Entities

Great! This sounds indeed very similar, if not exactly the same. That's a good thing I imagine. Is there any additional benefit of the way USD does this, besides clear naming? Does the layer do something extra or is it just a grouping? I have heard and read a bit about USD but am in no way familiar with it.

Clarity on composition is a good thing for sure, but I feel layer is not really the right word for what we're trying to do. Can we clarify it differently?

Placement

I think it's important to distinguish between the data composition and the various hierarchies. At first glance I really like having the entity hierarchy determine the placement, as it simplifies everything a lot, but if I understand correctly there are various placement types that do not follow the simple "transform the child with the parent matrix" logic, like Grids and Alignments.

Having just the parent/child relationship determine the placement would then mean that the placement would be altered based on the parent classification or components (e.g. the parent is an alignment or has an alignment component). This feels a bit indirect to me, as both the parent and the child need additional information to construct the right placement (parent needs to supply an alignment curve, child needs to supply a position along the alignment). Same holds for grids. Doing all this implicitly seems less clear than using relationships.

Maybe we can mix it up where placement is derived from the parent/child entities unless some condition is true, but these kinds of edge cases are annoying to specify for us and annoying to discover as an implementer.

To me it seems clearer if placement is a component that can be subclassed to support different types of placement, and has explicit relationships to the things it needs to be fully specified into an absolute placement. But it sure is simpler to do it based on the hierarchy.

I also agree with @aothms's point about partial data transfer: with explicit relationships you will at least know that you're missing information.
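
A sketch of what explicit placement relationships could look like, and of how they expose missing data in a partial exchange. The two placement classes, their field names and the `missing_references` helper are hypothetical and only illustrate the argument.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RelativePlacement:
    relative_to: Optional[str]           # entity the placement is relative to (None for a root)
    offset: Tuple[float, float, float]   # translation relative to that entity

@dataclass
class AlignmentPlacement:
    alignment: str                       # entity that supplies the alignment curve
    distance_along: float                # position along that alignment

def referenced_entity(placement):
    """Return the entity that an explicit placement depends on, if any."""
    if isinstance(placement, RelativePlacement):
        return placement.relative_to
    return placement.alignment

def missing_references(placements, present_entities):
    """List entities that a placement refers to but that are absent from the data set."""
    needed = {referenced_entity(p) for p in placements.values()} - {None}
    return sorted(needed - set(present_entities))

# A partial exchange that contains the window and the signal, but not the wall.
partial = {
    "Window1": RelativePlacement(relative_to="Wall", offset=(2.0, 0.0, 1.0)),
    "Signal1": AlignmentPlacement(alignment="Alignment1", distance_along=120.0),
}
missing = missing_references(partial, present_entities={"Window1", "Signal1", "Alignment1"})
```

Because the dependency is stored on the placement itself, the receiver can tell that Wall is missing without knowing anything about the entity hierarchy.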

API

I think a web API makes sense, but only to coordinate transport and maybe basic queries. A web API that offers deep queries on the current status of the ECS should be tightly scoped, as I think every application of this data will have its own querying needs and supporting every use case would be overwhelming. Rather, I would see transport strictly coordinated and querying kept to a minimum (give me all components of type X, give me all components of entity E, etc.).
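
The two minimal queries mentioned here could look like this on an in-memory store. The `ComponentStore` class and its method names are hypothetical; a web API would expose the same two lookups over whatever transport is agreed on.

```python
from collections import defaultdict

class ComponentStore:
    """In-memory component store indexed by component type and by entity id."""

    def __init__(self):
        self._by_type = defaultdict(list)
        self._by_entity = defaultdict(list)

    def add(self, entity_id, component_type, data):
        component = {"entity": entity_id, "type": component_type, "data": data}
        self._by_type[component_type].append(component)
        self._by_entity[entity_id].append(component)

    def components_of_type(self, component_type):
        """Give me all components of type X."""
        return list(self._by_type.get(component_type, []))

    def components_of_entity(self, entity_id):
        """Give me all components of entity E."""
        return list(self._by_entity.get(entity_id, []))

store = ComponentStore()
store.add("Wall", "Geometry", "wall-mesh")
store.add("Wall", "Placement", "wall-placement")
store.add("Window1", "Placement", "window-placement")

placements = store.components_of_type("Placement")
wall_components = store.components_of_entity("Wall")
```

Anything deeper, such as composing entities or following relationships, would then be left to the client application.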

Maybe the last statements are also part of a higher-level discussion: are we making just an interchange format, or a "distributed database" in the sense of behavior and features.

gschleusner1972 commented 7 months ago

@daviddekoning @tomvandig I agree with Tom here. This is a data composition approach, not the configuration composition that USD does, so the graph doesn't address things outside of the entity, i.e. a single Thing. So placement can be part of it if it just uses local placement, but when it's relative you really want to externalize that, and that is why we still have relationships. Layers are just layers of the graph that contain values (think Photoshop layers); I think we are going to end up with something that doesn't need them. If you place a placement component on the instance, then that seems like it would override the inherited placement, and the same goes for other components. We intentionally don't want, or have a use case for, overriding components that you don't own. To support a contractual workflow no one should be able to edit your components, which is why USD is powerful for their use case but really problematic for ours. See this section in the video: they use the hierarchy to set values and create variants, which is not what we want at all. The components need to be static, not variable all the way up. https://youtu.be/4W5D-IuRyaM

aothms commented 7 months ago

I wouldn't necessarily rule out a layered approach so quickly though, because we still don't really know what our components look like. Not requiring explicit relationships akin to the strictly procedural definitions we have now in IFC4x results in some interesting possibilities.

Currently we can only create an extrusion if we have a base profile.

graph LR;

    Column:::E-->Extrusion:::C-->Rectangle;

    classDef E fill:#63c408
    classDef C fill:#08aec4

If we opt for a more layered approach we get a greater degree of decoupling

graph LR;

    Column:::E-->Extrusion:::C
    Column:::E-->Profile:::C-->Rectangle;

    classDef E fill:#63c408
    classDef C fill:#08aec4

This way we can allow e.g. the architect to define the depth of the extrusion, based on which the engineer says: ok, this needs to be an IPE220.

This is a contrived example, and I fully acknowledge the advantages of an explicit dependency graph. But I don't think the greater combinatorial freedom that comes from a layered approach should be so easily dismissed.

If we have "Post-composition archetypes" we can still require that on the eventual composed graph an extrusion needs to have a basis.

It also depends of course on whether Rectangle is an entity or a component. If it's an entity, this kind of shared ownership is already realized (again, component to entity relationship) but it still implies a degree of temporal dependency (can you create the extrusion if the basis does not exist yet?). In a more layered approach the dependency is undirected.
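
The decoupled variant, together with a post-composition requirement, can be sketched as follows. The component layout, the depth value of 3.0 and the `check_extrusion_archetype` rule are assumptions made for illustration.

```python
# Two parties publish independent components for the same entity.
published = [
    {"entity": "Column", "type": "Extrusion", "data": {"depth": 3.0}, "author": "architect"},
    {"entity": "Column", "type": "Profile", "data": {"shape": "IPE220"}, "author": "engineer"},
]

def compose(entity_id, components):
    """Collect the components of an entity into a dict keyed by component type."""
    return {c["type"]: c["data"] for c in components if c["entity"] == entity_id}

def check_extrusion_archetype(composed):
    """Post-composition rule: an Extrusion is only valid if a Profile is also present."""
    return "Extrusion" not in composed or "Profile" in composed

# Before the engineer has published anything, only the extrusion exists.
architect_only = compose("Column", published[:1])
complete = compose("Column", published)

valid_before = check_extrusion_archetype(architect_only)
valid_after = check_extrusion_archetype(complete)
```

Neither component refers to the other, so they can be published in any order; the requirement that an extrusion has a basis is only enforced on the composed result.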

gschleusner commented 7 months ago

No real use case would allow two different parties to each own half of the column geometry. If you need two, you make two sets of components that each make up a whole column: my extrusion, your extrusion. This is actually necessary to coordinate between parties at transitions between different owners. It's much easier to communicate with "make yours like mine" than it is to have a meeting to say "move that column 6mm to the left". This happens with slabs and slab openings, columns, stairs and facades all the time. We need to support the smallest useful components. So we do have a need to say, if I have two sets of geometry, which one is expressed. But you shouldn't get to change someone else's. This is not layers; this is opinions in USD. In our use case no one would want to do that entity by entity; instead "show the contractor's column geometry" is much more likely, which is a "collection-wide" opinion.

gschleusner1972 commented 7 months ago

@daviddekoning I think Layer is closer to the Virtual Component than I understood. @tomvandig and I met and realized that the Virtual Entities were really an "explicit" path through the composition graph. Whether that exists "above" an instance would be a question, as this would make for a "stage"-like concept.