On component->entity edges

While IFC ECS is not affected by all of the problems we currently observe in IFC SPF, we should not repeat the mistakes we made before. Below is a reflection on lessons to be learnt when designing IFC ECS.

dense graph with unpredictable in-edges

The problem today: While the placement/decomposition/containment product tree in IFC behaves fairly well, underneath these products there is a dense, unpredictable graph. An object- or resource-level API is virtually impossible, because neither software nor end users can oversee the implications and intent behind an operation.

graph TD;
    representation1[representation] --> extrusion --> polyline --> point
    placement --> point
    representation2[representation] --> extrusion

component->entity edges create a natural demarcation between identifiable objects (entities) and the data that constitutes them (components). When data is shared by multiple entities, it should be moved into a typical entity that is the "owner" of these components. An editing API on top of this structure is fully explicit and unambiguous about intent, because there are no unpredictable additional in-edges.
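A minimal sketch of this demarcation, in Python and under assumed names: `Entity`, `Model`, the `ref` convention and ids such as `typical:window-a` are illustrative only and not part of any IFC ECS proposal. Components live inside exactly one entity and a link can only target an entity id, so the in-edges of any piece of data can be enumerated by scanning for that id.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """An identifiable object; its components are owned by it and nothing else."""
    id: str
    components: dict = field(default_factory=dict)  # component name -> data


@dataclass
class Model:
    entities: dict = field(default_factory=dict)  # entity id -> Entity

    def add(self, entity_id, **components):
        self.entities[entity_id] = Entity(entity_id, dict(components))

    def in_edges(self, target_id):
        """List every (entity id, component name) that references target_id."""
        return sorted(
            (e.id, name)
            for e in self.entities.values()
            for name, data in e.components.items()
            if isinstance(data, dict) and data.get("ref") == target_id
        )


model = Model()
# Shared geometry is owned by a typical entity instead of floating in the graph.
model.add("typical:window-a", geometry={"profile": [(0, 0), (1, 0), (1, 2), (0, 2)]})
# Occurrences reference the owning entity, never a component directly.
model.add("window-1", typed_by={"ref": "typical:window-a"})
model.add("window-2", typed_by={"ref": "typical:window-a"})

print(model.in_edges("typical:window-a"))
```

Because the only legal target is an entity, an editor that wants to change the shared profile can list exactly who is affected before applying the edit.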

instability of surrogate identifiers

The problem today: The globalid attribute that identifies all rooted elements in IFC is currently not optional. Over time there have been many requests from vendors to make this attribute optional, because many such elements are in fact transient content in exporting applications: objectified relationships, but in some cases also, for example, building element layers. What is transient and what is not depends on the internal domain model of the application.

Vocabulary:

surrogate key: unique identifier generated to ensure the distinctiveness of a record, without inherent meaning or significance beyond its role as an identifier

natural key: identifier derived from the inherent characteristics or attributes of the data being stored

Surrogate keys are likely required to be able to instantiate component->component edges. But increasing the coverage of globally unique identifiers (everything being a component would mean covering most of the geometric domain) is going to exacerbate this issue of being unable to retain them as stable identifiers. The inability to retain stable surrogate keys renders all incoming edges invalid.
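To make the failure mode concrete, here is a small sketch in Python. It assumes an exporter that does not persist identifiers for its transient data and therefore generates fresh UUIDs on every export. The function names are hypothetical, and the content hash is only a stand-in for "some natural key", not a proposal.

```python
import hashlib
import uuid


def export_with_surrogate_keys(points):
    """Simulate an exporter that does not persist ids: fresh UUIDs on every export."""
    return {str(uuid.uuid4()): p for p in points}


def export_with_natural_keys(points):
    """Derive the key from the data itself, so it is reproducible across exports."""
    return {hashlib.sha256(repr(p).encode()).hexdigest()[:12]: p for p in points}


def dangling(edges, exported):
    """Return the edge targets that no longer resolve in the exported data."""
    return [target for target in edges if target not in exported]


points = [(0.0, 0.0), (1.0, 0.0)]

first = export_with_surrogate_keys(points)
edges = list(first)                          # someone links to the first export
second = export_with_surrogate_keys(points)  # same data, exported again
surrogate_dangling = dangling(edges, second)
print(len(surrogate_dangling))               # every incoming edge is now invalid

first = export_with_natural_keys(points)
edges = list(first)
second = export_with_natural_keys(points)
natural_dangling = dangling(edges, second)
print(len(natural_dangling))                 # edges still resolve
```

Note that a content-derived key has its own trade-off: it changes as soon as the data changes, so it is not a drop-in replacement for a stable identifier.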

runtime variability of union types

The problem today: A lack of clear choices has resulted in a schema with an excessive number of union types (SELECT in EXPRESS). There is nothing inherently wrong with union types (many functional programming enthusiasts advocate them over inheritance, for example): they are a means to precisely stipulate the domain of an operation. Using them as a means to introduce runtime flexibility, however, has far-reaching consequences.

Allowing both component->component and component->entity relationships requires:

- on the schema level, creating two constructs for a link, one c->c and one c->e
- and either:
  - reviewing on a case-by-case basis which component field is c->c and which is c->e
  - allowing a union[c->c, c->e]

Allowing runtime variability on such a fundamental level has far-reaching consequences due to its combinatorial nature. But statically determining for every field whether it's c->c or c->e is likely a completely arbitrary exercise.
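The combinatorial effect can be sketched as follows, in Python. `ComponentRef`, `EntityRef` and the field names are hypothetical constructs chosen for illustration. With a union on every link field, each consumer has to branch at runtime, and a component with n link fields has 2^n possible runtime shapes.

```python
from dataclasses import dataclass
from itertools import product
from typing import Union


@dataclass(frozen=True)
class ComponentRef:   # c->c link (hypothetical construct)
    component_id: str


@dataclass(frozen=True)
class EntityRef:      # c->e link (hypothetical construct)
    entity_id: str


Link = Union[ComponentRef, EntityRef]


def resolve(link: Link) -> str:
    """Every consumer of a Link field has to branch on the runtime type."""
    if isinstance(link, ComponentRef):
        return f"component:{link.component_id}"
    if isinstance(link, EntityRef):
        return f"entity:{link.entity_id}"
    raise TypeError(f"unsupported link type: {type(link).__name__}")


def shapes(link_fields):
    """Enumerate the distinct runtime shapes of a component with union-typed links."""
    kinds = ("c->c", "c->e")
    return [dict(zip(link_fields, combo)) for combo in product(kinds, repeat=len(link_fields))]


# An extrusion-like component with three link fields (field names are illustrative).
variants = shapes(["profile", "position", "direction"])
print(len(variants))  # 2 ** 3 = 8
```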

coordination by means of disjoint graphs

The problem today: Authoring tools modelling the same structure are creating their own disjoint graphs with redundancies to represent the structure. These graphs are overlain based on logical connections or geometric proximity in coordination tools.

Allowing component->entity links is essential for building collaborative open-ended networks, because one can relate to data that is being enriched by others. If only component->component is supported, what we will likely see in practice is disjoint graphs, like now.
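A toy illustration of the difference, in Python. The node names and the two "tools" are made up, and the union-find helper is just one way to count connected subgraphs. When each tool can only link its own components to each other, the combined data falls apart into one graph per tool. When both attach to a shared entity id, the result is a single graph.

```python
def connected_groups(edges):
    """Group nodes into connected subgraphs using a small union-find."""
    parent = {}

    def find(node):
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving
            node = parent[node]
        return node

    for a, b in edges:
        parent[find(a)] = find(b)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return list(groups.values())


# c->c only: each tool links to its own private copy of the wall data.
architect_cc = [("arch/wall-geometry", "arch/wall-axis")]
engineer_cc = [("struct/wall-loads", "struct/wall-axis")]

# c->e: both tools attach their components to the same shared entity id.
architect_ce = [("arch/wall-geometry", "entity:wall-1")]
engineer_ce = [("struct/wall-loads", "entity:wall-1")]

print(len(connected_groups(architect_cc + engineer_cc)))  # 2 disjoint graphs
print(len(connected_groups(architect_ce + engineer_ce)))  # 1 shared graph
```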

neutrality of the spec and the long tail of bespoke solutions

The massive accomplishment that has been facilitated by IFC is the long tail of bespoke solutions that operate on this data and facilitate downstream processes. We should not only build for the handful of monopolists, but also for the long tail of downstream usage scenarios and domains that are tangentially related to construction. This long tail is not "all-in" on IFC or ECS. IFC5 needs to remain a viable option to interface with in an import-export workflow. We cannot build artificial barriers to IFC5 adoption and require every application to adopt an ECS data sharing workflow. IFC5 needs to be radically simpler if we are to arrive at a truly collaborative multi-disciplinary industry.

I have socialized this topic with ODA, Autodesk, Nemetschek and others, and the response is that there isn't an interest or need to break down geometry this way. I.e. inside a "Geometry" component you'd find all the geometry for a whole thing, and that would include multiple parts, most likely in STEP as a starting point. This is how I've been socializing it from the beginning, so it tracks that geometry is a payload and not expressed in components this way.

There would be a use case where the geometry is only a Profile, but that would still be the "geometry". If you want to reuse the geometry, you would just make the geometry component part of a typical entity. It then follows that component-to-component relationships don't make for an unpredictable edge, as you'd always point to the geometry and get the whole thing. @aothms

This image should thus be taken literally based on this viewpoint. The "geometry" is the full geometry of each chair, and there are multiple geometry components of the full geometry.

They pointed out another reason to do this: when you make a relationship with materials that, let's say, apply to individual faces, you would reference them directly, e.g. component123.facexyz (pseudo syntax). Breaking the geometry into parts would make this very difficult and would force us to reinvent these techniques per geometry format. Finally, this maps directly to being able to insert other representations not made from STEP (OBJ, glTF, USD etc.), so it fits the pattern that geometry is self-contained in a component and not broken down further.

If geometry is an opaque blob we end up duplicating all the relevant data in redundant external definitions. In the end people need to know the profile of the sweep. Why not uplift it to a component?

there isn't an interest or need to break down geometry this way

This was only 1 of the 5 points though

I think most are thinking the data would actually be contained in the component or a single URI. In most storage platforms they would be combined. I specifically asked if people want to reuse sweeps in the Windows example and the answer was no. They would just reuse the blob as a typical in both windows.

IMO people want a blob because there is no real use case that anyone can find or point to where you want to query just a polyline of an extrusion. If they need the geometry they need all of it, so why break it apart? The profile example would most likely be used if we allow matrix transformations to instantiate geometry, i.e. the beam profile assumes a default direction and length and is transformed into a beam, column etc. of a given length. That would mean that for those elements you don't even carry around extra geometry.
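A rough sketch of what instantiating geometry through matrix transformations could look like, in Python. It assumes the shared geometry is a profile swept along +Z with a default length of 1, and that occurrences carry only a 4x4 homogeneous matrix. All function names are illustrative, not taken from any proposed schema.

```python
import math


def extrude_unit(profile):
    """Sweep a 2D profile along +Z by a default length of 1 (vertices only)."""
    return [(x, y, 0.0) for x, y in profile] + [(x, y, 1.0) for x, y in profile]


def scale_z(length):
    """Stretch the default unit length to the requested length."""
    return [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, length, 0], [0, 0, 0, 1]]


def rotate_y(angle):
    """Rotate about the Y axis; a quarter turn maps +Z onto +X."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, 0, s, 0], [0, 1, 0, 0], [-s, 0, c, 0], [0, 0, 0, 1]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def apply(matrix, point):
    """Transform a 3D point by a 4x4 matrix, rounding away float noise."""
    v = (*point, 1.0)
    return tuple(round(sum(matrix[i][k] * v[k] for k in range(4)), 6) for i in range(3))


profile = [(0.0, 0.0), (0.2, 0.0), (0.2, 0.3), (0.0, 0.3)]
unit = extrude_unit(profile)  # the only geometry that is actually stored

# A 3 m column: same geometry, stretched along its default direction.
column = [apply(scale_z(3.0), p) for p in unit]
# A 5 m beam: stretched first, then laid down along +X.
beam = [apply(matmul(rotate_y(math.pi / 2), scale_z(5.0)), p) for p in unit]

print(max(p[2] for p in column), max(p[0] for p in beam))
```

In this reading the column and the beam each store only a matrix, and the profile geometry lives once on the typical.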

If geometry is an opaque blob we end up duplicating all the relevant data in redundant external definitions. In the end people need to know the profile of the sweep. Why not uplift it to a component?

Yes, this would indeed use surrogate keys. With Geometry as a single component per representation, this seems like no more of an issue than for any other data that would also need c->c edges. It would be trivial to store the surrogate key in the authoring tool after it's generated, as many tools do today. The currently proposed schema also makes provisions to store the authoring tool ids for this object, which is more robust than what is currently possible and has become standard practice for tools that do this.

Surrogate keys are likely required to be able to instantiate component->component edges. But increasing the coverage of globally unique identifiers (everything being a component would mean covering most of the geometric domain) is going to exacerbate this issue of being unable to retain them as stable identifiers. The inability to retain stable surrogate keys renders all incoming edges invalid.

tomvandig / eccg