hypar-io / IFC-gen

Industry Foundation Classes (IFC) code generator.
MIT License

Proposal for the management of the IFC specification. #10

Open ikeough opened 7 years ago

ikeough commented 7 years ago

This proposal relates to the ongoing management and development of the IFC specification, and the auto-generation of conforming IFC software libraries and documentation. Because the IFC specification does not currently have a repository where this discussion can take place, I'm putting it here as the IFC-gen project would necessarily need to integrate with such a repo.

The primary assertion of this proposal is that the IFC specification should be managed as code, and that the generation of conforming IFC libraries in multiple programming languages, and IFC documentation, should be automatic and triggered by changes to the specification. The specification should be maintained in EXPRESS as it is the most compact intermediate format and is the one on which the IFC-gen tool relies. As a straw man, I would propose that the IFC EXPRESS specification be stored in an IFC repository on GitHub.

The IFC Specification

The EXPRESS version of the IFC specification should be committed to a repository. The repository's branches would be labelled according to their version of IFC. For example, the IFC4 branch would contain the most recently released version of IFC.

Groups interested in extending IFC could then fork the IFC repository, amend the IFC specification as required for their project, and use IFC-gen on their version of the specification to generate a library in their language of choice. The resulting library could then be used in their service or application. When they are satisfied with their extensions to IFC, they would issue a pull request against the branch on the IFC repo into which they would like to have their contributions merged. The submission would be reviewed, code review comments would be picked up by the submitter, and the code would be merged. During this review process, the community would be able to see and potentially add to the discussion.

Requests for enhancements to IFC could be handled like issues in code. Any user could submit an issue, relating it to a particular release. When that issue is closed, as the result of a change to IFC, the issue would be closed with a note relating it back to the version where the issue has been addressed.

Continuous Integration

Every commit to a branch in the IFC repository would initiate a Continuous Integration System to generate IFC libraries in supported languages (using parsers generated using IFC-gen), and generate documentation to match those libraries.

High Level API

By itself, IFC is not a user-facing API. Tools like the Geometry Gym IFC tools use a higher-level API that is designed for user comprehensibility. This would still be supported. In fact, the IFC repository and IFC-gen would provide standard tooling on which these higher-level APIs should be built. Discussion of the standardization of a higher-level API for IFC is outside the scope of this proposal.

Databases

EXPRESS is a data modeling language. As such, it does not, and should not, care about where you will be storing your data. For that reason, a conversation about storage mechanisms for IFC (in memory, databases, etc.), and the effect that those storage requirements might have on the IFC object model, is also outside the scope of this proposal.

bigdoods commented 7 years ago

Hi Ian,

Nice proposal!

The benefits of this type of system have been identified internally prior to this. The main blocker is the amount of work that would be required to replace the existing tooling used to generate the IFC schema documentation (the IFCDoc tool) with a system that would support git workflows, such as automating the compilation process using CI/CD pipelines.

"... they would issue a pull request against the branch on the IFC repo into which they would like to have their contributions merged. The submission would be reviewed, code review comments would be picked up by the submitter, and the code would be merged. During this review process, the community would be able to see and potentially add to the discussion."

The other issue that buildingsmart have is that the IFC standard inherits its processes from ISO. Buildingsmart have a committee in place that essentially facilitates a process prior to 'releasing' any new version of IFC. This is for quality checks and testing. I appreciate your suggestion that we could write tests for all of this stuff but this is a point that would need discussion.

A further suggestion I would like to add to this proposal concerns the way that IFC handles its version naming. The ecosystem could benefit from a semantic versioning system, which would give clarity to developers working with tools that depend on specific schemas.
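To make the semantic versioning suggestion concrete, here is a minimal TypeScript sketch of how a tool might check schema compatibility. The version strings and the `isCompatible` rule are assumptions for illustration only; IFC releases are not numbered this way today.

```typescript
// Hypothetical semantic version for a schema release (illustrative only).
interface SchemaVersion {
  major: number;
  minor: number;
  patch: number;
}

// Parse "MAJOR.MINOR.PATCH"; anything else is rejected.
function parseVersion(text: string): SchemaVersion {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(text);
  if (match === null) {
    throw new Error("Not a semantic version: " + text);
  }
  return {
    major: Number(match[1]),
    minor: Number(match[2]),
    patch: Number(match[3]),
  };
}

// Negative if a < b, zero if equal, positive if a > b.
function compareVersions(a: SchemaVersion, b: SchemaVersion): number {
  if (a.major !== b.major) return a.major - b.major;
  if (a.minor !== b.minor) return a.minor - b.minor;
  return a.patch - b.patch;
}

// Assumed rule: a library implementing `available` can serve a tool that
// needs `required` when the major versions match (no breaking changes) and
// `available` is at least as new as `required`.
function isCompatible(available: SchemaVersion, required: SchemaVersion): boolean {
  return available.major === required.major &&
    compareVersions(available, required) >= 0;
}

console.log(isCompatible(parseVersion("4.1.0"), parseVersion("4.0.2"))); // true
console.log(isCompatible(parseVersion("5.0.0"), parseVersion("4.0.2"))); // false
```

The point of the sketch is that a dependent tool could decide from the version number alone whether a schema change is breaking, which the current release names do not make obvious.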

theoryshaw commented 7 years ago

Sounds great. I would recommend not asking for top-down permission, but rather innovating at the edges: working with a few third-party developers/users at first and slowly working out from there.

I believe 'discussion' is more fruitful relative to actual code and workflows, than trying to address and answer all possible contingencies beforehand.

ikeough commented 7 years ago

@bigdoods Thanks! I realize that there are many conversations which I was not privy to in which the above proposal was discussed. I'm playing catch-up a bit.

I realize that integrating the documentation generation in the CI/CD pipeline might be a good sized task. The fact that this concern has been raised by more than one person (in offline conversations), and now yourself, only makes me more certain that this work should be undertaken. Having a documentation process that is disconnected from the development and build pipeline leads to things getting out of sync, and keeps projects from moving forward rapidly.

With regards to semantic versioning, I agree. I did get one bit of feedback that semantic versioning may not work with regards to BSI and ISO. I would need someone with more knowledge about those requirements to comment.

I would also agree with the comment by @jmirtsch on Twitter that we should include all past and future versions of the specification in the IFC repo. This would keep with the one-branch-per-version strategy I outlined above.

timchipman commented 7 years ago

Hi Ian, saw an email on this and had some thoughts...

I agree that GitHub is probably the best tool for maintaining source code across distributed teams, and also works well for maintaining standards that are fully self-contained.

This got me brainstorming...

IFC in its current form is comprised of a static/early-bound schema (currently represented in EXPRESS and XSD), along with dynamic/late-bound schemas on top in the form of property sets (name/value pairs more or less) and constrained instance graphs (aka "model views" with rules stating things like an IfcBeam should have its shape represented with IfcExtrudedAreaSolid). These dynamic schemas can also be represented in files -- ifcXML for property sets, mvdXML for instance graphs. That said, the combination of these files in separate places may make it somewhat confusing and disjointed if editing these separately. And the nature of these files being in disparate niche formats means that off-the-shelf tools (e.g. Visual Studio) can't keep everything in sync (i.e. automatic refactoring, compiling source code to ensure validity) and drives the need for custom tools to enforce referential integrity. And all of this presents additional learning curves for software developers.

Looking at the issue database at www.buildingsmart-tech.org/jira, it seems most issues don't really impact EXPRESS, but property sets and usage (instance graphs). Thus, the static schema (EXPRESS) itself only captures a small subset of what IFC defines.

So I would contend that to use code as the "master" that drives the IFC specification, such code needs to go much further than what is in EXPRESS. Or else if EXPRESS is the "master", then it would need to be annotated in some way (e.g. some data structuring convention embedded within comments).

So rather than the following, which can be derived from EXPRESS today (flattened inheritance for illustration; assume everything public, property getters/setters):

public abstract class IfcElement
{
    IfcGloballyUniqueId ObjectId;
    IfcOwnerHistory OwnerHistory;
    IfcLabel? Name;
    IfcText? Description;
    IfcRelAssigns[] HasAssignments;
    IfcRelNests[] Nests;
    IfcRelNests[] NestedBy;
    IfcRelDeclares HasContext;
    IfcRelAggregates[] IsDecomposedBy;
    IfcRelAggregates[] Decomposes;
    IfcRelAssociates[] HasAssociations;
    IfcLabel? ObjectType;
    IfcRelDefinesByType[] IsTypedBy;
    IfcRelDefinesByProperties[] IsDefinedBy;
    IfcObjectPlacement Placement;
    IfcProductDefinitionShape Representation;
    IfcRelAssignsToProduct[] ReferencedBy;
    IfcIdentifier? Tag;
    IfcRelFillsElement FillsVoids;
    IfcRelConnectsElements[] ConnectedTo;
    IfcRelConnectsElements[] ConnectedFrom;
    IfcRelInterferesElements[] InterferesElements;
    IfcRelVoidsElements[] HasOpenings;
    IfcRelSpaceBoundary[] ProvidesBoundaries;
    IfcRelContainedInSpatialStructure ContainedInStructure;
    IfcRelCoversBldgElements[] HasCoverings;
}

such code would also include "dynamic" aspects most relevant in usage (defined in property sets and model views) and probably incur the most evolution such as:

public class IfcWall : IfcElement
{
    IfcWallTypeEnum PredefinedType;

    /* relationships */
    IfcWallType Type; /* IsTypedBy */
    IfcMaterialLayerSetUsage Material; /* HasAssociations */
    IfcRelConnectsPathElements ConnectionHead;
    IfcRelConnectsPathElements ConnectionTail;

    /* representations -- captured at Representation */
    IfcTriangulatedFaceSet Tessellation;
    IfcIndexedPolyCurve Axis;
    IfcSolidModel Body;

    /* properties -- captured at IsDefinedBy */
    IfcBoolean LoadBearing;
    IfcInteger AcousticRating;
    IfcTimeMeasure FireRating;
    IfcThermalTransmittanceMeasure ThermalTransmittance;

    /* quantities -- captured at IsDefinedBy */
    IfcLengthQuantity Length;
    IfcAreaQuantity GrossSideArea;
    IfcAreaQuantity NetSideArea;
}

and similar for distribution elements having specific ports (e.g. PowerInput, ChilledWaterOutput), properties for performance, etc.

Then custom attributes can be used to annotate conversion/serialization, e.g. WCF DataMemberAttribute to indicate STEP serialization order and XML naming, Description for attribute documentation (at least for English / the default authoring language), Category to indicate property set names, and others invented as needed. C# would seem to work well for such a purpose as it supports extensible custom attributes, has APIs for accessing all of this programmatically, is probably the most widely used by AEC software developers, and can still be leveraged to generate code in other programming languages. That said, for a schema that describes geometric concepts such as IFC, diagrams and detailed descriptions are often needed, which may be better maintained as separate PNG and HTML files. For localization, .NET relies on resource files, which may be more suitable than in-line attributes because they can take advantage of editor support. For EXPRESS rules, perhaps they could be replaced with C# (maybe automatically, maybe in both directions) and made more powerful to take advantage of the full C# language and the libraries of .NET.

With such an approach, IFC major versions could be major branches, and what we call "model view definitions" could essentially be sub-branches that tack on specific usage such as shown above. I would think we could fully support the exact same semantics we have today in the exact same formats, just storing these in code in a more concise and organized form on GitHub. Tools such as IfcDoc could still be leveraged as needed (maybe supporting GitHub directly instead of just local files), though perhaps used to a lesser extent as much could be done by editing and compiling code directly.

For compatibility, it would seem we'd need to preserve the current generic structure of IFC for some time, though if the industry shifts to using such code base as input, then perhaps a future version of IFC could define data serialization in XML or JSON in a more readable form with direct attributes such as above.

That said, even if we can make code more accessible to a wider user community in this way, that doesn't mean we have to change the process for how major extensions are done (developing use cases, producing real-world models with representative data, comparing alternatives, etc.); there would just be another avenue (and a more direct one than Jira) for incorporating recommendations and changes.

ikeough commented 7 years ago

@timchipman Thanks very much for this "brainstorming."

If I am understanding your proposal above, you're suggesting that the base format for the specification be C# as opposed to EXPRESS. If this understanding is correct, then I disagree on the basis that a specification like IFC should be defined in an intermediate language. I can imagine a couple of scenarios in which choosing one language to be the standard could cause big problems. First, you might represent something in C# that would be represented quite differently in another language. The example that comes to mind is IFC's concept of a SELECT. In IFC-dotnet I've represented this using generics as a Select<Choice1,Choice2,Choice3>, but in another language which supports discriminated unions I would use those. This would require developers who write the transpilation logic for Go, as an example, to learn about C# generics. Second, who's to say that C# will be the "most widely used by AEC developers" in several years' time? I love C#, but I wouldn't want to make that bet. As the browser becomes the OS, we might all find ourselves writing TypeScript. I still believe that having a neutral data modeling language, and tooling to emit code in the languages that other developers want to use, is the right thing to do.
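To illustrate the discriminated-union alternative mentioned above, here is a minimal TypeScript sketch. The SELECT shown is the one for IfcAxis2Placement, but the two member interfaces are simplified stand-ins with illustrative fields (including the `kind` tag, which is an assumption of this sketch), not what IFC-gen or IFC-dotnet actually emit.

```typescript
// Simplified stand-ins for two IFC entities; the fields are illustrative,
// not the real attribute lists.
interface IfcAxis2Placement2D {
  kind: "IfcAxis2Placement2D"; // hypothetical discriminant tag
  location: [number, number];
}

interface IfcAxis2Placement3D {
  kind: "IfcAxis2Placement3D"; // hypothetical discriminant tag
  location: [number, number, number];
}

// EXPRESS: TYPE IfcAxis2Placement = SELECT (IfcAxis2Placement2D, IfcAxis2Placement3D);
type IfcAxis2Placement = IfcAxis2Placement2D | IfcAxis2Placement3D;

// The compiler narrows `placement` in each branch, so reading location[2]
// is only allowed in the 3D case.
function describePlacement(placement: IfcAxis2Placement): string {
  switch (placement.kind) {
    case "IfcAxis2Placement2D":
      return "2D placement at (" + placement.location[0] + ", " +
        placement.location[1] + ")";
    case "IfcAxis2Placement3D":
      return "3D placement at (" + placement.location[0] + ", " +
        placement.location[1] + ", " + placement.location[2] + ")";
  }
}

const placement3d: IfcAxis2Placement = {
  kind: "IfcAxis2Placement3D",
  location: [0, 0, 0],
};
console.log(describePlacement(placement3d)); // 3D placement at (0, 0, 0)
```

A generator targeting a language with this feature would emit something quite unlike a generic `Select<...>` wrapper, which is the reason for keeping the source of truth language-neutral.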

I would turn your proposal around and suggest that we represent everything in EXPRESS. I'm saying this knowing that I don't have a full understanding of why different parts of IFC are represented in EXPRESS, XML, JSON, etc. Someone with more of an understanding of the history could probably tell me why these different formats exist. But it seems logical to me that you could represent property sets, at least, and possibly model view definitions, using EXPRESS. Then you'd have one intermediate language which would use the same grammar, parsers, etc.

EXPRESS might not be great, but there's functionality in there that IFC has not yet tapped, which could improve current workflows and help in the organization of the spec similar to what you've proposed above. For example, currently there is no distinction in the IFC specification of what "schema" an ENTITY or TYPE belongs to. This makes it hard to correlate with the documentation, which organizes things by schema. We could split the spec into schemas corresponding to what's shown in the documentation, and use EXPRESS's schema interface statements (USE FROM and REFERENCE FROM) where necessary. Then, generated code could go into separate files or namespaces as the target language supports. We could also use a parser to generate the .dot views of model hierarchies. And we could agree upon a comment format which could be parsed to find things like the basic description for an entity, and associated images that should be compiled to the HTML.
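As a rough sketch of what an agreed-upon comment format could look like in practice, the TypeScript below pulls a description and image paths out of an EXPRESS remark placed before an ENTITY. The `@description` and `@image` tags, the sample text, and the `extractDocs` helper are all hypothetical; no such convention exists in the IFC specification, and a real implementation would more likely hook into the ANTLR grammar than use regular expressions.

```typescript
// Documentation extracted for one entity.
interface EntityDoc {
  entity: string;
  description: string;
  images: string[];
}

// Hypothetical convention: an EXPRESS embedded remark "(* ... *)" placed
// directly before an ENTITY carries "@description" and "@image" tags.
// Limitation of this sketch: nested remarks are not handled.
function extractDocs(express: string): EntityDoc[] {
  const docs: EntityDoc[] = [];
  // A remark body (which may not contain "*)") followed by "ENTITY <name>".
  const remarkBeforeEntity = /\(\*((?:(?!\*\))[\s\S])*)\*\)\s*ENTITY\s+(\w+)/g;
  let match: RegExpExecArray | null;
  while ((match = remarkBeforeEntity.exec(express)) !== null) {
    const remark = match[1] ?? "";
    const entity = match[2] ?? "";

    // The description runs from its tag to the next "@" tag or the end.
    const found = /@description\s+([^@]*)/.exec(remark);
    const description = found
      ? (found[1] ?? "").replace(/\s+/g, " ").trim()
      : "";

    // Collect every "@image <path>" tag in the remark.
    const images: string[] = [];
    const imageTag = /@image\s+(\S+)/g;
    let image: RegExpExecArray | null;
    while ((image = imageTag.exec(remark)) !== null) {
      images.push(image[1] ?? "");
    }

    docs.push({ entity: entity, description: description, images: images });
  }
  return docs;
}

// Illustrative input only; the remark text and entity body are simplified.
const sample = [
  "(* @description A vertical construction that bounds or subdivides spaces.",
  "   @image figures/wall.png *)",
  "ENTITY IfcWall",
  "  SUBTYPE OF (IfcBuildingElement);",
  "END_ENTITY;",
].join("\n");

console.log(extractDocs(sample));
```

Because the annotations live in remarks, an EXPRESS parser that ignores them would still accept the file unchanged.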

I agree with the premise that having different formats for the specifications leads to a subpar developer experience. However, many developers don't use, and will never use, Visual Studio. As an example, I've written the entirety of IFC-gen and IFC-dotnet using .NET Core, on a Mac, in Visual Studio Code. If we make choices in the organization of the specification assuming the use of a particular IDE, we won't make the right choice :) I chose ANTLR for the parser generator not only because I had some familiarity with it, but because it has grammar support tooling in VS Code, Emacs, Sublime Text, etc. I like to consider the possibility that someone in an underprivileged area of the world, on an old laptop, using free software, could contribute meaningfully to IFC.

I think I need a history lesson about the different parts of the specification and why they're written in different forms. And I need an argument, if one exists, for why EXPRESS should not or cannot be the neutral format for describing IFC.

donghoon commented 7 years ago

Hi Ian, Dennis referred me to your effort. I think Tim made a good point on the nature of the IFC specification. The specification of the IFC schema and MVDs, as well as property sets, goes beyond EXPRESS. Some of the specifications are verbal descriptions that are not formally represented even with the extensive use of rule definitions. mvdXML might help to formally describe the specification for MVDs. For a high-level API, we may need mvdXML and Concept Templates as a basis for further development, because Concept Templates are the basis for defining MVDs and mvdXML is the formal specification method for both Concept Templates and MVDs.

donghoon commented 7 years ago

ISO 10303 consists of several parts. Part 11 is the EXPRESS modeling language; Part 21 is the data format (the IFC SPF format); Part 28 is the XML format (XSD for the model and XML for the data, i.e. the ifcXML format); Part 22 is SDAI (Standard Data Access Interface), which defines how to make programming language bindings; and Part 23 is the C++ language binding. As Ian pointed out, EXPRESS has some unique features that may not be covered in all programming languages. A language-specific binding needs some logic in order to map everything in EXPRESS, such as select types and multiple inheritance (we don't have this in IFC, but EXPRESS has it). Part 21 clear text encoding has been the de facto data format for a while in ISO 10303 STEP. Part 28 was finalized relatively recently, and there is some flexibility of form depending on the configuration language (which can fine-tune the XML data format). JSON is new, and an obvious choice in the current environment, but there is no formal specification on the ISO 10303 side yet. XML and JSON are more software-developer-friendly, as there are abundant tools for them, but they are not as compact as the Part 21 format. IFC and other data schemas utilizing ISO 10303 (CIS/2, ISO 10303 APs) have only one schema; entity data types, defined data types, etc. all belong to that schema.

There are user-defined entity data types in EXPRESS, which don't belong to a schema. The format is like @ENTITYNAME; for example, IFCDOOR is an entity data type belonging to the IFC schema, and @MYIFCDOOR can be a user-defined entity data type. We don't use this in IFC; instead, we use property sets for everything that is not covered in the schema.

ikeough commented 7 years ago

@donghoon I had to go back and read your posts a few times to try and absorb all the information. Thanks!

I suppose that one reason I implemented IFC-gen using EXPRESS is its compact representation and legibility compared with the XSD version of the spec. I actually began with XSD, using the xsd.exe tool available on Windows to generate C# class libraries, but it couldn't handle concepts like EXPRESS SELECT in a way that seemed sensible. Although XSD parsing tools exist for a large number of programming languages, there is little facility for editing the format of the code that is generated. With a setup like IFC-gen, you just get a parser. How the code is generated is completely up to the library implementor.

timchipman commented 7 years ago

On the history of IFC, I'm a relative newcomer, but from what I can piece together by looking at older versions and talking with others involved earlier on, the evolution of IFC went something like this:

So essentially there's always been a balance with keeping a separation between a stable core schema, and flexible schemas on top that can evolve without breaking the core. With compatibility as the number one feature, the structure of IFC is not as simple as it could or should be. At some point maybe these flexible schemas on top could stabilize such that the "core" could be expanded and frozen. With the STEP format, this hasn't been possible, as compatibility requires attributes to be in a fixed order, which is probably why a lot of the definitions that weren't there originally were defined using objectified relationships (IfcRelDefinesByType, etc.) as a workaround for compatibility instead of using direct attributes. Other formats like XML and JSON using text aren't impacted by that. If a toolkit can emerge that gets used everywhere, and provides automatic upgrade/downgrade between IFC releases (rather than software dealing with STEP files directly), then perhaps compatibility constraints can go away.

On the topic of using EXPRESS or C# or some other language... for working with STEP format or generating code for programming languages based on the core schema, EXPRESS certainly makes sense for that. If the end goal is to put the schema on GitHub in an organized directory structure that reflects IFC comprehensively (core schema + property sets + model views + format configurations), and in a form that can draw in the widest audience of software developers, in my view it may be beneficial to consider more mainstream alternatives. An unscientific comparison suggests that there are at least 1000 times as many software developers familiar with C# compared to EXPRESS (another problem is that there's no good search term): https://trends.google.com/trends/explore?date=all&q=ISO%2010303,%2Fm%2F08745,%2Fm%2F0jgqg,%2Fm%2F07657k,%2Fm%2F05cntt

Though EXPRESS isn't rocket science, there are many who dismiss IFC because of the unfamiliarity or perception that it is "old", or that there's a lack of tools that can support it. Not to suggest that C# is necessarily the best or will be as common 5 years from now, but it is relatively well known, and can be used with free tools on multiple platforms (not necessarily Visual Studio). A challenge with EXPRESS is finding tools to work with it without paying thousands to a few specialized consulting companies. If developers are to make changes to .EXP files on GitHub, then they will need a compiler somewhere to ensure there are no errors.

From a technical standpoint, C# custom attributes make it possible to capture just about any additional information for data structures that can be used for other formats or programming languages: inline with the respective data definitions, avoiding the need for separate files scattered about, and leveraging a single tool (the compiler) to ensure validity. As far as EXPRESS mappings go, the EXPRESS SELECT construct maps very nicely to "interface" in C# and Java (just that references are formed in the opposite direction); EXPRESS defined types map nicely to "structure" in C# (not so well for Java, with its heap allocation).
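The SELECT-to-interface mapping can be sketched as follows, in TypeScript rather than C# or Java. IfcUnit is a SELECT in IFC and IfcUnitAssignment holds a set of units, but the classes here are heavily simplified and the `describe` method is a hypothetical member added only so the sketch has something observable to call.

```typescript
// The SELECT becomes an interface. Unlike in EXPRESS, it does not list its
// members; each member points back at it by implementing it.
interface IfcUnit {
  describe(): string; // hypothetical member, not part of IFC
}

class IfcNamedUnit implements IfcUnit {
  readonly unitType: string;
  constructor(unitType: string) {
    this.unitType = unitType;
  }
  describe(): string {
    return "named unit (" + this.unitType + ")";
  }
}

class IfcMonetaryUnit implements IfcUnit {
  readonly currency: string;
  constructor(currency: string) {
    this.currency = currency;
  }
  describe(): string {
    return "monetary unit (" + this.currency + ")";
  }
}

// A property typed as the SELECT accepts any member that implements it,
// without giving up static typing.
class IfcUnitAssignment {
  readonly units: IfcUnit[];
  constructor(units: IfcUnit[]) {
    this.units = units;
  }
}

const assignment = new IfcUnitAssignment([
  new IfcNamedUnit("LENGTHUNIT"),
  new IfcMonetaryUnit("EUR"),
]);
console.log(
  assignment.units.map(function (unit) { return unit.describe(); }).join("; ")
);
// named unit (LENGTHUNIT); monetary unit (EUR)
```

One caveat with this approach: it works only when the generator controls every member type, since a type that already exists cannot be made to implement a new interface after the fact in C# or Java.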

In summary, I'm not sure any of the above matters if the scope is generating boilerplate code in programming languages. Though if the endgame is to update/maintain IFC and to leverage the widest participation on GitHub, then there might be other things to consider doing at the same time to make that better.

ikeough commented 7 years ago

@timchipman Thanks again for helping to fill in the gaps in my history of IFC.

I would be interested to see how a SELECT maps to an interface in C#. My understanding is that a SELECT is used to represent one of several possible types. As such, I'm unsure how a property on a C# class which is of a SELECT type could be one of several possible interfaces. Certainly one could use dynamic, but then you lose type safety. My understanding of SELECT must be incomplete.

On the topic of using C# as the base language, I would love it if it were as easy as picking the most widely used language in AECO and doing everything in that language, especially if that language is C#. But it comes down to separating the representation of the data model from the implementation of the same. The authors of IFC chose, I believe correctly, to represent the data model in a language built for that purpose, and that purpose only. The argument about IFC feeling "old" is also well taken. I have thought that myself in the past. But moving to today's version of C#, let's say it's 6.0, will mean that the implementation looks "old" when we're all using C# 13.0, and the ISO committee has refused to let us move forward for any one of a billion reasons (provided we're allowed to represent this standard in C# in the first place). If we take the fact that IFC's specification looks "old" as a given, then we won't be put in the position of betting on which language won't look old in several years. If we conducted a similarly non-scientific analysis of the change in the rate of adoption of programming languages, we'd probably find, to our horror, that we'll all be developing in JavaScript in a few years.

I would argue that IFC looking old has to do with the fact that it is old. It takes a long time to agree on changes to the spec, get those changes approved, and for those changes to appear in client libraries. Years perhaps. Improving the first parts of that process is outside the scope of IFC-gen, but making sure that when a change to the spec does land, the client libraries can be generated immediately is a worthwhile goal for this project.

And without belaboring the point about why C# might not be a great choice, I'll say that using attributes as a strategy to extend C# to convey the entirety of IFC might end up in a mess. I've been involved in authoring a code base where we used attributes extensively to layer meaning onto classes and properties, and it quickly runs wild. If the primary concern is issues like attribute ordering in other serialized representations (XML and JSON), then we should consider why this is such a big deal. Suppose IFC 5 changes the order of attributes: IFC 5 client libraries then expect attributes in a different order. If backwards compatibility is a requirement for your software, then ship the libraries for reading IFC 4 and IFC 5 with your software, and provide migration logic to upgrade to the newest version. Or, and this is really the solution that everyone seems to want, stop using a format like STEP which encodes constructors, and start using a format which encodes entities.
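The difference between encoding constructors and encoding entities can be shown with a small TypeScript sketch. The entity, its attribute lists, the two "schema versions", and the `decodePositional` helper are all hypothetical and exist only to show why positional records are sensitive to attribute order.

```typescript
// Attribute order for a hypothetical entity in two hypothetical schema versions.
const orderV1 = ["GlobalId", "Name", "Tag"];
const orderV2 = ["GlobalId", "Tag", "Name"]; // same attributes, reordered

// Positional ("constructor-style") decoding: values are matched to attribute
// names purely by position, as in a STEP record such as #12=IFCWALL(...).
function decodePositional(
  order: string[],
  values: string[]
): { [name: string]: string } {
  const entity: { [name: string]: string } = {};
  order.forEach(function (name, index) {
    entity[name] = values[index] ?? "";
  });
  return entity;
}

// A record written by software that used the V1 order.
const record = ["guid-1", "Wall-01", "T-100"];

const readAsV1 = decodePositional(orderV1, record);
const readAsV2 = decodePositional(orderV2, record);
console.log(readAsV1["Name"]); // Wall-01
console.log(readAsV2["Name"]); // T-100 (Name and Tag are silently swapped)

// Keyed ("entity-style") encoding carries the names with the values, so the
// order in which attributes appear no longer matters.
const keyed: { [name: string]: string } = JSON.parse(
  '{"Tag": "T-100", "GlobalId": "guid-1", "Name": "Wall-01"}'
);
console.log(keyed["Name"]); // Wall-01
```

Note that the positional reader fails silently: nothing signals that the values landed in the wrong attributes, which is why version-specific libraries or migration logic are needed for STEP files.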

With regards to engaging developers, I agree that there will be hesitation to jump in at the level of EXPRESS. There should be! Making an IFC implementation in a particular language is obviously important to you. Important enough, perhaps, to learn enough about its canonical representation and to ponder the many ways in which you could implement a SELECT incorrectly (as perhaps I have done already :) ). But I think there's probably a much smaller number of people who will be implementing parser logic than there will be contributing to layers on top of the generated code. As an example, I've suggested to Jon Mirtschin that he implement Geometry Gym's API on top of the IFC-dotnet library. In that scenario I would be concerned with implementing the C# parser logic, and Jon could expect a clean C# stratum onto which he could layer his API. Now, if Building Smart decided that they wanted to be in the business of language implementations, then the game changes considerably. If that happened, and they were at all interested in supporting more than one language, I would still expect them to want to be able to generate the implementations from an intermediate form.

As a final remark on the C# class example provided above, I'll give a little peek inside how Autodesk is thinking about defining types for the building industry. With Revit we have a very strictly defined set of Element Categories which have a fixed set of properties that can be extended with Shared Parameters. This has never been flexible enough for our users, who just want to create a thing that they call a "Foo" and give it whatever properties they want, then be able to set the visualization style of Foos in the same way that they would for an Element of a built-in category. Consistent and logical, but inflexible. On the opposite end of the spectrum you have Flux, where you can put a bunch of numbers in the cloud and call them "Foos", then explain to all your team members what a Foo means, and every consuming system is made to represent a Foo in its own way. Flexible but not consistent. Baking properties into types in C# (if I understand that as the proposal) is closer to the former. The industry benefits from something right in between these two ends of the spectrum. I've come to learn that IFC is very close to this, allowing property sets to be bound to Building Elements using relationships. If the problem is simply that there is no industry standard property set for "Walls", then I think that's a battle that could be won in a way other than embedding those properties into the types themselves. We need to propose and standardize the property sets, which could have a representation as a type in C# for use in a relationship binding in IFC-dotnet, or as a map in JavaScript that could be used by code in Flux to define the shape of a "Foo". But the generation of either should be automatic and should come from a shared, non-language-specific representation.
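Binding property sets to elements through relationships, rather than baking properties into types, might look like the following TypeScript sketch. The interfaces are loosely modeled on IfcRelDefinesByProperties but are simplified assumptions, and `propertiesOf`, `Pset_FooCustom`, and the "Foo" element are hypothetical.

```typescript
type PropertyValue = string | number | boolean;

// A named bag of properties, independent of any one element type.
interface PropertySet {
  name: string;
  properties: { [key: string]: PropertyValue };
}

// Elements carry identity only; "Foo" is as valid a type name as "IfcWall".
interface BuildingElement {
  id: string;
  typeName: string;
}

// Objectified relationship, loosely modeled on IfcRelDefinesByProperties.
interface RelDefinesByProperties {
  relatedObjects: BuildingElement[];
  relatingPropertySet: PropertySet;
}

// Gather every property bound to an element, keyed as "SetName.PropertyName".
function propertiesOf(
  element: BuildingElement,
  relations: RelDefinesByProperties[]
): { [key: string]: PropertyValue } {
  const result: { [key: string]: PropertyValue } = {};
  relations.forEach(function (relation) {
    const bound = relation.relatedObjects.some(function (candidate) {
      return candidate.id === element.id;
    });
    if (!bound) return;
    const propertySet = relation.relatingPropertySet;
    Object.keys(propertySet.properties).forEach(function (key) {
      const value = propertySet.properties[key];
      if (value !== undefined) result[propertySet.name + "." + key] = value;
    });
  });
  return result;
}

const wall: BuildingElement = { id: "w1", typeName: "IfcWall" };
const foo: BuildingElement = { id: "f1", typeName: "Foo" };

const relations: RelDefinesByProperties[] = [
  {
    relatedObjects: [wall],
    relatingPropertySet: {
      name: "Pset_WallCommon",
      properties: { LoadBearing: true },
    },
  },
  {
    relatedObjects: [foo],
    relatingPropertySet: {
      name: "Pset_FooCustom", // hypothetical user-defined set
      properties: { Color: "red" },
    },
  },
];

console.log(propertiesOf(wall, relations));
console.log(propertiesOf(foo, relations));
```

The wall and the user-defined "Foo" go through the same mechanism, so standardizing a property set means agreeing on its name and contents rather than changing any element type.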