pipauwel / IFCtoRDF

IFCtoRDF is a set of reusable Java components that parse IFC-SPF files and convert them into RDF graphs.

Lossless IFC-SPF -> RDF -> IFC-SPF? #22

Closed by fkleedorfer 4 years ago

fkleedorfer commented 4 years ago

We are considering solving an IFC-related problem with an existing RDF toolchain. For that to work, we would need a lossless conversion from IFC-SPF to RDF, and back from RDF to IFC-SPF after manipulating the RDF model.

Is such a roundtrip possible with this project or is there some kind of data/structure loss to be expected?

fkleedorfer commented 4 years ago

After having had a peek at the sources, I understand there is no implementation of the RDF to IFC-SPF conversion. The question still is: is there a way to convert the RDF model back into IFC-SPF, and can one expect to end up with the same IFC content?

pipauwel commented 4 years ago

Thank you for this very valuable question. A short reply: no, this code does not convert back to STEP. Once you have an RDF dataset, there seems to be no real point in returning to a STEP Part 21 file. It is therefore a conscious choice not to include code for the backward route. Instead, I would suggest exposing your RDF dataset on a good server and making the RDF data accessible for querying.

That being said, this IFC-to-RDF codebase was tested two years ago to find out whether round-tripping is possible. This was tested on the buildingSMART test files that were available from the IFC specification. That test was positive (although we had to append the IFC line numbers again specifically for that test, to ensure that the exact same line numbers were included again). The entire test was reported back to buildingSMART, as they asked for it, after which the ifcOWL ontology was placed online on the buildingSMART servers.

The code for the backward route to STEP is published, I think, in https://github.com/BenzclyZhang/IfcSTEP-to-IfcOWL-converters, but I do not know its current status. I suggest that you ask @BenzclyZhang. This might be a usable alternative for you if you really need to stay in a STEP world.

Is this code then lossless? It does convert all the IFC samples that are now included in the maven test run (https://github.com/pipauwel/IFCtoRDF/tree/master/src/test/resources/convertIFCFileToOutputTTL). So this is the evaluation set. Those include 19 of the more peculiar cases (LISTs of LISTs, Selects with Booleans and Lists and Classes, Arrays, and so on), which were derived from the buildingSMART samples. More test samples are always appreciated. Other than that, I have not had anything missing so far. If you encounter anything missing, please report :)

fkleedorfer commented 4 years ago

Thanks for the explanation! So... There is hope ;-)

I think there is an argument for round-tripping: SHACL. We are working on a way to map different IFC modelling styles onto each other. Doing it with SHACL would mean that we'd even get a validating style definition as a byproduct. The actual mapping would be done via SHACL-AF rules. However, IFC seems to be the preferred import/export format of the BIM tools, so we need the conversion back to IFC-SPF... Or is ifcOWL supported by, say, Revit?

I understand the line number issue. It would lead to considerably bigger files, I guess, unless you can encode them into node names, right?

Another problem is file size by itself: I realize your code reads the complete IFC file into memory before producing RDF. Would a streaming version be theoretically possible?

fkleedorfer commented 4 years ago

Oh and I am actually not necessarily interested in identical round-trip results. It may be a cool feature because you could then show in a diff viewer what has changed, and you'd have just the changes. This is not a must for our use case, though, and if solving it means much bigger files or is a headache to code, it's probably not worth it. The important question is whether the roundtrip model could be made to be isomorphic to the original one. I sense that the answer was a yes to that.

pipauwel commented 4 years ago

@fkleedorfer Apologies for a long and potentially too direct reply.

The roundtripping should indeed be possible. We enabled it at the very beginning - and have not changed much in the code since then. So I assume that the code is still complete and roundtripping is possible should you want to.

Rather than implementing the route back into the STEP format, I would prefer to spend my energy on providing appropriate RDF, JSON, and XML exports and imports for those BIM platforms, which would likely also ease your SHACL efforts. This should actually be easier to implement than the import/export of STEP formats that was already done in those BIM platforms (admiration!). If all coders keep reverting to SPF, then there is no real need or incentive for software vendors to ever seriously adopt XML, JSON, or RDF, which I think is a pity, and which I would rather not support.

So... from my side, I think I'd rather write an RDF exporter for Revit than make this code produce SPF files.

The line number issue is not that critical in terms of file size, and in many cases, the URI already contains the original line number. File size is also not really that important, as this RDF data should just go into a triple store and it can be queried there. File sizes are close to a non-issue in SQL databases and RDF databases.
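To illustrate the remark that the URI often already carries the original line number, here is a minimal Java sketch. It assumes, purely for illustration, that an instance URI ends in `_<digits>` (for example `...#IfcWall_123`); the class name, the method name, and the `example.org` namespace are hypothetical and not part of IFCtoRDF. Check the URIs in your own converter output before relying on this pattern.

```java
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LineNumberFromUri {
    // Assumption: the instance URI ends in "_<digits>", e.g. ".../IfcWall_123".
    // The digit count is capped so Integer.parseInt cannot overflow.
    private static final Pattern TRAILING_ID = Pattern.compile("_(\\d{1,9})$");

    /** Returns the trailing number of the URI, or empty if there is none. */
    static OptionalInt stepLineNumber(String uri) {
        Matcher m = TRAILING_ID.matcher(uri);
        return m.find()
                ? OptionalInt.of(Integer.parseInt(m.group(1)))
                : OptionalInt.empty();
    }

    public static void main(String[] args) {
        // Hypothetical URIs, used only to exercise the method.
        System.out.println(stepLineNumber("http://example.org/building#IfcWall_123"));
        System.out.println(stepLineNumber("http://example.org/building#someOtherNode"));
    }
}
```

If the number can be recovered this way, a reverse converter would not need an extra triple per instance to remember the SPF line number, which is why the file-size cost is small.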

There is indeed an impact on memory when loading the entire model and converting it. After conversion, this memory issue disappears again, as we can rely on data stores, and ideally do not load everything in memory. The conversion process can become slow, however, especially when loading a 2 or 3 GB STEP file and only assigning 2 GB of RAM to the process. That will not work, probably not even when streaming. The IFCtoRDF code does use a Streamwriter for the RDF, so streaming on the output side should actually already be there. I can imagine, however, that this can be improved. Reading the SPF content, unfortunately, requires loading everything in memory and reconstructing the full model, which will remain a burden on memory. So... fully streaming this conversion (not loading the IFC file fully in memory) might theoretically be possible, but I doubt that the result will be complete.

pipauwel commented 4 years ago

This is, by the way, great input on many things that have been on my mind so far, and any precise changes to the code that do not imply a full refactoring are much appreciated.

It might be an option to set up a branch or fork that also offers the reverse direction, but then I think that @BenzclyZhang would be a good person to talk to first, as he already did this.

fkleedorfer commented 4 years ago

Many thanks for your explanations and suggestions! I understand you recommend against it, but I am still glad it might work.

ad streaming: If you don't mind me asking: why does the whole model need to be loaded before triples can be written? I haven't tried to understand the whole RDFWriter, yet - but looking at the SPF file structure it looks quite possible to me. If this becomes a possible roadblock for us, we may be tempted to invest some time there. Why should we not?

ad architecture: Our use case is transforming models. A BIM tool exports, we transform, it loads the transformed version. This is done locally on the BIM-user's machine. I would like to avoid RDF stores in this design. Rather, we'd transform the export (looks like it's going to be IFC-SPF) to RDF, and either keep it in memory or write it to an HDT file, and use that file for further processing.

pipauwel commented 4 years ago

Maybe the input IFC model does not need to be loaded in memory, but that would be a pretty strong redefinition of the code. Loading it in memory is something that happens in this code in order to first make all links between objects (line number references in the input files), to then also be able to check for inverse relations. Maybe that can be done in a complete streaming version (not in memory); that would need to be tested.
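The linking problem described above can be made concrete with a small Java sketch. It does not reproduce the IFCtoRDF parser; it only scans a simplified SPF data section (one `#id=TYPE(args);` instance per line, which real files do not guarantee) and reports references to instances that have not been read yet. The class and method names are hypothetical. Such forward references are allowed in SPF, and they are one reason a single-pass converter has to either buffer instances or defer writing the links.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ForwardReferenceScan {
    // Simplification: exactly one instance per line, "#<id>=<TYPE>(<args>);".
    private static final Pattern ENTITY = Pattern.compile(
            "^\\s*#(\\d{1,18})\\s*=\\s*([A-Za-z0-9_]+)\\s*\\((.*)\\)\\s*;\\s*$");
    // Simplification: any "#<digits>" in the arguments counts as a reference,
    // so a '#12' inside a string literal would be miscounted.
    private static final Pattern REF = Pattern.compile("#(\\d{1,18})");

    /** Lists "#from -> #to" for every reference to an instance not yet read. */
    static List<String> forwardReferences(List<String> lines) {
        Set<Long> seen = new HashSet<>();
        List<String> forward = new ArrayList<>();
        for (String line : lines) {
            Matcher e = ENTITY.matcher(line);
            if (!e.matches()) {
                continue; // header lines, comments, multi-line instances
            }
            long id = Long.parseLong(e.group(1));
            seen.add(id);
            Matcher r = REF.matcher(e.group(3));
            while (r.find()) {
                long target = Long.parseLong(r.group(1));
                if (!seen.contains(target)) {
                    forward.add("#" + id + " -> #" + target);
                }
            }
        }
        return forward;
    }

    public static void main(String[] args) {
        List<String> sample = List.of(
                "#1=IFCLOCALPLACEMENT($,#2);",
                "#2=IFCAXIS2PLACEMENT3D(#3,$,$);",
                "#3=IFCCARTESIANPOINT((0.,0.,0.));");
        System.out.println(forwardReferences(sample));
    }
}
```

In this sample the first two instances point at instances defined later in the file. A streaming reader could still mint a URI from the bare id, but if the URI also encodes the entity type, or if inverse relations must be checked, the target has to be known first, which matches the reason given above for building the full model in memory.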

At the moment, the code also loads the corresponding ifcOWL ontology, to be able to generate RDF instances in Jena. This is also a load on memory. Jena and OWL ontologies could maybe be kept out of the loop, but it is more difficult to check the end result in that case (full in-house code), which I did not try here in any case.

fkleedorfer commented 4 years ago

Many thanks for your explanations @pipauwel - We were thinking about adapting your code to read IFC in a streaming fashion and write to an HDT model, but we decided not to pursue this route further for the time being.

Currently, we are unsure whether all the programs that produce and consume IFC output do so with sufficient quality (i.e. correctness). If the IFC is not correct (but can be consumed again by the program that produced it), the conversion to RDF might lose some data, and then the back-conversion may not be usable any more. We feel we cannot risk this happening, and we think we can limit our IFC transformations to simple text replacements, which would be less intrusive, less risky, and much faster. However, we may get back to this idea or some version of it at a later point in time.