Extract metadata from mo files

anandkp92 commented 3 years ago

Develop software to extract Brick representations from Modelica. Build on the work presented here by:

Use annotations within Modelica model components to embed Brick information, for example:

Buildings.Controls.OBC.CDL.Interfaces.BooleanOutput ySupFan "Supply fan on status"
  annotation (__Brick__(class=Fan_Status), Placement(transformation(extent={{140,50},{180,90}})));

Extract Brick:Point information from CDL inputs and outputs.
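The two steps above (annotate, then extract points) can be illustrated with a minimal Python sketch. It assumes the simple `__Brick__(class=...)` form shown in the example and uses a regular expression rather than a real Modelica parser. The helper names `extract_brick_points` and `to_turtle` are hypothetical, and so is the choice to emit `brick:` prefixed Turtle.

```python
import re

# Simplified pattern (an assumption, not a full Modelica parser) for a declaration like
#   <type> <name> "<description>" annotation (__Brick__(class=<BrickClass>), ...);
# [^;]*? keeps the search for __Brick__ inside a single declaration.
DECLARATION = re.compile(
    r'(?P<type>[\w.]+)\s+(?P<name>\w+)\s*'     # component type and instance name
    r'(?:"[^"]*"\s*)?'                         # optional description string
    r'annotation\s*\([^;]*?__Brick__\(class=(?P<cls>\w+)\)'
)


def extract_brick_points(source: str) -> dict[str, str]:
    """Map each annotated instance name to the Brick class in its __Brick__ annotation."""
    return {m["name"]: m["cls"] for m in DECLARATION.finditer(source)}


def to_turtle(points: dict[str, str]) -> str:
    """Emit one rdf:type triple per point; ':' and 'brick:' prefixes are assumed declared elsewhere."""
    return "\n".join(f":{name} a brick:{cls} ." for name, cls in sorted(points.items()))


model = '''
Buildings.Controls.OBC.CDL.Interfaces.BooleanOutput ySupFan "Supply fan on status"
  annotation (__Brick__(class=Fan_Status), Placement(transformation(extent={{140,50},{180,90}})));
'''
points = extract_brick_points(model)
print(points)             # {'ySupFan': 'Fan_Status'}
print(to_turtle(points))  # :ySupFan a brick:Fan_Status .
```

A regex is enough to show the idea. Nested annotations, escaped quotes and multi-line modifiers are better handled by a proper parser or by a JSON representation of the model.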

anandkp92 commented 2 years ago

Adding the major takeaways from recent discussions on this topic.

There are two main approaches to extracting metadata from Modelica models:

1. Inference of semantic information (as was done in the Shepherding paper). This approach would build on the work done in that paper.
2. Addition of semantic data annotations to the Modelica file (be it an energy model, just CDL sequences, or both).

Diving into the second approach:

One suggestion is to use MIME-typed annotations such as annotation(__cdl(type="application/type", content="content")).
- This is useful because the type can denote which standard the content is based on.
- I couldn't find anything exactly that we could use here, but we could use a text/strings or application/rdf+xml or make one up?
- The content could be information about the particular Modelica element/class (such as its type or relations; more on that below).
- One issue I see is that I'm not sure why we need a __cdl type in the annotation: it does not provide any new information. Instead, we could have either a generic __metadata annotation, specific __brick, __ashrae223p, or __haystack annotations, or a combination of these.
Where do we add these annotations?
- This could be one annotation per element in the Modelica class that would benefit from having metadata information (such as a supply air fan or a VAV box)
- We could also add these annotations to the connect statements so that we can infer relationships between entities as well.
- One issue with this is that sometimes we might be connecting one element to another that has no equivalent representation in the metadata standard we are exporting to. For instance, in this example, a few temperature sensors are connected to a multiplex block, which in turn is connected to the TRooAir output of a control block. However, the multiplex block has no equivalent in Brick, so we have to traverse two connect statements to see that the temperature values are outputs of a control sequence.
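The multiplex problem can be framed as a graph walk: blocks without a semantic annotation are treated as pass-through nodes, and the walk continues through them until it reaches an annotated component. The sketch below is minimal and rests on a few hypothetical choices: the instance names (`TSen1`, `TSen2`, `mux`, `con`) are made up, and it assumes each connect pair is written as (source, target). Real connect statements are undirected, so direction would have to come from connector causality.

```python
from collections import defaultdict


def component(connector: str) -> str:
    """Instance name of a connector reference, e.g. 'mux.u[1]' -> 'mux'."""
    return connector.split(".", 1)[0]


def semantic_edges(connects, annotated):
    """Return (source, target) pairs between annotated components.

    Components without a semantic annotation (such as a multiplex block) are
    treated as pass-through nodes: the walk continues through them until it
    reaches an annotated component.
    """
    downstream = defaultdict(set)
    for src, dst in connects:  # assumption: pairs are written as (source, target)
        downstream[component(src)].add(component(dst))

    edges = set()
    for start in annotated:
        stack, seen = list(downstream[start]), set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if node in annotated:
                edges.add((start, node))        # first annotated component reached
            else:
                stack.extend(downstream[node])  # keep walking through e.g. 'mux'
    return edges


# Hypothetical instance names modeled on the multiplex example above.
connects = [("TSen1.T", "mux.u[1]"), ("TSen2.T", "mux.u[2]"), ("mux.y", "con.TRooAir")]
annotated = {"TSen1", "TSen2", "con"}
print(sorted(semantic_edges(connects, annotated)))
# [('TSen1', 'con'), ('TSen2', 'con')]
```

With this approach, only the components that map to the metadata standard need annotations. Intermediate blocks are skipped automatically, and the connect statements themselves need no annotation.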
What do we keep inside the annotation content?
- It could refer to an rdf:type or an equivalent tag.
- It could contain the subgraph of an RDF graph (Brick/223P) with all triples that contain this element.
- It could be a set of key-value pairs, where each key is the relationship the element has to the value. For example: content={type=Brick:Supply_Fan, hasPoint=Brick:Supply_Air_Flow_Setpoint, isPartOf=ahu1}
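Here is a minimal Python sketch of how the key-value option could be turned into RDF-style triples, with the annotated element as the subject. Several details are assumptions made only for illustration: `type` is mapped to `rdf:type`, every other key is used directly as the predicate, values are assumed not to contain commas, and the instance name `fanSup` is hypothetical.

```python
def content_to_triples(subject: str, content: str) -> list[tuple[str, str, str]]:
    """Turn key=value annotation content into (subject, predicate, object) triples.

    Assumptions: the key 'type' maps to rdf:type, every other key is used
    directly as the predicate, and values contain no commas.
    """
    triples = []
    for pair in content.strip().strip("{}").split(","):
        key, value = (part.strip() for part in pair.split("=", 1))
        predicate = "rdf:type" if key == "type" else key
        triples.append((subject, predicate, value))
    return triples


content = "{type=Brick:Supply_Fan, hasPoint=Brick:Supply_Air_Flow_Setpoint, isPartOf=ahu1}"
for triple in content_to_triples("fanSup", content):  # 'fanSup' is a hypothetical instance name
    print(triple)
```

The key-value form is compact to write, but it needs an agreed mapping from keys to predicates. Embedding Turtle directly avoids that mapping step.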

Some additional questions/comments that came up during the discussion:

If a .mo file only contains control sequences as CDL, what is the provision for adding the energy model and/or the associated metadata? When would one want this and how do we combine metadata models if the CDL model and energy models are two separate Modelica files?
What are the use-cases for this Modelica->Brick or Modelica->223p or Modelica->Haystack translation? The contents within each annotation could vary based on this.
- Connecting models to real building automation systems
- This requires extracting the inputs/outputs from the model and connecting them to the corresponding BACnet points. How does this mapping happen? Would we need metadata extraction from both Modelica and BACnet? Why can't we just embed BACnet point names in the Modelica model?
- Developing portable applications for buildings
- In the Mortar platform, for example, each application declares the SPARQL query it requires. In this case, we would need to extract both equipment and input/output metadata from the Modelica model.
How do we deal with conflicting metadata from different sources? The sources could be two annotation tags within one model, metadata extracted using method 1 vs. method 2, etc.
- Using the annotation tags as the ground truth is one approach we could take.
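The "annotations as ground truth" policy could be sketched as a merge in which annotated values always win and inferred values only fill gaps. This is a minimal illustration, not a settled design. The function name and the dictionary layout (element → {key: value}) are hypothetical, and the choice to report overridden values instead of dropping them silently is an assumption.

```python
def merge_metadata(annotated: dict, inferred: dict) -> tuple[dict, list]:
    """Merge per-element metadata from annotations and from inference.

    Annotation values are treated as ground truth and always win; inferred
    values only fill gaps. Overridden values are returned as conflicts so
    they can be reviewed instead of being dropped silently.
    """
    merged = {element: dict(props) for element, props in inferred.items()}
    conflicts = []
    for element, props in annotated.items():
        current = merged.setdefault(element, {})
        for key, value in props.items():
            if key in current and current[key] != value:
                conflicts.append((element, key, current[key], value))
            current[key] = value
    return merged, conflicts


# Hypothetical metadata for two elements.
annotated = {"fanSup": {"type": "Brick:Supply_Fan"}}
inferred = {"fanSup": {"type": "Brick:Fan", "isPartOf": "ahu1"},
            "TZon": {"type": "Brick:Zone_Air_Temperature_Sensor"}}
merged, conflicts = merge_metadata(annotated, inferred)
print(merged["fanSup"])  # {'type': 'Brick:Supply_Fan', 'isPartOf': 'ahu1'}
print(conflicts)         # [('fanSup', 'type', 'Brick:Fan', 'Brick:Supply_Fan')]
```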

Tagging: @mwetter @gtfierro @marcopritoni @JayHuLBL for comments and thoughts.

gtfierro commented 2 years ago

I couldn't find anything exactly that we could use here, but we could use a text/strings or application/rdf+xml or make one up?

I like the __brick or __haystack style better. Then the MIME type can be text/turtle or whatever.

One question is whether the role of these annotations is to (a) assist in the production of a metadata model from the Modelica/CDL, or to (b) point to the corresponding entities in an existing semantic metadata model that is distributed along with the Modelica/CDL model. This could even be the full Turtle file in some annotation somewhere in the Modelica code. Then, the annotation just needs to indicate which entity it corresponds to. This sidesteps most of the design issues above.

The benefit of the first use case is that one can distribute self-contained RDF graph descriptions for common, reusable Modelica components. Building the RDF graph from the Modelica model could be as "simple" as stitching those RDF subgraphs together using interpretations of the connect statements between them.
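The stitching idea could look roughly like the minimal Python sketch below. Each reusable class ships a subgraph template, each instance gets a renamed copy, and each connect statement adds one edge. Everything here is an assumption made for illustration: the `MyLib.*` classes and templates, the `$self` placeholder, and in particular the interpretation of `connect(a, b)` as `brick:feeds`. Modelica fluid connectors are acausal, so a real tool would need a better rule for edge direction.

```python
# Hypothetical RDF subgraph templates shipped with reusable component classes.
# "$self" is replaced by the instance name when the template is instantiated.
TEMPLATES = {
    "MyLib.Fan": [
        ("$self", "a", "brick:Supply_Fan"),
        ("$self", "brick:hasPoint", "$self_sta"),
        ("$self_sta", "a", "brick:Fan_Status"),
    ],
    "MyLib.CoolingCoil": [("$self", "a", "brick:Cooling_Coil")],
}


def stitch(instances: dict[str, str], connects: list[tuple[str, str]]) -> set[tuple[str, str, str]]:
    """Instantiate each component's subgraph, then add one edge per connect statement."""
    graph = set()
    for name, cls in instances.items():
        for s, p, o in TEMPLATES.get(cls, []):
            graph.add((s.replace("$self", name), p, o.replace("$self", name)))
    for a, b in connects:
        # Assumed interpretation: connect(a, b) between two components means a feeds b.
        graph.add((a.split(".", 1)[0], "brick:feeds", b.split(".", 1)[0]))
    return graph


instances = {"coi": "MyLib.CoolingCoil", "fan": "MyLib.Fan"}
connects = [("coi.port_b", "fan.port_a")]
for triple in sorted(stitch(instances, connects)):
    print(triple)
```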

mwetter commented 2 years ago

Most of the questions are best discussed briefly. Here are some short answers. We certainly need use case (a), but (b) is also of interest and likely relevant. If we can handle (a), maybe (b) is a special case where we reconcile the two semantic models, similar to what was done in the Shepherding paper.

The tags should be on the block level (and, outside of CDL, also on the model level) and on the connector level, but not on the connect statements, since the connect statement itself tells us which connectors are connected.

Each class and each instance can only have one annotation, so there cannot be two annotation tags within one model. But a model (block) can extend another model (block), in which case the higher-level annotation should take precedence if present.
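The precedence rule for `extends` can be illustrated with a short Python sketch. Here "higher-level" is read as the extending (more derived) class: the walk goes up the `extends` chain and returns the first annotation it finds. The class table and its names are hypothetical, and the sketch handles only single inheritance. Real Modelica classes can have several `extends` clauses, which would need an explicit tie-breaking rule.

```python
from typing import Optional

# Hypothetical class table: each class has an optional semantic annotation and
# at most one base class (real Modelica allows several extends clauses).
CLASSES = {
    "Base.Sensor": {"extends": None, "semantic": "brick:Sensor"},
    "Lib.TemperatureSensor": {"extends": "Base.Sensor", "semantic": "brick:Temperature_Sensor"},
    "Lib.CustomSensor": {"extends": "Lib.TemperatureSensor", "semantic": None},
}


def effective_annotation(cls: Optional[str]) -> Optional[str]:
    """Walk up the extends chain; the first (most-derived) annotation found wins."""
    while cls is not None:
        semantic = CLASSES[cls]["semantic"]
        if semantic is not None:
            return semantic
        cls = CLASSES[cls]["extends"]
    return None


for name in CLASSES:
    print(name, "->", effective_annotation(name))
```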

I agree that __brick (or __haystack or __semanticData or whatever we end up calling it) should not be inside the __cdl annotation, as a fan model can also have such an annotation but is outside the scope of CDL. (This is a change to the current specification that we have for CDL, which we will need to update as we converge.)

I don't think there is a case where we have to "combine metadata models if the CDL model and energy models are two separate Modelica files": We either have only a CDL model, or this CDL model is part of a model that also contains the HVAC system. Some of the HVAC system may be in different files but there is always a top-level .mo file, and that is the file that should be parsed.

anandkp92 commented 2 years ago

Summarizing the recent discussions here.

Use cases

1. Generating control sequences for real BAS controllers using CDL
2. FDD and other open-loop verification
- only need input and output points
- only CDL sequences suffice
- add annotation tags to CDL blocks (elementary CDL blocks and composite blocks) (higher priority)
- may not need to parse all Modelica elements; stop at blocks?
- user specifies which block or blocks to parse (user input)
- Output:
- semantic model with inputs and outputs
- eventually these will need to be tied to BACnet points.
- Questions yet to be answered:
- Who does the BACnet point mapping?
- How do we specify that some points will need to undergo a transformation before being mapped to a CDL input/output?
- Do we allow annotations to propagate to lower-level blocks? Current CDL has a mechanism to propagate annotations.
3. Developing portable analytics applications
4. Developing portable advanced control sequences
- Each algorithm would have a Brick/SPARQL query.
- Needs the energy model and CDL sequences
- Output:
- semantic model with equipment and points (inputs and outputs)
- Questions:
- When should we stop parsing? Currently, I stop as soon as the element type does not start with Buildings.*
5. BOPTEST/simulation/hardware in the loop
- needs the energy model
- might need external references to BACnet points or other real-world variables
- Output:
- semantic model with equipment and points (inputs and outputs)
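The stopping rule mentioned in the list above (stop as soon as the element type does not start with Buildings.) could be sketched as a depth-first walk over components. The component table and the `Buildings.Example.*` class names are hypothetical. The sketch also assumes that a component of a non-Buildings type is neither reported nor descended into. That is one reading of "stop", and the opposite choice (report it but do not descend) would be a one-line change.

```python
# Hypothetical component table: class name -> list of (instance name, component type).
COMPONENTS = {
    "Buildings.Example.Ahu": [
        ("fanSup", "Buildings.Example.Fan"),
        ("conAhu", "Buildings.Example.Controller"),
        ("kSet", "Modelica.Blocks.Sources.Constant"),
    ],
    "Buildings.Example.Fan": [
        ("sta", "Buildings.Controls.OBC.CDL.Interfaces.BooleanOutput"),
    ],
}


def collect(cls: str, prefix: str = "", keep: str = "Buildings.") -> list[str]:
    """Depth-first walk over components, skipping (and not descending into)
    any component whose type does not start with `keep`."""
    found = []
    for name, typ in COMPONENTS.get(cls, []):
        if not typ.startswith(keep):
            continue  # stopping rule: leave non-Buildings.* components unparsed
        path = prefix + name
        found.append(path)
        found.extend(collect(typ, path + ".", keep))
    return found


print(collect("Buildings.Example.Ahu"))  # ['fanSup', 'fanSup.sta', 'conAhu']
```

Making the prefix a parameter leaves room for the user-input variant in use cases 1 and 2, where the user names the block(s) to parse.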

Use cases 1 and 2

We will work on adding semantic information to the input and output variables of CDL blocks within the library of control sequences defined in OBC.
This will tell the controls vendor which type of BACnet point to attach to which CDL input or output.

Currently we are looking at using annotation syntax like below:

Buildings.Controls.OBC.CDL.Interfaces.RealInput TZon(unit="K", displayUnit="degC")
"Zone temperature"
annotation (Placement(transformation(extent={{-140,-30},{-100,10}}),
    iconTransformation(extent={{-140,-40},{-100,0}})),
    __brick(mimetype="text/turtle", content=":TZon rdf:type Brick:Zone_Air_Temperature_Sensor"));
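Annotations in this form can be read back mechanically. The minimal Python sketch below pulls out the `mimetype` and `content` fields so that a consumer can dispatch on the MIME type, for example by passing only `text/turtle` content to an RDF parser. The regex and the function name are assumptions for illustration. The regex does not handle escaped quotes inside Modelica strings, and it expects the fields in the order shown.

```python
import re
from typing import Optional

# Assumed shape: __brick(mimetype="...", content="...") with no escaped quotes inside.
BRICK_ANNOTATION = re.compile(
    r'__brick\(\s*mimetype="(?P<mimetype>[^"]*)"\s*,\s*content="(?P<content>[^"]*)"\s*\)'
)


def read_brick_annotation(declaration: str) -> Optional[dict[str, str]]:
    """Return the mimetype and content of a __brick annotation, or None if absent."""
    match = BRICK_ANNOTATION.search(declaration)
    return match.groupdict() if match else None


declaration = '''
Buildings.Controls.OBC.CDL.Interfaces.RealInput TZon(unit="K", displayUnit="degC")
"Zone temperature"
annotation (Placement(transformation(extent={{-140,-30},{-100,10}}),
    iconTransformation(extent={{-140,-40},{-100,0}})),
    __brick(mimetype="text/turtle", content=":TZon rdf:type Brick:Zone_Air_Temperature_Sensor"));
'''
print(read_brick_annotation(declaration))
```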

The content could include just the type, a whole graph with the point as the subject, or a SHACL shape that restricts what type of points can be connected. We are still testing each of these options.
If the user specifies more than one control block from which metadata should be extracted, we should parse the components of each CDL block (be it elementary or composite) to infer how certain equipment is connected, even if no energy model is present.
Possible issues:
- What happens when there are no corresponding types in Brick/Haystack/223P?
- How do we detect which blocks are from the CDL library, or whether they are valid?

anandkp92 commented 1 year ago

Closing with 2 updates:

specification for semantic annotations: https://obc.lbl.gov/specification/cdl.html#semantic-information
modelica-json now supports semantic export from these annotations: https://github.com/lbl-srg/modelica-json

lbl-srg / obc

Extract metadata from mo files #94