Data structure for connections is inefficient for simple cases #161

Closed spine-o-bot closed 1 year ago

In GitLab by @DillonJ on Apr 10, 2020, 06:15

Having been working with a load flow model with c. 550 connections, I think we can do a lot better.

To define a simple connection with a capacity I need to define the following:

connectionfrom_node(forward) connection__to_node(forward) connectionfrom_node(reverse) connection__to_node(reverse) connectionnodenode(forward) connectionnodenode(reverse) connection_capacity(forward) connection_capacity(reverse) fix_ratio_out_in(forward) fix_ratio_out_in(reverse)

This is not ideal for many reasons:

There is a lot of redundant information
Creating the data is more onerous than it needs to be
There is more scope for defining data and relationships inconsistently
It makes diagnosing topography issues way more difficult
It renders the graph view effectively useless

For a simple connection with a capacity, the following information is all that is necessary to specify:

connection__from_node
connection__to_node
connection_capacity

It would be much nicer to see just this :

Of course we want to retain the flexibility to define many different types of connections for different commodities and that is why we did it the way way we did, so I propose we retain the data structure that the model sees and perhaps do some pre-processing of the data, but there are other ways to do it and I am open to suggestion.

The following is a proposal for how we might make it easier for a user in the simple case.

The minimum data required for a connection are:

connection__from_node
connection__to_node
connection_capacity

This is how we could pre-process the data and infer the data and relationships necessary for the model

connection_capacity specified for the connection object infers a bi-directional line and a connectionnodenode relationship between all from_nodes and to_nodes with a fix_ratio_out_in of 1.
connection_capacity specified for the connection object constrains all 4 flow variables (we could implement a new constraint for this)

You can override any of this by defining the individual data items explicitly - the idea is to make the simple case more straight forward for the user.

Thoughts?

In GitLab by @DillonJ on Apr 10, 2020, 06:16

changed the description

In GitLab by @DillonJ on Apr 10, 2020, 06:19

changed the description

In GitLab by @DillonJ on Apr 10, 2020, 06:21

changed the description

In GitLab by @DillonJ on Apr 15, 2020, 07:27

I think at a minimum we should keep the existing structure and use preprocess_data_structure.jl to allow for simple bi-directional connections with just a to and from node and a capacity, and/or a capacity in each direction. We can then add the missing parameters and relationships using preprocess_data_structure.jl. This still allows us to implement more exotic cases when the need arises.

However, I was thinking further about his and several things have occurred to me :

We only need connection__node__node in the case where there are multiple to/from nodes
In the case where there are multiple to/from nodes, there is nothing that enforces balance - i.e. what goes in must come out. The ratio constraint can only relate two flows together at a time. The idea is really to implement losses, but this wouldn't really be easily possible using the ratio constraints in the case of multiple to/from nodes
I think we either have to assume that a connection has a single to and from node, or we need a balance constraint, in which case our connection is now acting like a node.
A connection with multiple to/from nodes can always be effectively created using a combination node.
If we assume a single to and from node, then we don't need connectionnodenode? You can define directional parameters on the connection__from_node and connection__to_node relationships.

It would clean up the structure a lot given that I believe we can do with two relationships what we are currently doing with six (2 x connectionfrom_node, 2 x connection__to_node, 2 x connectionnode__node).

Thoughts @mihlema @Tasqu @jkiviluo @manuelma @nnhjy ?

In GitLab by @Tasqu on Apr 15, 2020, 11:07

I'm not sure how your shortcut distinguishes between a directional and a bi-directional connection. If I want to create a single restricted directional connection, I currently make a single connection__from_node and a single connection__to_node, and give it a connection_capacity, but in your example this would automatically create a bi-directional connection instead? Would this be controlled through a method parameter or something?

Personally, I'm still not a fan of having both units and connections so similar from the model's point of view. If it were completely up to me (and there had to be a connection type entitiy in the first place), I would have it be a direct transfer variable between two nodes, e.g. v_trans(connection, from_node, to_node). This way, directional connections would be defined using a single connection__from_node__to_node relationship, and bi-directional connections using two in the opposite directions.

I realize that the above would again be a breaking change to the core structure, and only included it here as "wishful thinking".

In GitLab by @DillonJ on Apr 15, 2020, 12:01

@Tasqu

I'm not sure how your shortcut distinguishes between a directional and a bi-directional connection.

I wasn't really suggesting a complete proposal, just raising the issue for discussion along with some ideas.

If it were completely up to me I would have connectionfrom_node and connection__to_node and a parameter (perhaps "directionality"). This way you avoid redundant information and avoids the situation where the reverse connectionnode__node could be defined insonsistently.

But I do agree with the principle of having a single flow variable, or at most two rather than the 4 we have now. In the vast majority of cases, we just have a single flow. We could still account for losses with a single free variable (or two positive variables) with terms in the node balance equation.

But perhaps there are generic Energy Systems cases that we're not aware of driving the current structure?

All that said, I think it's not ideal that I have to define 6 relationships. I would take two of any kind over that!

In GitLab by @Tasqu on Apr 15, 2020, 12:23

But perhaps there are generic Energy Systems cases that we're not aware of driving the current structure?

Could be, but I'm still not a huge fan.

In any case, I agree with the sentiment of implementing shortcuts for these types of common simple structures. However, I feel we should make such shortcuts really explicit for the user, e.g. have a dedicated parameter (like bi-directional_simple_connection_capacity or something) that triggers the aforementioned pre-processing. Mainty to try and avoid any potential confusion when any shortcuts are being applied.

In GitLab by @DillonJ on Apr 15, 2020, 13:12

Me neither. And by the way, an aspect of what I'm proposing, or at least an option, is using preprocess_data_structure.jl to create some of the redundant relationships that are inferred from the base ones (however you define them), based on parameter values (could be method parameters).

In GitLab by @mihlema on Apr 15, 2020, 13:54

connection__from_node(forward) connection__to_node(forward) connection__from_node(reverse) connection__to_node(reverse) connection__node__node(forward) connection__node__node(reverse) connection_capacity(forward) connection_capacity(reverse) fix_ratio_out_in(forward) fix_ratio_out_in(reverse) I don't think you need to define the capacity for your case twice, as you already have the fix_.. taking care of that but that's just a side note.

Regarding inconsistent relationships: Some sanity check would be nice, or as proposed an automated creation of the reverse flow, if exactly the parameters for both directions apply. A simple connection__node__node does at first glance seem nice. I think it could be done with pre-processing.

It seems to go back to an early issue #4 . I do agree that it is cumbersome at the moment to create these, but we do have a sort of stable version of SpineModel at the moment which is (more or less) working. Thus, I'd rather not open a discussion again on how to define connections at this stage. Using methods more consistently is a good way forward. I think we need to have a wiki or the like to align on naming etc.. Even though I'm very happy about the pre-processing, I am a bit worried that we are overloading it at this stage of the model development and we end up with quite a few patches.

In GitLab by @DillonJ on Apr 15, 2020, 14:41

@mihlema We could do it in a backward compatible way. If we introduce a new method parameter (connection_type for example). Then connection_type=connection_type_simple_bidirectional (or whatever) that would trigger some code in preprocess_data_structure.jl that would add the additional expected relationships.

So from the model point of view, it wouldn't be seeing anything different and databases with connections using the explicit relationships would continue to work

In GitLab by @DillonJ on Apr 15, 2020, 14:44

changed title from Data structure for connections is in{- -}efficient for simple cases to Data structure for connections is inefficient for simple cases

In GitLab by @jkiviluo on Apr 16, 2020, 08:39

A more simple graph view would be this:

I guess that's my part of wishful thinking, since I feel that the 'connection' entity can be expressed by the 'node__node' entity and therefore redundant.

That of course wouldn't address the problem of multiple variables required for implementing losses in a two-way connection where only one direction can be active at a time - which I believe is typical for energy transfers. Using method would be the best way in my view. You have only one connection object where the method triggers necessary variables (or maybe it's the connection__node__node where the method resides).

And if you can do it in backward manner, then that's probably good - however, we should also carrying lot of baggage from the past - it can become heavy to maintain. In that sense, I would probably prefer a migration script to update old data into new format.

In GitLab by @Tasqu on Apr 16, 2020, 08:50

Well, sometimes one might have need to define multiple connections between the same two nodes with different parameters. If I remember correctly, that was an issue with Backbone at one point.

In GitLab by @jkiviluo on Apr 16, 2020, 08:51

Yes, but in Spine Toolbox you can have multiple node__node entities with different names.

In GitLab by @DillonJ on Apr 16, 2020, 09:40

Actually, we assume for a single relationship class, you can't define the same relationship twice, even with different names - uniqueness of the members is assumed.

In GitLab by @manuelma on Apr 16, 2020, 10:33

@DillonJ is right, we can't have two node__node relationships between the same two nodes at the moment. If we release this restriction we need to rethink at least the tree view and the parameter convenience function syntax.

In the tabular view, relationships are fully identified by their member objects, as we never present the relationship name. The same goes for the relationship parameter value pane.
In SpineInterface.jl we generate parameter convenience functions that are called by specifying classes and objects. For instance, say we have a parameter called connection_capacity attached to node__node; we'd write connection__capacity(node1=..., node2=...). So if we had two node__node relationships among the same nodes, how would the convenience function know which one the caller is referring to?

In GitLab by @jkiviluo on Apr 16, 2020, 11:15

Right, of course. I suppose there is no appetite to fix those issues right now. The downside is that we need more entities when we don't allow multiple relationships between two entities. Of course it wouldn't be trivial to handle the convenience function syntax issue. Although, I wouldn't be 100% sure it needs to be resolved. Is there any case where you would need just one but not the other? In connection specific constraints both connections would get their own set of constraints and in a node balance flows from both connections would be summed anyway. Spine Toolbox could of course block one connection when needed. E.g. two scenarios: one with AC line and another with DC line. If these were parallel lines (and thus not alternatives), Spine Model could still distinguish between them since they would have different methods.

In GitLab by @nnhjy on Apr 16, 2020, 15:17

Juha's wishful thinking implies that the connection we want is exactly a relationship between two nodes rather than an object. But since now it seems impractical to loosen the uniqueness of relationship, we should probably stay with the connection being an object. In that sense, Jody's proposal of preprocessing with some method parameter sounds a nice starting point.

Another thing I'm not clear about the present connection structure is whether we assume that one connection links two nodes only. A connection object can technically link to multiple nodes. What would SpineModel do in that case? A connection linking to multiple nodes may be useful in modelling gas/heat networks.

If a connection is exclusively for two nodes, then we might simply use connection__node__node with a directional parameter (consisting of values bidirectional, node1_to_node2 and node2_to_node1) to contain all needed parameters. Directions are linked to the directional parameter value in the pre-processing phase.

In GitLab by @DillonJ on Apr 20, 2020, 12:22

Ok, so let's see... basically we have three ways (and variants thereof) to do this:

connection_node_node + reciprocal relationship for bi-directional flow
- pros
  - all data specified in two relationships (in most cases)
  - connections are distinct objects, with visibility at the same level as other physical objects like units and nodes etc. (consistency)
  - direction sensitive parameters can be defined on the relevant directional relationship
- cons
  - requiring a reciprocal connection introduces the risk of inconsistent relationships
  - places a data burden on a user for a very common case
  - three dimensional relationship that can be expressed using two 2-dimensional relationships
  - no explicit handle on to_node and from node
connection_node_node + parameter for bi-directional flow
- pros
  - connections are distinct objects, with visibility at the same level as other physical objects like units and nodes etc. (consistency)
  - all data specified in a single relationship
  - no risk of inconsistent data
- cons
  - three dimensional relationship that can be expressed using two 2-dimensional relationships (or arguably, this is a pro?)
  - direction sensitive parameters need to be named differently
  - no explicit handle on to_node and from node

connection__from_node + connection__to_node
- pros
  - connections are distinct objects, with visibility at the same level as other physical objects like units and nodes etc. (consistency)
  - all parameters can be defined on a per direction basis
  - no risk of inconsistent data
- cons
  - need at least two relationships
- node__node + reciprocal relationship for bi-directional flow
- Pros
  - more data efficient, fewer data objects to create
  - two-dimensional relationship = reduced dimensionality and explicit handle on to_node and from node
- Cons
  - no entity at the object class level corresponding to the physical connection which is an inconsistency with units and nodes (reduced visibility)
  - need to define a reciprocal relationship in most cases = risk of inconsistent data and multiple objects for the same

My preferences is connection__from_node + connection__to_node mainly because :

Explicit object class giving a connection visibility on the same level as units and nodes
ability to create direction parameters without additional relationships
no requirement to specify the same data multiple times (no risk of inconsistent data)
solution with lowest possible dimensionality

In GitLab by @jkiviluo on Apr 22, 2020, 11:30

It seems there is going be a compromise one way or another.

I guess if using connection__from_node, then the connection object would hold any parameters that are not directional and connection__from_node and connection__to_node would hold the directional data. This might not always be clear to the user. However, if our GUI is intuitive in the way it shows such data, then most of the potential confusion should be possible to avoid. But this might require additional information on the class level - when user chooses a bunch of connections, the UI should show data for the connections and for relevant child objects in intuitive order - also in the tabular view. Connection can also be part of a group or it can be an alternative to other definition of the same connection (e.g. alternative investment choices for a transmission line). The UI should be able to handle these cases without mixing things up for the user.

As a result, I think it might be easier if something like a transmission line would have one entity that holds all the data. For example, a transmission line has 1000 MW in one direction and 1200 MW in another direction (NTC, technical issues). But it can also be expressed with physical data (impedance, etc.) and in this latter case the model would optimize how much flow it allows in each direction. If there is just one entity, then all the data would be automatically shown when that entity is chosen in any of the views. Of course this means that there needs to be more parameters with names like capacity_left_to_right or capacity_bidirectional. But many parameters are never direction dependent anyway.

Another matter is what the Spine Model sees. In there, I would think it's best if there is only one parameter that defines NTC capacity in each direction (often the same number). This would keep the model code more simple, but then data needs to be translated from the specific format to the model format. I suppose that's the way it currently is, using direction dimension to establish a constraint for both directions.

In GitLab by @jkiviluo on Apr 22, 2020, 11:30

I find from_node and to_node confusing. My brain refuses to understand that there could be flows going into the from_node. I would rather see something like node_left and node_right.

In GitLab by @jkiviluo on Apr 22, 2020, 11:32

Even if we would have node__node relationship as the connection entity, the class could still be named connection. Our GUI still holds a very clear distinction between objects and relationships, but I feel it should be more versatile for the user. You should be able to mix them when that makes sense and see them separate when that makes sense.

Actually, I feel that the bigger distinction we might need is between data holding entitites (like unit, node, connection) and relationships that establish model relevant functionality (e.g. saying that these units belong to this unitgroup).

In GitLab by @DillonJ on Apr 22, 2020, 13:20

I strongly prefer the consistency around having energy system entities at object level and being able to see an object structure that represents an energy system.

Think of investments for example. Units and connections are physical entities that you would invest in and which would have an investment variable associated with them.

Relationships for me represent information about the structure of the energy system (as opposed to it's topology).

I strongly feel we should see connections at object level.

My strong preference is connection__to/from_node as I feel it is the best from a utility point of view. If I had to choose between the connection_node_node model or the node_nodemodel, I would chose connection_node_node. If you liked, you could call the connection__node__node relationship class, node__node :-)

In GitLab by @mihlema on Apr 22, 2020, 14:15

Thanks for the overview @Jody. I have some clarification questions:

connection__from_node + connection__to_node

pros

connections are distinct objects, with visibility at the same level as other physical objects like units and nodes etc. (consistency)

all parameters can be defined on a per direction basis

no risk of inconsistent data

cons

need at least two relationships This would mean that 1 connection__from_node + 1 connection__to_node would be used for one but also bi-directional transmissions? I agree with @Juha comment that the connection__to_node with an associated parameter directional_loss (or which ever name) might give the impression of a onedirectional flow. – but maybe that is a question of naming. In the case above, a connection can only transfer between 2 nodes as I understand? I think we would need a connection__node__node relationship, too, to allow multiple commodities travel across the same transmission line. Consider for instance an Electricity line for which both Electricity and Reserves are allowed. I think Current versus Alternative No.3:

		Current	Alternative No. 3
objects:	connection	Conn	Conn_El Conn_FRR Connection_Group
	nodes	BE_El BE_FRR FR_El FR_FRR Node_Group_BE Node_Group_FR	BE_El BE_FRR FR_El FR_FRR Node_Group_BE Node_Group_FR
relationships:	connection__from_node	ConnBE_El Conn__ BE_FRR Conn FR_El Conn__ FR_FRR	Conn__BE_El ConnBE_FRR Connection_group Node_Group_BE
	connection__to_node	ConnBE_El Conn__ BE_FRR Conn FR_El Conn__ FR_FRR Conn__ Node_Group_BE	Conn__FR_El Conn FR_FRRE Connection_group Node_Group_FR
	connectionnodenode	ConnBE_El__FR_El Conn BE_FRRFR_FRR Conn__ FR_ElBE_El Conn__FR_FRR__BE_FRR
	node__group_node	Node_Group_BEBE_El Node_Group_BE__BE_FRR Node_Group_FRFR_El Node_Group_FR__FR_FRR	Node_Group_BEBE_El Node_Group_BE__BE_FRR Node_Group_FRFR_El Node_Group_FR__FR_FRR
	connection_group__connection		Connection_groupConn_FRR Connection_groupConn_El

While original suffers from huge amount of connection__to/from, we can use the same connection for multiple flows - I guess the question is where we want to aggragate the flows. Personally I prefer having one connection being connected to multiple nodes.

From the above options, my preference goes for connection__node__node as it is more readable for the user than the current implementation and at the same time only have one loss parameter / fix_ratio (not left_node_loss & right_node_loss). This also allows us to have one connection connecting multiple nodes easily.

In GitLab by @jkiviluo on Apr 23, 2020, 10:05

I strongly object to a world where entities are discriminated on basis of color or on basis of dimensionality. It should be about function - not if something has one dimension or two dimensions. Naturally, there needs to be a way to see what one needs to see at a given time. For this, we could have a tree where one selects to display 'physical entities' or to display 'model structure' - or to display 'scenario structure' or whatever users and developers will find useful in the future.

However, I did some experimenting with a data store and while a physical node__node connection could work fine, it would be more difficult with units, because the dimensionality can change from unit to unit (multiple inputs/outputs). And it would not be consistent to treat units differently than connections. Hence, it might be best to have a user-facing one-dimensional object whenever the physical entity has data spread across multiple relationships.

The GUI will need improvements though, but the good thing is that they don't need to be there right away. When I select couple of multi-dimensional entities, I'd like to see parameters in good order within one view. This has been discussed before in relation to tabular view (capability to show data in the same table even when some data is for e.g. unit__output_node and some data for input_node__unit, but it should also work in the list view.

In GitLab by @DillonJ on Apr 23, 2020, 10:26

This also allows us to have one connection connecting multiple nodes easily

A question here is if we want to allow this... if you have more than two nodes related to a connection, what does it mean? Now you need some sort of balance constraint as the fix ratio constraints only relate two nodes at a time. If you need balance, then it is more like a node and node should be used instead in this case.

I personally think we need node__node to allow what you are effectively trying to do with a connection connected to more than two nodes.

If we had node__node (and associated flow variable), we can make logical connections between nodes where it is an unconstrained logical connection. This gives possibilities to make your system simpler. We already have node__node to handle diffusion but there is no explicit variable... so that is all that is really needed, and it would be nice to see the flow between nodes explicitly, even in the case of diffusion.

In summary:

node__node = unconstrained logical connection between arbitrary numbers of nodes
connection = physically constrained flow between two nodes with losses

You can use the combination of these two things to handle complex situations where you effectively want a physically constrained connection connecting more than two nodes.

In GitLab by @DillonJ on Apr 23, 2020, 10:38

Also... @mihlema

the question is where we want to aggragate the flows

The philosophy behind the "model adjustments" was to rationalise the structure to be a little more coherent based on the idea that the node is the universal aggregator. I think in terms of a commodity networks this makes a lot of sense and we should stick with this idea - that balance always happens in nodes. We could think of our system like this

balance happens in nodes
constrained flows between nodes happen in connections
aggregations happen in node__node connections
conversions happen in units

And this, for me, is why it is nice to have these main system elements appear at the object class level

Closing due to stale status.

Closing because stale.

spine-tools / SpineOpt.jl

Data structure for connections is inefficient for simple cases #161