Closed spine-o-bot closed 3 years ago
In GitLab by @DillonJ on Aug 14, 2018, 10:00
changed the description
In GitLab by @DillonJ on Aug 14, 2018, 10:01
changed the description
In GitLab by @manuelma on Aug 14, 2018, 10:04
If I understand correctly this would mean that no relationship is involved in another relationship, ie, relationships are only between objects?
EDIT: same for parameters: all parameters are defined against objects, never against relationships?
In GitLab by @DillonJ on Aug 14, 2018, 11:51
So yes, no relationships between relationships but you can create parameters against relationships. This is the idea.
In GitLab by @Poncelet on Aug 14, 2018, 13:11
Also a clarification question from me: let's say we have a parameter with three indices (as in your example), would that parameter need to be specified on the upi object?
If so, I think there is a significant risk that data will be scattered across a lot of objects from these new object classes.
In GitLab by @DillonJ on Aug 14, 2018, 14:39
If the data parameter relates to the three objects, then yes, otherwise it would be associated with the objects themselves. So actually, there is only one logical place for the data parameters in this example, so I don't think scattering would be an issue.
In GitLab by @manuelma on Aug 14, 2018, 15:15
Some thoughts:
I believe the issue is more related to how the user enters, visualizes, and accesses the data, rather than how we developers keep it in the database.
The non-nested approach helps with the issue of visualizing the data, but doesn't solve the issue of entering it: as with the nested approach, the user still needs to create a lot of relationship classes, parameters and so on.
Maybe we just need to focus on developing a simple interface for entering the data that makes things as easy as possible for the user?
In GitLab by @DillonJ on Aug 14, 2018, 16:50
I get what you are saying, but let's not forget that we are building a tool where easy development is a key requirement. The vision is that we assume the problem will change, so we make it easy to perform the development activities that are required to modify and develop new optimization models because we assume we don't know the problem that future users will want to solve.
With this in mind, I think it's very important that we have a simple, easy to understand and easy to follow/diagnose/debug data structure where we don't have too many levels of nesting.
Edit: at the end of the day, the data has to be entered somewhere, and so far (correct me if I'm wrong) the only area where there is a need for these three-way relationships is in the implementation of archetypes, which would be a once-off development type of activity.
In GitLab by @DillonJ on Aug 14, 2018, 16:55
Also, do we have a real example from the 2nd test system? Do these 3-way relationships arise there? Perhaps we can use it to evaluate the different approaches. It would be great to get a Julia code snippet that illustrates the issue.
In GitLab by @pvper on Aug 15, 2018, 05:55
As I said in #25, this might be hard to use in the current data structure.
We are going to need a lot of relationship classes for the upi class (e.g. unit_upi, commodity_upi) and then for all the input classes that would need a relationship class. We would definitely have a problem showing this class in the current tree.
How do we keep track of the input order of the indices? Is that hard-coded into Julia for each parameter?
Let's say that we have the case with the commodity input-output ratio. We have two units, a gas->electricity unit and an electricity->gas unit (e.g. hydrogen production), and they both want the same parameter RatioOutputInput. In this case we would need two upi objects for the same parameter, since one would have gas as the input commodity relationship and the other electricity. How would we keep track of which upi objects correspond to the same parameter?
Let's say that we want a parameter with the three indices in your example. How would multiple units have the same parameter? Multiple relationships in unit_upi? And if they want different values for the same parameter? A relationship parameter?
How would we query for parameters? Since we might now have multiple objects pointing to the same parameter, we would need to store that somewhere.
In GitLab by @jkiviluo on Aug 15, 2018, 06:49
We will need to establish the order somehow and it will require an additional table or possibly some value column that would set the order. I guess each relationship to upi could also contain information about the order. But all these feel cumbersome.
Another way to do multiple dimensions is to add object_id columns in the relationship table. Drawback is that it will be limited to n columns and those columns take space (hopefully not much when they are empty). Per has another alternative in mind as well that we can consider.
In GitLab by @manuelma on Aug 15, 2018, 07:28
About the input order of the indices, I think the multidimensional parameter will only be indexed by upi objects, so this should work in Julia:
for x in upi()  # upi() returns the set of 'upi' objects
    y = ratio_input_output_flow(x)
    # use y
end
In GitLab by @manuelma on Aug 15, 2018, 07:50
Let's say that we have the case with the commodity input-output ratio. We have two units, a gas->electricity unit and an electricity->gas unit (e.g. hydrogen production), and they both want the same parameter RatioOutputInput. In this case we would need two upi objects for the same parameter, since one would have gas as the input commodity relationship and the other electricity. How would we keep track of which upi objects correspond to the same parameter?
If I get it right, in this example ratio_output_input doesn't depend on which commodity is the input and which is the output?
Well, in that case we could have the ratio_output_input parameter defined against a relationship called electricity_gas (of a class called, e.g., inputcommodity_outputcommodity). And then if, say, unitA wants to access that parameter, we can define another relationship called unitA_electricity_gas (of a class called, say, unit_inputcommodity_outputcommodity). Then in Julia:
u = "unitA"
ioc = input_commodity_outputcommodity(u) # should return "electricity_gas"
y = ratio_output_input(ioc) # should return the value stored in the parameter
I don't think this is a use case for 'upi'-type objects since the parameter only has two dimensions, so it can be defined against a relationship.
In GitLab by @pvper on Aug 15, 2018, 07:54
Not if the unit has multiple input or output commodities, e.g. the gas -> electricity unit can also output heat, but we want gas -> electricity to be the limiting relationship.
In GitLab by @manuelma on Aug 15, 2018, 08:04
Ah ok, then I got it wrong. But is it a use case for 'upi'-like objects? In my view 'upi' will be used only when a three-way relationship is needed, and in that case the relationship is 'melted' into an object. Viewed this way, I don't think there's any inherent difference between the upi object class and the unit_process_commodity relationship class, except the way it's represented in the database (the upi is higher in the hierarchy and so it's accessed more easily).
In GitLab by @pvper on Aug 15, 2018, 10:27
Yes, I don't think that the Julia implementation depends on the database structure; we can always convert the data to the way we want it.
And the nested representation can go into the upi object representation, but the hierarchical dimension is lost, i.e. the order of the relationship. So if the order is important, we are losing that in this formulation using the current data structure.
Also, I'm not sure it is accessed more easily.
We would have to create a new upi object for each combination of inputs, plus a bunch of relationship classes. The upi class would have a bunch of parameters, but we don't know for which upi objects they are needed/allowed, so it would be harder to make a GUI where the user only sees the needed parameters (probably a minor issue).
There are also some integrity issues; these could probably be fixed with some additional constraints/triggers on the database.
What happens if you delete a unit? The upi object will still be there with a parameter but no unit pointing to it. The same happens if you delete a process or commodity in the example: you would be left with an incorrect upi object.
I guess the two main concerns for me are the loss of order and that you need to create so many objects and relationship classes to be able to insert a parameter.
In GitLab by @manuelma on Aug 15, 2018, 10:36
What happens if you delete a unit? The upi object will still be there with a parameter but no unit pointing to it. The same happens if you delete a process or commodity in the example: you would be left with an incorrect upi object.
That's an interesting argument in favor of the nested relationship approach. What do you think @DillonJ ?
Now, one thing that bothers me in the nested-relationship approach is that for relating three classes together, you need to relate two of them first. But maybe you don't want to relate any two of them, so you end up with a relationship with no actual meaning.
In GitLab by @DillonJ on Aug 15, 2018, 11:41
I think the long and short of it is that neither approach is ideal.
Here's another out of the box suggestion where we more radically change how relationships are defined...
We currently have a horizontal structure for defining relationship classes, i.e.
id, parent_class, child_class
We could do it differently - we could either expand the definition and support up to, say 3 levels (or more if we want)... e.g.
id, class1, class2, class3
Or we could even have a vertical structure that could support arbitrary relationships like:
id, class_id
and for a 3-way relationship you would create three rows of data for the same relationship id.
Both of these approaches eliminate the need for nesting and can support more relationship levels more easily I think.
Thoughts?
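To make the two layouts concrete, here is a minimal sketch in Python using the standard-library sqlite3 module. The table names rel_class_horizontal and rel_class_vertical and the numeric class ids are made up for illustration; only the column lists (id, class1, class2, class3 and id, class_id) come from the suggestion above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Horizontal: one row per relationship class, fixed number of class columns.
    CREATE TABLE rel_class_horizontal (
        id INTEGER PRIMARY KEY,
        class1 INTEGER NOT NULL,
        class2 INTEGER NOT NULL,
        class3 INTEGER            -- stays NULL for a 2-way relationship
    );
    -- Vertical: one row per (relationship class, member class) pair.
    CREATE TABLE rel_class_vertical (
        id INTEGER NOT NULL,
        class_id INTEGER NOT NULL
    );
""")

# Hypothetical object class ids: 1 = unit, 2 = process, 3 = commodity.
conn.execute("INSERT INTO rel_class_horizontal VALUES (1, 1, 2, 3)")
conn.executemany(
    "INSERT INTO rel_class_vertical VALUES (?, ?)",
    [(1, 1), (1, 2), (1, 3)],
)

# The same 3-way relationship class takes one row horizontally
# and three rows (sharing one id) vertically.
horizontal_rows = conn.execute(
    "SELECT COUNT(*) FROM rel_class_horizontal WHERE id = 1"
).fetchone()[0]
vertical_rows = conn.execute(
    "SELECT COUNT(*) FROM rel_class_vertical WHERE id = 1"
).fetchone()[0]
print(horizontal_rows, vertical_rows)
```

In this sketch a fourth dimension would mean a schema change for the horizontal table but only one more row for the vertical one.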
In GitLab by @pvper on Aug 15, 2018, 11:57
Now, one thing that bothers me in the nested-relationship approach is that for relating three classes together, you need to relate two of them first. But maybe you don't want to relate any two of them, so you end up with a relationship with no actual meaning.
@manuelma Yes I don't like that you will get unused relationships, that will for sure be a source of confusion.
@DillonJ I was thinking of the same way to have a vertical structure for parameter values, I will post a summary of that. I was only thinking of parameters but that could also work for relationships.
In GitLab by @pvper on Aug 15, 2018, 12:27
Create new tables for multidimensional parameters or modify current parameter tables
Here the idea is to add two tables, parameter_input_group and parameter_input_value. The parameter_input_group table would hold all allowed inputs for a certain parameter, so for example if we want parameter(unit,commodity,commodity) then parameter_input_group would hold a reference to the parameter and two rows for the allowed classes and their order.
The parameter_input_value table would hold a reference to the actual value and the objects that are used for the input.
Something like this:
This way we could have as many dimensions as we want on a parameter and we would not fill the treeview with relationships. Also, all parameters for an object will be in direct reference to that object.
It will also be easy to get parameter values from the database since there is no nesting.
You could add multiple parameters to the same input group so that it's easy to create multidimensional "sets" with many parameters.
Since we are storing the allowed classes and their order, it would be easy to make this type of interface:
Cons would be that we are making it more complex.
I guess it's going to be a bit harder to see which objects are related to each other.
There would be some integrity issues like what happens if you delete one of the inputs of the input group, then you would have to delete either the whole group or update the index for the remaining inputs.
When deleting a value or parameter, foreign keys should be able to cascade delete.
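The two tables could look roughly like the sketch below (Python, standard-library sqlite3). Only the table names parameter_input_group and parameter_input_value come from the description above; every column name, the ids, and the choice to store the class and object names as text are assumptions made for illustration. The description does not say whether the owning class gets its own row, so this sketch simply stores all three inputs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Allowed input classes for a parameter, with their order (columns assumed).
    CREATE TABLE parameter_input_group (
        parameter_id INTEGER NOT NULL,
        position INTEGER NOT NULL,
        object_class TEXT NOT NULL,
        PRIMARY KEY (parameter_id, position)
    );
    -- Objects used as inputs for one stored value (columns assumed).
    CREATE TABLE parameter_input_value (
        value_id INTEGER NOT NULL,
        position INTEGER NOT NULL,
        object_name TEXT NOT NULL,
        PRIMARY KEY (value_id, position)
    );
""")

# Hypothetical parameter 1 has the signature (unit, commodity, commodity).
conn.executemany(
    "INSERT INTO parameter_input_group VALUES (1, ?, ?)",
    [(1, "unit"), (2, "commodity"), (3, "commodity")],
)
# Hypothetical value 10 of that parameter is indexed by (CCGT, gas, electricity).
conn.executemany(
    "INSERT INTO parameter_input_value VALUES (10, ?, ?)",
    [(1, "CCGT"), (2, "gas"), (3, "electricity")],
)

# Both the allowed classes and the actual inputs come back in order
# from a flat query, with no nesting to walk.
signature = [row[0] for row in conn.execute(
    "SELECT object_class FROM parameter_input_group "
    "WHERE parameter_id = 1 ORDER BY position"
)]
inputs = [row[0] for row in conn.execute(
    "SELECT object_name FROM parameter_input_value "
    "WHERE value_id = 10 ORDER BY position"
)]
print(signature, inputs)
```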
In GitLab by @DillonJ on Aug 15, 2018, 12:33
What problem is this trying to solve? The current parameter_value table can handle multidimensional parameters no differently to object parameters by using the relationship ID which gives us a handle on all the related entities from the relationship table
edit: that is if we were to use one of my proposed alternative approaches for defining multi-dimensional relationships
In GitLab by @pvper on Aug 15, 2018, 12:51
Yeah, I agree that this could be applied if we were to use the same vertical relationships; I saw your idea for changing the relationships after I made all the pictures.
Anyway, I guess one thing this would solve is that we wouldn't have any relationships without meaning:
In the example we have a unit with relationships to input and output commodities. If we want to add the parameter p(unit,commodity,commodity), we would have to create another relationship between commodity and commodity that is not really there and would fill up the treeview. (This might not be a bad thing.)
Also, this would give you a direct relationship between an object and its multidimensional parameters without having to search the nested tree or the right row in the vertical relationship layout.
I realize that I've been thinking more about multidimensional parameters than relationships, and I agree that this change is unnecessary if we keep the current nested relationship model or adopt the vertical or the more-columns one.
In GitLab by @manuelma on Aug 15, 2018, 14:17
display in the tree view - it will require many clicks to view the underlying data
obfuscation - they bury data layers deep
Julia code will be complex
These are the original problems we're trying to solve, but looking at the suggested alternatives, I feel they rather focus on how we store the data in the database.
For example, if we took the horizontal approach suggested by @DillonJ, would that automatically have an impact on the treeview? Or improve the Julia code? How? Maybe it's not the nested relationships per se that hide the data layers deep, but the treeview design, which we can always rework and improve quite independently of the database design.
Also note that the vertical approach is very similar to the nested relationships, in that we need one extra row for each additional dimension we want to have. But with nested relationships we have more integrity, for example if one of the object classes is deleted, it automatically deletes all the relationships involving it.
I guess my personal favorite is still the nested relationships. We don't put a limit on the number of dimensions, and we ensure database integrity just with foreign keys.
In GitLab by @manuelma on Aug 15, 2018, 14:39
What about this: in the current treeview when we click on an object, say gas of the class commodity, the relationship parameter value table shows all the parameters where gas is either the parent or the child. But we could make it so it actually lists all the parameters where gas is involved through any number of relationships. In this way the user would see the data of a multidimensional parameter with just two clicks.
In GitLab by @DillonJ on Aug 15, 2018, 15:15
@manuelma The problems listed above follow from the underlying data structure. My horizontal solution above solves the first two and maybe also the third, because you would click on an object and find all the defined relationships - one of them being a 3-way relationship - once this single 3-way relationship is clicked, we see data associated with it...
In other words, for the 3-way relationship we have a single relationship entity that we can work with that removes the need for nesting, additional clicks and unburies the associated parameters
In GitLab by @manuelma on Aug 15, 2018, 15:28
I don't see it @DillonJ, for instance keeping the nested relationships as they are now, we can also make the treeview show all the relationships involving an object at the same level, it would just be a different query. Nesting in the database doesn't mean we necessarily need to nest in the treeview, does it? But nesting in the database has advantages from the point of view of data integrity.
In GitLab by @DillonJ on Aug 15, 2018, 15:30
Don't forget that the treeview will not be the primary method for putting models together from scratch - we do have to consider the complexity of the underlying data as this will impact many things, not just the treeview.
Edit: The view better correlates with the underlying data which in my opinion is very advantageous
In GitLab by @manuelma on Aug 15, 2018, 16:02
Ok I'm onboard.
So I guess the vertical approach you suggest @DillonJ is strictly better than the nested-relationships:
Now I'm not sure about the horizontal approach, is it really viable? So anybody who comes up with a parameter with more than n dimensions just can't use Spine?
I believe vertical is more in line with EAV. Also correct me if I'm wrong, but @pvper's suggestion is essentially the same as the vertical approach?
In GitLab by @pvper on Aug 16, 2018, 07:17
Yes my suggestion is the vertical approach, but applied to parameters instead of relationships
I guess the reason why I thought about parameters is that I was trying to insert another model we have at VTT into the Spine format. The model has a lot of multidimensional parameters and I ended up with something like this:
If going with a flat vertical hierarchy, I'm in favor of doing it on the relationships as @DillonJ suggested.
But at the same time I don't think it's a simpler design, we have to do more work than with the nested model in terms of keeping the integrity.
In any case all these representations are essentially the same data but stored differently, so all of them can be displayed the same way. If we want a flat treeview we could display it something like this:
In GitLab by @manuelma on Aug 16, 2018, 07:25
I partly agree with you @pvper in that nested relationships are great for keeping the database integrity. Intermediate relationships with no meaning could have the 'hidden' attribute set so they are not displayed.
But I also agree with @DillonJ in that the database itself would be simpler to 'read' by a human if we took the vertical approach he suggests. It requires more work designing triggers that would keep the integrity though. Anyways I'm giving it a try right now to try and see the implications for the implementation.
In the end, I think we all agree in that the treeview has to look exactly as you @pvper just outlined, ie:
CCGT
In GitLab by @jkiviluo on Aug 16, 2018, 09:47
I suppose the vertical approach would require a column that states the order number for the dimension in the relationship_class. This is not needed in the horizontal approach, but I too feel the unease at making a strict limit on the maximum number of dimensions - although this could be a database specific property (Toolbox could read databases with different n object columns).
In GitLab by @jkiviluo on Aug 16, 2018, 09:51
In the end, what matters is
In GitLab by @manuelma on Aug 16, 2018, 09:51
Below is my working version of the relationship_class table. Field dimension holds the order and together with the id forms the primary key. I'm making progress, it's not so bad.
CREATE TABLE IF NOT EXISTS relationship_class (
    id INTEGER NOT NULL,
    dimension INTEGER NOT NULL,
    object_class_id INTEGER NOT NULL,
    name VARCHAR(155) NOT NULL,
    hidden INTEGER DEFAULT 0,
    commit_id INTEGER,
    PRIMARY KEY (id, dimension),
    FOREIGN KEY(commit_id) REFERENCES "commit" (id),
    FOREIGN KEY(object_class_id) REFERENCES object_class (id) ON DELETE CASCADE ON UPDATE CASCADE
);
The other idea is also interesting, allowing databases with a variable number of fields... I wonder what the queries would look like...
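As a rough illustration of how a query against the vertical relationship_class table above might look, here is a sketch in Python with the standard-library sqlite3 module. It uses a trimmed copy of the table (the hidden and commit_id columns are left out), a minimal object_class table whose columns are assumed, and made-up rows; the helper class_signature is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Minimal stand-in for object_class (columns assumed).
    CREATE TABLE object_class (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- Trimmed copy of the vertical relationship_class table.
    CREATE TABLE relationship_class (
        id INTEGER NOT NULL,
        dimension INTEGER NOT NULL,
        object_class_id INTEGER NOT NULL REFERENCES object_class (id),
        name VARCHAR(155) NOT NULL,
        PRIMARY KEY (id, dimension)
    );
    INSERT INTO object_class VALUES (1, 'unit'), (2, 'commodity');
    -- One relationship class, three rows: one per dimension.
    INSERT INTO relationship_class VALUES
        (1, 0, 1, 'unit_inputcommodity_outputcommodity'),
        (1, 1, 2, 'unit_inputcommodity_outputcommodity'),
        (1, 2, 2, 'unit_inputcommodity_outputcommodity');
""")


def class_signature(conn, relationship_class_id):
    """Return the object class names of a relationship class, ordered by dimension."""
    rows = conn.execute(
        """
        SELECT oc.name
        FROM relationship_class AS rc
        JOIN object_class AS oc ON oc.id = rc.object_class_id
        WHERE rc.id = ?
        ORDER BY rc.dimension
        """,
        (relationship_class_id,),
    )
    return [name for (name,) in rows]


signature = class_signature(conn, 1)
print(signature)
```

The dimension column is what keeps the two commodity entries apart and in order; without it the (id, object_class_id) pairs for dimensions 1 and 2 would be indistinguishable.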
In GitLab by @DillonJ on Aug 16, 2018, 12:22
I like the vertical version in that it is flexible, but the horizontal version is neat, with the full relationship definition in a single row; relationship instances would also take a single row, and data integrity would be ensured.
If we decided to support, say, 5 dimensions, would that be a real limitation in practice?
In GitLab by @pvper on Aug 16, 2018, 12:36
One problem with both the vertical and the horizontal approach arises if you want different parameters on multiple levels.
E.g. if you have a relationship class with 3 levels, unit,commodity,commodity (u_c_c), and you want parameter1 on the level u_c and parameter2 on the level u_c_c, there is no way of telling which level the parameter belongs to. You could of course add a new column to the parameter table referencing the order/level/dimension, but then you have another column to keep track of, since the order won't be unique, so I don't think you can easily use a foreign key there. Not sure though, I might be wrong.
In GitLab by @manuelma on Aug 16, 2018, 12:36
I wonder how many of those columns will be empty for most reasonable models. The thing is that if for some reason we require a single parameter to have more than 5 dimensions, the design is ruined.
With the vertical approach we can create triggers such as the one below to ensure, in this case, that all rows of a relationship class are gone after any of its object classes is deleted.
CREATE TRIGGER after_object_class_delete
AFTER DELETE ON object_class
FOR EACH ROW
BEGIN
    DELETE FROM relationship_class
    WHERE id IN (
        SELECT id FROM relationship_class
        WHERE object_class_id = OLD.id
    );
END;
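A small check of what the trigger does, sketched in Python with the standard-library sqlite3 module. The tables are trimmed stand-ins with assumed columns and made-up rows, and no foreign keys are declared here, so the trigger alone does the cleanup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_class (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE relationship_class (
        id INTEGER NOT NULL,
        dimension INTEGER NOT NULL,
        object_class_id INTEGER NOT NULL,
        name TEXT NOT NULL,
        PRIMARY KEY (id, dimension)
    );
    -- Same trigger as above: drop every row of any relationship class
    -- that involves the deleted object class.
    CREATE TRIGGER after_object_class_delete
    AFTER DELETE ON object_class
    FOR EACH ROW
    BEGIN
        DELETE FROM relationship_class
        WHERE id IN (
            SELECT id FROM relationship_class
            WHERE object_class_id = OLD.id
        );
    END;
    INSERT INTO object_class VALUES (1, 'unit'), (2, 'process'), (3, 'commodity');
    INSERT INTO relationship_class VALUES
        (1, 0, 1, 'unit_process_commodity'),
        (1, 1, 2, 'unit_process_commodity'),
        (1, 2, 3, 'unit_process_commodity'),
        (2, 0, 1, 'unit_commodity'),
        (2, 1, 3, 'unit_commodity');
""")

# Deleting 'process' should remove all three rows of unit_process_commodity,
# not just the row that points at 'process'.
conn.execute("DELETE FROM object_class WHERE name = 'process'")
remaining = conn.execute(
    "SELECT DISTINCT name FROM relationship_class ORDER BY name"
).fetchall()
remaining_rows = conn.execute("SELECT COUNT(*) FROM relationship_class").fetchone()[0]
print(remaining, remaining_rows)
```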
In GitLab by @manuelma on Aug 16, 2018, 12:38
@pvper what about defining two relationships, one with two levels and the other with three levels? Would that work?
In GitLab by @pvper on Aug 16, 2018, 12:45
what about defining two relationships, one with two levels and the other with three levels? Would that work?
Yes, but if those relationships are connected, so that u_c_c doesn't make sense without u_c, then you will be left with an incorrect relationship path and parameter if you remove u_c.
In GitLab by @DillonJ on Aug 16, 2018, 12:46
@manuelma I suppose the vertical implementation is the cleanest
In GitLab by @DillonJ on Aug 16, 2018, 12:55
but you get u_c from each u_c_c if you use the Julia iterable as suggested by @manuelma
Edit: actually, I see what you are saying - you might have some parameters associated with u_c and then some with u_c_c and you would need to be able to relate those two parameters to each other somehow
In GitLab by @manuelma on Aug 16, 2018, 13:00
So you actually need a relationship_class between a relationship_class and an object_class ?
In GitLab by @pvper on Aug 17, 2018, 05:25
No, that's not really what I meant.
Let's say we want the parameter RatioOutputInput for the relationship CCGT_Gas_Electricity. But we also want the parameter Capacity on the relationship CCGT_Gas. Now the relationship CCGT_Gas_Electricity and its parameters only make sense if we have the relationship CCGT_Gas, so we put them in the same relationship class. But now we have two parameters on the same relationship_class, one that requires the input (unit,commodity) and one that requires (unit,commodity,commodity), and we have no way of knowing which requires the whole path and which just the sub-path.
We could create two unrelated relationship classes, but if we want to change the relationship to CCGT_BioGas we now have to change both. Here it would be easy to forget to change one, and then we have an invalid model.
One way would be to introduce the dimension_id into the parameter_value table, but as I said, that is one more id to keep in check.
And I don't think creating a relationship between relationship classes is the answer, that would just add more complexity.
Let me know if I'm making any sense!
In GitLab by @manuelma on Aug 17, 2018, 06:47
Thanks @pvper for the clarification. So please let me see if I understand. You want to define capacity against the (3-D relationship) CCGT_gas_electricity, even though it only applies to two of the dimensions?
I guess we could allow relationships to have one of the dimensions (specified in the class) empty. In that case you'd be able to define a relationship called, say, CCGT_gas_nothing for the relationship class unit_inputcommodity_outputcommodity. Then you define capacity against CCGT_gas_nothing. Then in Julia, when you iterate over the relationship class unit_inputcommodity_outputcommodity, one of the returned tuples will be ("CCGT", "gas", nothing), and you would know that "CCGT" is the name of a unit and "gas" is the name of a commodity.
Would that solve the problem?
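A minimal sketch of the empty-dimension idea, written in Python rather than Julia (None plays the role of Julia's nothing). The data layout, the helper dimensions_used, and the two numeric values are placeholders invented for illustration, not data from any model.

```python
from typing import Dict, List, Optional, Tuple

Relationship = Tuple[str, str, Optional[str]]

# Hypothetical members of the class unit_inputcommodity_outputcommodity.
relationships: List[Relationship] = [
    ("CCGT", "gas", "electricity"),
    ("CCGT", "gas", None),  # "CCGT_gas_nothing": output commodity left empty
]

# Placeholder parameter values keyed by (parameter name, relationship).
values: Dict[Tuple[str, Relationship], float] = {
    ("ratio_output_input", ("CCGT", "gas", "electricity")): 0.55,
    ("capacity", ("CCGT", "gas", None)): 400.0,
}


def dimensions_used(rel: Relationship) -> int:
    """Count the dimensions of a relationship that are actually filled in."""
    return sum(member is not None for member in rel)


# Relationships that use the whole path versus only the (unit, commodity) sub-path.
full_path = [rel for rel in relationships if dimensions_used(rel) == 3]
sub_path = [rel for rel in relationships if dimensions_used(rel) == 2]

for rel in sub_path:
    unit, input_commodity, _ = rel
    print(unit, input_commodity, values[("capacity", rel)])
```

Because both tuples live in the same class, renaming or deleting the unit touches one place, while the empty slot still tells the code which parameters apply to the sub-path only.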
In GitLab by @manuelma on Aug 17, 2018, 07:15
Please check new issue Spine-project/Spine-Toolbox#129
I just pushed to 'toolbox/dev' a new version that works with the vertical approach and a 'flattened' version of the treeview. Old version that works with the nested approach is still intact in 'toolbox/nested_relationships'.
Please note that your databases created with 'nested_relationships' won't work in the most recent 'dev'. But you can try and create a new one to play.
In GitLab by @DillonJ on Aug 17, 2018, 09:59
A note in relation to multi level parameters using the vertical approach
We have this issue described by @pvper as "Displays relationship parameters for all levels" meaning that if I want to create a parameter value against the level 2 relationship, then it would appear alongside the level 3 parameters (if we associate them all with the single level 3 relationship).
I don't think it would be a good idea to use the functionality in this way - I think as @manuelma says, you would need to be explicit about the relationships and create separate relationships for the level 2 and level 3 relationships and associate the relevant parameters to each. However, the vertical approach allows us to omit the level 2 relationship if we don't need it or if it has no meaning and has no parameters. So in this case, it's actually an advantage of the vertical implementation.
Edit, but here's a question... say I have a level two relationship: unit_process and I have parameters associated with it - in the julia code I am iterating through all unit_process combinations... however, for each unit_process relationship, I have a three level relationship unit_process_commodity and I want to now loop through the related commodities. Can I do this easily in Julia if the unit_process relationship and the unit_process_commodity relationship are separate to each other?
In GitLab by @manuelma on Aug 17, 2018, 10:09
So calling unit_process_commodity("unitA", "processB") in Julia should return a list of all commodities related (as the third dimension) to "unitA" and "processB"...
I think it's doable (99% sure), we just need to keep track of the dimensions. One way would be to define our methods with keyword arguments, so you would need to call it like this: unit_process_commodity(unit="unitA", process="processB")
Ideally you could also call unit_process_commodity(unit="unitA", commodity="gas") and get the list of processes.
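The keyword-argument lookup could work along these lines. This is a Python sketch of the idea, not the Julia interface itself; the relationship tuples are made up, and the rule that a single free dimension is returned as a plain name (rather than a 1-tuple) is an assumption.

```python
# Hypothetical (unit, process, commodity) relationships.
UNIT_PROCESS_COMMODITY = [
    ("unitA", "processB", "gas"),
    ("unitA", "processB", "electricity"),
    ("unitA", "processC", "gas"),
]
DIMENSIONS = ("unit", "process", "commodity")


def unit_process_commodity(**fixed):
    """Return the free dimension(s) of every relationship matching the fixed ones."""
    unknown = set(fixed) - set(DIMENSIONS)
    if unknown:
        raise TypeError(f"unknown dimension(s): {sorted(unknown)}")
    # Positions of the dimensions the caller did not pin down.
    free = [i for i, name in enumerate(DIMENSIONS) if name not in fixed]
    result = []
    for rel in UNIT_PROCESS_COMMODITY:
        if all(rel[DIMENSIONS.index(name)] == value for name, value in fixed.items()):
            picked = tuple(rel[i] for i in free)
            # A single free dimension is returned as a bare name.
            result.append(picked[0] if len(picked) == 1 else picked)
    return result


commodities = unit_process_commodity(unit="unitA", process="processB")
processes = unit_process_commodity(unit="unitA", commodity="gas")
print(commodities, processes)
```

With a lookup like this, looping over the commodities for a given unit_process pair only needs the unit and process names, so the two-level and three-level relationship classes can stay independent in the database.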
In GitLab by @pvper on Aug 17, 2018, 10:11
Yeah, that's the problem. And say you delete your unit_process relationship: if the unit_process_commodity is not in the same class, you will somehow have to delete that as well, and how would you keep track of that?
In GitLab by @manuelma on Aug 17, 2018, 10:15
@pvper this morning I pitched this idea of leaving one of the dimensions 'empty' when defining relationships: https://gitlab.vtt.fi/spine/model/issues/36#note_1424
What do you think? In that way you could have all the relationships in the same class even if some of them don't use all the dimensions.
In GitLab by @DillonJ on Aug 17, 2018, 10:24
@pvper but if the two relationships are independent, then you might want the three level relationship and not the two level relationship and you might want to delete one without the other, so there is no problem because they are independent.
But the question is, if I define unit_process and unit_process_commodity independently of each other and I have a handle on a unit_process relationship... would I easily be able to loop through all the related commodities from the three level relationship unit_process_commodity which in theory is an independent relationship but I am, in effect, trying to link them on the fly - if you know what I'm getting at.
In GitLab by @DillonJ on Aug 17, 2018, 10:32
@manuelma What you are proposing regarding being able to leave one of the relationships blank sounds perfect. Does this work in the treeview, and would we be able to easily define parameters against higher-level implied relationships (CCGT_gas_nothing) within a three_level relationship class?
Allowing this would let us do both: if the two_level relationship is independent of the three_level relationship, we define them separately and can delete either one independently of the other without any problem, but if they are related to each other then we use the omitted relationship approach.
What about the scenario where we create a two_level relationship and then at some later point we realize we need to add a third level? Can we do this without having to re-create the level-two stuff?
The original issue could not be created. This is a dummy issue, replacing the original one. It contains everything but the original issue description. In case the GitLab repository still exists, visit the following link to show the original issue:
TODO