Closed glatterf42 closed 4 months ago
I can't quite get the connection between Parameter
and Unit
to work. In theory, we want every Parameter
to be able to have multiple Unit
s and every Unit
(defined for the ~Run
~Platform
) should be usable by multiple Parameter
s. The relationship
pattern that most closely resembles this is a many to many relationship.
These typically work via an association table, i.e. a table in-between Parameter
and Unit
that simply holds Parameter.id
and Unit.id
pairs, so that each Parameter
and Unit
can refer to all pairs that include their own id
. However, this association table wants to be as efficient as possible, so it half expects and is half recommended to only contain every pair of (Parameter.id, Unit.id)
just once, and all examples I can find online only show how to set the relationship up that way.
Unfortunately, our needs are more complicated since we want Parameter.units
to be a list of Unit
s, which are not necessarily distinct (if I understand correctly, please correct me if I'm wrong). So a Parameter
may contain several rows of data
with the same Unit
, which may even be ordered arbitrarily. Thus, our storage of this Unit
list needs to preserve order and be able to store the same Unit
more than once.
With the current many-to-many setup, I have not been able to make this work. I've also read up on association objects and association proxy in general and I'm not sure we can make it work. We would need to create a purposefully-inefficient table between Parameter
and Unit
or figure out a way to do something like this:
Unit
s (via unit.name
, so we need to get()
the Unit
s first)Parameter.id
and Unit.id
sParameter.units
to the Unit
objects associated with Parameter
, keeping the order of the input list and potentially get()
ting the Unit
s firstThe advantages would be that other layers (such as the facade layer) would be able to access Parameter.units
directly without having to get()
Unit
s first; that we could keep as many always-present columns as possible outside of the JSON field data
(possibly increasing performance); that the relation between Parameter
and Unit
would be baked into the DB layer and apparent from there.
The alternative would be to simply keep units
(and values
, while we're at it) parts of data
. This would eliminate both the need to care about order as this is taken care of by JSON and the need to invest more time now to come up with a solution that seems to be pretty non-standard, so would likely be fragile. We could also somewhat bake the relationship into the DB layer through the validator function for data
. In there, we could try
to get()
all distinct Unit
names once and raise an error if a Unit
doesn't exist. This way, we would still ensure that all units
(which we would store as their unit.name
s) exist in the Unit
list of the ~Run
~Platform
, as I would understand the current ixmp docs. The only real downside would be that we would return a Parameter
which doesn't come with the Unit
objects pre-loaded, so users would have to manually get()
them if they need them. That sounds acceptable to me.
@danielhuppmann, please let me know what you think.
@glatterf42, I guess that this PR should now be directed to the main
branch. The files-changed section show quite a number of changes that are already merged.
A comment on the question above by @glatterf42:
every
Unit
(defined for theRun
)
The units are defined at the Platform
level (and that should not be changed), I assume that is a typo?
@glatterf42, I guess that this PR should now be directed to the
main
branch. The files-changed section show quite a number of changes that are already merged.
The idea was to merge this PR to include/optimization-parameter, which could collect changes to this part of the DB model before it's merged to main. That collection branch should also recognize which commits are already part of main, but maybe this one doesn't since it doesn't know where the collection branch is aiming.
The reason I didn't merge it yet was because I wanted to address your question why Parameter
doesn't inherit from Table
, which I did locally already based on #82, but didn't push here yet because I wanted to still expand the tests a little, too -- which was when I encountered the current issue.
And yes, that is technically not in line with trunk-based development and I should have just focused on the mixins here.
I guess that you have to rebase include/optimization-parameter
onto main
(now and every time other changes are merged there).
I guess that you have to rebase
optimization\parameter
ontomain
(now and every time other changes are merged there).
True; at the moment, I'd have to rebase the mixin branch on top of main, the collection branch on top of that, and this branch on top of the new collection branch.
Re the actual question, without looking at the code yet (because of the rebase-branch-mixin-mixup):
I think that also our implementation of the "foreign-keys" using index-sets and columns can only be validated at the code level (not the database). Therefore, I think it's ok that we also do that for units, hence storing the columns plus units plus values in data
.
Great, thanks :)
With this clarification, all uncommented tests should now work (and that's even more than before), so I'd be in favor of merging this PR and opening another one for the remaining tests and code layers :)
As promised in a recent team meeting, the next steps for the message_ix data model work will use a more trunk-based development approach. This is where I got to today with introducing
optimization.Parameter
s. The tests for creating and getting them pass for the DB layer, others are still disabled. Ruff might complain that some temporarily-commented lines in the test file are now too long. If the tests run at all, maybe they only trigger on PRs to main. I could enable them for PRs to include/optimization-parameter as well or we merge this one and then set up a soon-to-be-final PR to main to collect further changes.I'm not yet sure this is the best way to handle the
values
andunits
: as separate lists. I will test if this behaves as desired/expected and will rework if it doesn't.