iiasa / ixmp4

A data warehouse for high-powered scenario analysis in the domain of integrated assessment of climate change and energy systems modeling
https://docs.ece.iiasa.ac.at/ixmp4
MIT License
11 stars 6 forks source link

Include optimization parameter basis #79

Closed glatterf42 closed 4 months ago

glatterf42 commented 6 months ago

As promised in a recent team meeting, the next steps for the message_ix data model work will use a more trunk-based development approach. This is where I got to today with introducing optimization.Parameters. The tests for creating and getting them pass for the DB layer, others are still disabled. Ruff might complain that some temporarily-commented lines in the test file are now too long. If the tests run at all, maybe they only trigger on PRs to main. I could enable them for PRs to include/optimization-parameter as well or we merge this one and then set up a soon-to-be-final PR to main to collect further changes.

I'm not yet sure this is the best way to handle the values and units: as separate lists. I will test if this behaves as desired/expected and will rework if it doesn't.

glatterf42 commented 6 months ago

I can't quite get the connection between Parameter and Unit to work. In theory, we want every Parameter to be able to have multiple Units and every Unit (defined for the ~Run~Platform) should be usable by multiple Parameters. The relationship pattern that most closely resembles this is a many to many relationship. These typically work via an association table, i.e. a table in-between Parameter and Unit that simply holds Parameter.id and Unit.id pairs, so that each Parameter and Unit can refer to all pairs that include their own id. However, this association table wants to be as efficient as possible, so it half expects and is half recommended to only contain every pair of (Parameter.id, Unit.id) just once, and all examples I can find online only show how to set the relationship up that way. Unfortunately, our needs are more complicated since we want Parameter.units to be a list of Units, which are not necessarily distinct (if I understand correctly, please correct me if I'm wrong). So a Parameter may contain several rows of data with the same Unit, which may even be ordered arbitrarily. Thus, our storage of this Unit list needs to preserve order and be able to store the same Unit more than once. With the current many-to-many setup, I have not been able to make this work. I've also read up on association objects and association proxy in general and I'm not sure we can make it work. We would need to create a purposefully-inefficient table between Parameter and Unit or figure out a way to do something like this:

  1. Receive a list of Units (via unit.name, so we need to get() the Units first)
  2. Create entries in the association table with Parameter.id and Unit.ids
  3. Set Parameter.units to the Unit objects associated with Parameter, keeping the order of the input list and potentially get()ting the Units first

The advantages would be that other layers (such as the facade layer) would be able to access Parameter.units directly without having to get() Units first; that we could keep as many always-present columns as possible outside of the JSON field data (possibly increasing performance); that the relation between Parameter and Unit would be baked into the DB layer and apparent from there.

The alternative would be to simply keep units (and values, while we're at it) parts of data. This would eliminate both the need to care about order as this is taken care of by JSON and the need to invest more time now to come up with a solution that seems to be pretty non-standard, so would likely be fragile. We could also somewhat bake the relationship into the DB layer through the validator function for data. In there, we could try to get() all distinct Unit names once and raise an error if a Unit doesn't exist. This way, we would still ensure that all units (which we would store as their unit.names) exist in the Unit list of the ~Run~Platform, as I would understand the current ixmp docs. The only real downside would be that we would return a Parameter which doesn't come with the Unit objects pre-loaded, so users would have to manually get() them if they need them. That sounds acceptable to me.

@danielhuppmann, please let me know what you think.

danielhuppmann commented 6 months ago

@glatterf42, I guess that this PR should now be directed to the main branch. The files-changed section show quite a number of changes that are already merged.

danielhuppmann commented 6 months ago

A comment on the question above by @glatterf42:

every Unit (defined for the Run)

The units are defined at the Platform level (and that should not be changed), I assume that is a typo?

glatterf42 commented 6 months ago

@glatterf42, I guess that this PR should now be directed to the main branch. The files-changed section show quite a number of changes that are already merged.

The idea was to merge this PR to include/optimization-parameter, which could collect changes to this part of the DB model before it's merged to main. That collection branch should also recognize which commits are already part of main, but maybe this one doesn't since it doesn't know where the collection branch is aiming. The reason I didn't merge it yet was because I wanted to address your question why Parameter doesn't inherit from Table, which I did locally already based on #82, but didn't push here yet because I wanted to still expand the tests a little, too -- which was when I encountered the current issue. And yes, that is technically not in line with trunk-based development and I should have just focused on the mixins here.

danielhuppmann commented 6 months ago

I guess that you have to rebase include/optimization-parameter onto main (now and every time other changes are merged there).

glatterf42 commented 6 months ago

I guess that you have to rebase optimization\parameter onto main (now and every time other changes are merged there).

True; at the moment, I'd have to rebase the mixin branch on top of main, the collection branch on top of that, and this branch on top of the new collection branch.

danielhuppmann commented 6 months ago

Re the actual question, without looking at the code yet (because of the rebase-branch-mixin-mixup):

I think that also our implementation of the "foreign-keys" using index-sets and columns can only be validated at the code level (not the database). Therefore, I think it's ok that we also do that for units, hence storing the columns plus units plus values in data.

glatterf42 commented 6 months ago

Great, thanks :)

glatterf42 commented 6 months ago

With this clarification, all uncommented tests should now work (and that's even more than before), so I'd be in favor of merging this PR and opening another one for the remaining tests and code layers :)