iiasa / ixmp4

A data warehouse for high-powered scenario analysis in the domain of integrated assessment of climate change and energy systems modeling
https://docs.ece.iiasa.ac.at/ixmp4
MIT License
10 stars 6 forks source link

Add filtering by model and scenario name in MetaRepository #66

Closed danielhuppmann closed 6 months ago

danielhuppmann commented 6 months ago

For the integration with pyam, I want a feature to filter the MetaRepository by model and scenario name, like

platform.meta.tabulate(run={'default_only': True}, scenario={name="scen_a"})

The PR below implements these filters and they work as expected, but I get the following warning

/Users/dh/.venv/iamc/lib/python3.12/site-packages/pandas/io/sql.py:1600: SAWarning:
SELECT statement has a cartesian product between FROM element(s) "runmetaentry"
and FROM element "scenario".  Apply join condition(s) between each element to resolve.

Any pointers to how/where to resolve that, @meksor @pmussak?

meksor commented 6 months ago

Wait a minute, shouldn't filtering via model/scenario already be possible with the run filter? Like: run={"model": {"name":...}}

meksor commented 6 months ago

Also both your Model and Run filters inherit from "RunFilter" which is probably the reason for the warning. If the join paths are incorrect the database will sometimes try to calculate the cartesian product of two tables. I don't know why it does this, I haven't ever encountered a use case for this but maybe it's a historical artifact of the computer science background of databases. (Outer joins are expressed as the cross product of two tables in some databases too)

danielhuppmann commented 6 months ago

Thanks @meksor for the pointer.

I believe that

platform.meta.tabulate(run={model="Model 1"})

is deactivated by https://github.com/iiasa/ixmp4/blob/3f45bba201de4a448f5e7be930b9e6593fb59d49/ixmp4/data/db/meta/filter.py#L16

And after some back and forth, I think that the API should indeed be

platform.meta.tabulate(run={default_only=True}, model="Model 1"})

because the model-name is an attribute of the Model table, not the Run table.

This is similar to the (already-working) filter-signature of the iamc datapoints:

platform.iamc.tabulate(run={'default_only': True}, model="Model 1")
danielhuppmann commented 6 months ago

In the commit above, I switched to import

from ixmp4.data.db.model.filter import ModelFilter

which yields the same warning (and also works as expected).

I also tried

from ixmp4.data.db.filters.model import ModelFilter

which raised an error.

I have to admit that the difference between model.filter and filters.model is still not clear to me...

I guess I would need to do something like

WHERE run.d in (SELECT run_id from SCENARIO WHERE name == ...)

instead of the multi-join...?

meksor commented 6 months ago

Thanks @meksor for the pointer.

I believe that

platform.meta.tabulate(run={model="Model 1"})

is deactivated by https://github.com/iiasa/ixmp4/blob/3f45bba201de4a448f5e7be930b9e6593fb59d49/ixmp4/data/db/meta/filter.py#L16

? I don't think that's right

meksor commented 6 months ago

The run filter was one of the first filters in ixmp, you can always filter by a model and its name through the runs filter except on the /iamc/datapoints filter. Because its supposed to act like that big table i guess... I would think about moving this stuff around now, seems like we are introducing even more custom special-case inconsistencies...

meksor commented 6 months ago

It bad enough IMO that the defaults for the filter are manually set every time it is used, that seems horrible to me, but whatever ^^

meksor commented 6 months ago

In the commit above, I switched to import

from ixmp4.data.db.model.filter import ModelFilter

which yields the same warning (and also works as expected).

I also tried

from ixmp4.data.db.filters.model import ModelFilter

which raised an error.

Which error?

I have to admit that the difference between model.filter and filters.model is still not clear to me...

One is the base, to be re-used the other is the one used by REST etc.

I guess I would need to do something like

WHERE run.d in (SELECT run_id from SCENARIO WHERE name == ...)

instead of the multi-join...?

No...

danielhuppmann commented 6 months ago

Not quite sure what you mean by "not right", but from my perspective

test.mp.runs.tabulate()

should accept filters default_only, run_id, version, model (name and id) and scenario (name and id),

but a RunFilter used as filter from iamc-datapoints or meta should only accept default_only, run_id and version because these are the attributes in the Run table.

So I guess we have to split the RunFilter used in Runs from the other uses? That would also allow not having to set the defaults every time...

meksor commented 6 months ago

Ok well, if thats really desirable then i suggest making a RunFilter and a NestedRunFilter or something and use them accordingly.

meksor commented 6 months ago

So i just double checked, and indeed, we can already filter by models with, for example

{
  "run": {
    "model": {"name__like": "Model *"},
    "default_only": true,
    "is_default": true
  }
}

And presumably that enables: tabulate(run={"model": {"name": "..."}})

danielhuppmann commented 6 months ago

Your presumption is correct for platform.runs.tabulate() but not for platforms.meta.tabulate()...

meksor commented 6 months ago

Ah, but i think that is because of this issue: https://github.com/iiasa/ixmp4/issues/55 Not because it is not implemented like that.

danielhuppmann commented 6 months ago

Ah, but i think that is because of this issue: #55 Not because it is not implemented like that.

I don't think so because when I remove that Field, I can filter by model and scenario... https://github.com/iiasa/ixmp4/blob/3f45bba201de4a448f5e7be930b9e6593fb59d49/ixmp4/data/db/meta/filter.py#L16

meksor commented 6 months ago

Well then it looks like you found a workaround for the issue... I still think it would work if that issue is fixed, im just getting a 500 when i try on the current ixmp instances.

danielhuppmann commented 6 months ago

Closing in favor of #69