Incorrect semantics in Platform, TimeSeries, Scenario

khaeru commented 5 years ago

Copied from #108.

Some of these would require backwards-incompatible renaming of methods (e.g. items 3, 6) and would need to occur in the next major version (1.0)
Some could be implemented right away (e.g. item 7; or items 4/5 with an alias to the old name, to be deprecated).

Methods in the wrong place

For a clean class hierarchy, since TimeSeries is the parent of Scenario, then TimeSeries methods and code should not depend on things implemented in Scenario.

~TimeSeries.checkout() (1) calls Scenario.has_solution() and (2) raises an exception with the text "This Scenario…"—but it is the parent class of Scenario.~ Done in #270.
~~TimeSeries.add_timeseries() docstring references "MESSAGE-Scheme scenarios".~~ Fixed with #120.
~~TimeSeries.timeseries() takes kwargs regions, units, years. AFAICT e.g. 'years' is not necessarily a set in an ixmp model; only in MESSAGEix.~~ TimeSeries data has a 'year' dimension by default; the existence (or not) of set(s) that correspond to years in a model/Scenario is distinct.

Pythonic semantics

Platform.scenarios_list() → list_scenarios(). Methods should be named "[verb] [noun]"; only attributes/properties should have "[noun]" names.
TimeSeries.add_timeseries() → add_data(). Repeating the class name is confusing; this makes the user imagine doing ts1, ts2 = TimeSeries(), TimeSeries(); ts1.add_timeseries(ts2).
Scenario.add_set() is a misnomer: this method "adds elements to an existing set"; it does not "add a new set". On the other hand, Scenario.init_set() actually "adds a new set" to the Scenario definition. I'd propose:
- Scenario.add_set() = add a new set.
- Scenario.add_set_elements() = add elements to an existing set.
~Scenario.item(), .element() → rename to ._item(), ._element(). The docstrings state these are "internal function"s; Python convention is to indicate this with leading _ on the name.~ Removed with the implementation of the Backend API.
Scenario.add_par() will operate fine if given only key, and not val. In this case, the values are actually contained in key → rename this argument to key_or_val.

Ease-of-use:

~Platform.__del__() should invoke close_db() when necessary.~ Done in #298.
Scenario.clone() should raise a warning if platform==self.platform; the user might think they are cloning elsewhere, but have made a coding error (see also #101, #109).
Scenario.solve(): the 'model' keyword is required, but it could be inferred.

danielhuppmann commented 5 years ago

Thanks @khaeru for your (as always) very keen observations! Agree with most of them, except for two.

TimeSeries.add_timeseries() → add_data(). Repeating the class name is confusing; this makes the user imagine doing ts1, ts2 = TimeSeries(), TimeSeries(); ts1.add_timeseries(ts2).

Note that there is a distinction between input data (sets, parameters), raw output (variables and equations) and timeseries in the IAMC format. All of those can be considered "data".

Scenario.clone() should raise a warning if platform==self.platform; the user might think they are cloning elsewhere, but have made a coding error (see also #101, #109).

Disagree - cloning within a platform is the most common use case.

khaeru commented 5 years ago

TimeSeries.add_timeseries() → add_data(). Repeating the class name is confusing; this makes the user imagine doing ts1, ts2 = TimeSeries(), TimeSeries(); ts1.add_timeseries(ts2).

Note that there is a distinction between input data (sets, parameters), raw output (variables and equations) and timeseries in the IAMC format. All of those can be considered "data".

Okay, I'm open to alternate ideas on names. To expand: the confusion arises because there are two things called by the same name: (1) "time series data" in the generic sense, i.e. array data in which one of the dimensions is a time dimension; and then (2) the ixmp.TimeSeries class/object, which is a collection of 0 or more of (1). The object (2) has methods to add/retrieve/remove data (1). But the names of those methods don't help the user understand that it is specifically (1) (rather than (2)) that is being manipulated. One could imagine a method that merges two ixmp.TimeSeries objects (2), by copying the data (1) of one object into the other, thereby "adding" it.

Scenario.clone() should raise a warning if platform==self.platform; the user might think they are cloning elsewhere, but have made a coding error (see also #101, #109).

Disagree - cloning within a platform is the most common use case.

That's true. However, if cloning within a platform, I imagine the user will not explicit supply the platform argument; but conversely if the argument is given explicitly, that indicates an attempt to do a cross-platform clone. So the warning could be given only in the latter case:

if platform is None:
    platform = self.platform
elif platform == self.platform:
    warn('clone destination platform=... is the same as source platform')

iiasa / ixmp