Closed alixdamman closed 3 years ago
@gdementen OK to "try" to implement this for the next release (0.33)? I said "try".
I had in mind to do a quicker-than-usual release to fix the issues we found during the training (matplotlib import in the viewer, small tutorial/doc changes and the problem of loading >2d data without axes names), probably as 0.32.1, and then move on to liam2. So I am basically OK with the 0.33 milestone but that will depend on when you want to release it.
If we plan to do a 0.32.1 release, I think there is no reason to make 0.33 release soon.
Or... if you want to work on it before I am able to (probably somewhere in January), I can give you the experimental code I have so far and you do it. So far, I have always linked this "static session" thing to the lazy session feature (see #727), but maybe we could dissociate the two. In that case it's pretty easy. The lazy part is a lot more tricky (when I say tricky, it's not necessarily hard to implement, but rather hard to figure out which behaviour is best for users).
Related to issue #727
See comment https://github.com/larray-project/larray/pull/840#issuecomment-567452278 for an example of how the StaticSession class should be used.
This is at least how I saw things. There are probably other options. I just don't think restricting values based on values defined at the class level is a good idea. It is just too weird in the Python world, while there are alternatives which are more usual: restrict on types defined at the class level, or restrict on values defined at the instance level (i.e. values defined in __init__).
In the Demo model, they have a Parameters class in which they define all axes, groups and scalars of the model. The other classes contain only arrays but inherit from the Session class, mainly to benefit from its I/O methods. I tried to somewhat "reproduce" what is done in that model (consider all axes and groups defined in a class as frozen) plus to include the idea of ArrayDef, but I see now that implementing a class based on just one model is wrong.
The only thing that bothered me with the proposed "specifications" of the StaticSession is that all axes used in statements like array_var = ArrayDef(axes) must be defined before the definition of the class and cannot depend on any variable whose value is defined at runtime --> imagine a TIME axis defined as TIME = Axis(f'time={first_proj_year}..{last_proj_year}') where first_proj_year and last_proj_year are defined at runtime.
I wonder if there is an easy way to also "freeze" the axes of Array variables defined in the __init__ method?
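Since this "freeze after __init__" question keeps coming back, here is a minimal sketch of what an instance-level freeze could look like. Everything below (the toy Array and FrozenAxesSession classes) is a hypothetical stand-in for larray code, only illustrating the __setattr__ mechanism:

```python
class Array:
    """Toy stand-in for larray's Array: just axes plus some payload."""
    def __init__(self, axes, data=0):
        self.axes = tuple(axes)
        self.data = data

class FrozenAxesSession:
    """Toy session: once an Array attribute has been set, any reassignment
    must carry the same axes (a runtime 'freeze', no class-level declaration)."""
    def __setattr__(self, name, value):
        current = self.__dict__.get(name)
        if isinstance(current, Array) and isinstance(value, Array):
            if value.axes != current.axes:
                raise ValueError(f"incompatible axes for '{name}': "
                                 f"{value.axes} was declared as {current.axes}")
        object.__setattr__(self, name, value)

s = FrozenAxesSession()
s.pop = Array(axes=('age', 'gender'))
s.pop = Array(axes=('age', 'gender'), data=1)  # OK: same axes
try:
    s.pop = Array(axes=('age',))               # different axes -> rejected
except ValueError as error:
    print(error)
```

The point of the sketch is that the freeze happens on first assignment, so it works for values only known at runtime, unlike class-level declarations.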
We can discuss this when I get back in January, if you want.
I wonder if we couldn't try to use pydantic to fix this issue?
This may help us to make classes inheriting from ConstrainedSession more compatible with the broader Python ecosystem. FWIW, a PyCharm plugin exists for pydantic.
@gdementen I already have some proof-of-concept code for the implementation of the ConstrainedSession class using pydantic.BaseModel (see below):
TO_SET_LATER = object()

try:
    import pydantic
except ImportError:
    pydantic = None

if pydantic and sys.version_info >= (3, 6):
    class ConstantAxes(ABCArray):
        def __init__(self, axes=None):
            if axes is not None and not isinstance(axes, AxisCollection):
                axes = AxisCollection(axes)
            self.axes = axes

    class ConstrainedSession(Session, pydantic.BaseModel):
        """
        Warnings
        --------
        The :py:meth:`ConstrainedSession.filter`, :py:meth:`ConstrainedSession.compact`
        and :py:meth:`ConstrainedSession.apply` methods return a simple Session object.
        The types of the declared variables (and the values of the declared constants) will
        no longer be checked.

        Examples
        --------
        >>> class ModelVariables(ConstrainedSession):
        ...     # declare variables with a default initialization value
        ...     FIRST_OBS_YEAR: int = 1991
        ...     LAST_PROJ_YEAR: int = 2070
        ...     AGE: Axis = Axis('age=0..120')
        ...     GENDER: Axis = Axis('gender=male,female')
        ...     G_CHILDREN: Group = AGE[:17]
        ...     G_ADULTS: Group = AGE[18:]
        ...     # declare variables without a default value.
        ...     # Such variables MUST be initialized at instantiation,
        ...     # either as arguments to the constructor or in the __post_init__ method
        ...     variant_name: str
        ...     first_proj_year: int
        ...     obs_years: Axis
        ...     proj_years: Axis
        ...     population_obs: Array
        ...     population_proj: Array
        ...     # declare arrays with constant axes
        ...     mortality_rate: Array = ConstantAxes((AGE, GENDER))
        ...     # declare variables with a value to be defined after instantiation.
        ...     # TO BE USED WITH EXTREME CAUTION
        ...     later: str = TO_SET_LATER
        ...
        ...     # method to initialize variables defined at runtime
        ...     def __post_init__(self):
        ...         self.obs_years = Axis(f'time={self.FIRST_OBS_YEAR}..{self.first_proj_year - 1}')
        ...         self.proj_years = Axis(f'time={self.first_proj_year}..{self.LAST_PROJ_YEAR}')
        ...         self.population_obs = zeros((self.AGE, self.GENDER, self.obs_years))
        ...         self.population_proj = zeros((self.AGE, self.GENDER, self.proj_years))
        ...         self.mortality_rate = zeros((self.AGE, self.GENDER))

        >>> def run_model(variant_name, first_proj_year):
        ...     # create an instance of the ModelVariables class
        ...     m = ModelVariables(variant_name=variant_name, first_proj_year=first_proj_year)
        ...     m.later = 'later'
        ...     # set arrays
        ...     # ==== model ====
        ...     # some code here
        ...     # ...
        ...     # ==== output ====
        ...     # save all variables in an HDF5 file
        ...     m.save(f"{variant_name}.h5", display=True)

        Content of file 'main.py':

        >>> run_model('projection_2020_2070', first_proj_year=2020)
        dumping FIRST_OBS_YEAR ... done
        dumping LAST_PROJ_YEAR ... done
        dumping AGE ... done
        dumping GENDER ... done
        dumping G_CHILDREN ... done
        dumping G_ADULTS ... done
        dumping variant_name ... done
        dumping first_proj_year ... done
        dumping obs_years ... done
        dumping proj_years ... done
        dumping population_obs ... done
        dumping population_proj ... done
        dumping mortality_rate ... done
        """
        class Config:
            # whether to validate field defaults (default: False)
            validate_all = True
            # whether to perform validation on assignment to attributes
            validate_assignment = True
            # a dict used to override the default error message templates.
            # Pass in a dictionary with keys matching the error messages you want to override.
            error_msg_templates = {}

        def __init__(self, *args, **kwargs):
            meta = kwargs.pop('meta', Metadata())
            # whether to ignore, allow, or forbid extra attributes during model initialization (and after).
            # Accepts the string values 'ignore', 'allow' or 'forbid',
            # or values of the Extra enum (default in pydantic: Extra.ignore)
            self.__config__.extra = kwargs.pop('extra', 'allow')
            # whether or not models are faux-immutable, i.e. whether __setattr__ is allowed
            # (default in pydantic: True)
            self.__config__.allow_mutation = kwargs.pop('allow_mutation', True)
            # need to call Session.__init__() to not fall into an infinite loop later
            Session.__init__(self, meta=meta)
            if len(args) == 1:
                assert len(kwargs) == 0
                a0 = args[0]
                if isinstance(a0, str):
                    # assume a0 is a filename
                    self.load(a0)
                else:
                    # iterable of tuples or dict-like
                    self.update(a0)
            else:
                self.add(*args, **kwargs)
            pydantic.BaseModel.__init__(self, **self.items())
            self.__post_init__(**kwargs)

        def __post_init__(self, **kwargs):
            """Override this method to initialize variables whose values are known only at runtime."""
            pass

        def load(self, fname, names=None, engine='auto', display=False, **kwargs):
            # TODO: find a way to skip constants and save computation time?
            super().load(fname, names, engine, display, **kwargs)
            # prefix = f"A value for a variable named '{key}' was found in file '{loaded_from_file}' while "
            # warnings.warn(f"{prefix}'{key}' is not declared in '{self.__class__.__name__}'", stacklevel=2)

        def save(self, fname, names=None, engine='auto', overwrite=True, display=False, **kwargs):
            for key, value in self.items():
                if value is TO_SET_LATER:
                    warnings.warn(f"The variable '{key}' is declared in the '{self.__class__.__name__}' "
                                  f"class definition but was not set.")
            super().save(fname, names, engine, overwrite, display, **kwargs)

        def __setitem__(self, key, value, skip_check_value=False, loaded_from_file=None):
            # we need to keep the attribute in sync (initially to mask the class attribute)
            pydantic.BaseModel.__setattr__(self, key, value)
            self._objects[key] = value

        def __setattr__(self, key, value, skip_check_value=False, loaded_from_file=None):
            # update the real attribute
            pydantic.BaseModel.__setattr__(self, key, value)
            # update self._objects
            Session.__setattr__(self, key, value)

        @pydantic.validator('*')
        def validate_variables(cls, v, field):
            key = field.name
            # --- array with constant axes
            if key in cls.__field_defaults__ and isinstance(field.value, ConstantAxes):
                try:
                    field.value.axes.check_compatible(v.axes)
                except ValueError as error:
                    msg = str(error).replace("incompatible axes:", f"incompatible axes for array '{key}':") \
                                    .replace("vs", "was declared as")
                    raise ValueError(msg)
            return v
@gdementen I already have some proof-of-concept code for the implementation of the ConstrainedSession class using pydantic.BaseModel (see below):
There are a few things I don't like with this approach...
TO_SET_LATER = object()

try:
    import pydantic
except ImportError:
    pydantic = None
Either we really depend on it (make it non-optional) or we allow something like:
class MySession(ConstrainedSession, pydantic.BaseModel):
...
but I would rather not make ConstrainedSession rely on an optional dependency.
>>> class ModelVariables(ConstrainedSession):
...     # declare variables with default initialization value.
...     FIRST_OBS_YEAR: int = 1991
Can't we avoid specifying int in this case? It feels redundant.
...     LAST_PROJ_YEAR: int = 2070
...     AGE: Axis = Axis('age=0..120')
...     # declare variables without default value
...     # Such variables MUST be initialized at instantiation
...     # as argument to the constructor or in the __post_init__ method
...     first_proj_year: int
...     obs_years: Axis
...     population_proj: Array
...     # declare arrays with constant axes
...     mortality_rate: Array = ConstantAxes((AGE, GENDER))
This is the main point where I think the approach fails. pydantic is very similar to the typing module which is itself not flexible enough for us yet (see my comment at https://github.com/larray-project/larray/pull/840#discussion_r364230103), in that you can provide types or constraints to be validated (a type is just a particular kind of constraint). The thing here is that you provide constraints sometimes via the type (using :) and sometimes via the value (after the =). This is too confusing IMO. I think we should either provide all constraints via the value, or via the type, but not a mix of both.
...     # declare variables with a value to be defined after instantiation
...     # TO BE USED WITH EXTREME CAUTION
...     later: str = TO_SET_LATER
Huh? I don't understand the difference between this and the variables with only a type like above (variant_name, etc.)??
...
...     # method to initialize variables defined at runtime
...     def __post_init__(self):
Having to define yet an extra method? Isn't this getting too complex for our users?
There are a few things I don't like with this approach...
No kidding!
I would rather not make ConstrainedSession rely on an optional dependency
OK. But I wanted to make something similar to open_excel returning an error if xlwings is not installed. I don't see why xlwings is optional (and then open_excel raises an error) but pydantic should not be optional.
Anyway, I proposed to use pydantic because I recently took a look at the code of pandasSDMX, which relies on pydantic to implement SDMX concepts. I guess I will use either pandasSDMX (at least its model.py module) or pydantic for the SDMX project.
Can't we avoid specifying int in this case? It feels redundant.
No, pydantic proposes something similar to @dataclass via its BaseModel class. pydantic also proposes its own version of the @dataclass decorator (see here). pydantic is supposed to work well with mypy (see here).
This is the main point where I think the approach fails. pydantic is very similar to the typing module which is itself not flexible enough for us yet (see my comment at #840 (comment)), in that you can provide types or constraints to be validated (a type is just a particular kind of constraint). The thing here is that you provide constraints sometimes via the type (using :) and sometimes via the value (after the =). This is too confusing IMO. I think we should either provide all constraints via the value, or via the type, but not a mix of both.
In the DEMO model, I encouraged users to use annotations in their functions, especially to allow auto-completion on methods and attributes of passed arguments. Within PyCharm (with Python 3.6+), writing:

def foo(arr: Array):
    arr.

will end up with a dropdown box listing all methods and attributes of Array objects. My idea here is to encourage users in general to use annotations everywhere. Was that a bad idea?
With the string syntax of LArray we already allow users to write code less compatible with the broader Python ecosystem. If we choose the = operator to specify types (or pseudo-types), we move further away from the broader Python ecosystem. Besides, using : in functions and = in constrained classes will be even more confusing.
...     # declare variables with a value to be defined after instantiation
...     # TO BE USED WITH EXTREME CAUTION
...     later: str = TO_SET_LATER

Huh? I don't understand the difference between this and the variables with only a type like above (variant_name, etc.)??

pydantic raises an error if not all declared attributes are set with a value. After instantiation, an object is supposed to be fully set up with pydantic. TO_SET_LATER is just a trick to bypass this behavior.
...
...     # method to initialize variables defined at runtime
...     def __post_init__(self):

Having to define yet an extra method? Isn't this getting too complex for our users?

__post_init__ actually comes from the Python 3.7 dataclass. A dataclass has a __post_init__ method called at the end of the generated __init__ by default (see here).
I would rather not make ConstrainedSession rely on an optional dependency
OK. But I wanted to make something similar to open_excel returning an error if xlwings is not installed. I don't see why xlwings is optional (and then open_excel raises an error) but pydantic should not be optional.
Anyway, I proposed to use pydantic because I took a look recently at the code of pandasSDMX which relies on pydantic to implement SDMX concepts. I guess I will use either pandasSDMX (at least the model.py module of it) or pydantic for the SDMX project.
The PandaSDMX (there is no second s) API is awful. I just hope you will not mimic it.
Can't we avoid specifying int in this case? It feels redundant.
No, pydantic proposes something similar to @dataclass via its BaseModel class. pydantic also proposes its own version of the @dataclass decorator (see here). pydantic is supposed to work well with mypy (see here).
From pydantic front page example:
class User(BaseModel):
    id: int
    name = 'John Doe'
then:
name is inferred as a string from the provided default; because it has a default, it is not required.
So I assume it is possible. If it isn't in our case, please explain why.
This is the main point where I think the approach fails. pydantic is very similar to the typing module which is itself not flexible enough for us yet (see my comment at #840 (comment)), in that you can provide types or constraints to be validated (a type is just a particular kind of constraint). The thing here is that you provide constraints sometimes via the type (using :) and sometimes via the value (after the =). This is too confusing IMO. I think we should either provide all constraints via the value, or via the type, but not a mix of both.
In the DEMO model, I encouraged user to use annotations in their functions especially to allow auto-completion on methods and attributes of passed arguments. Within PyCharm (with Python 3.6+), writing: [...] Was that a bad idea ?
I never said a word against that... But isn't it possible to define a custom data type (https://pydantic-docs.helpmanual.io/usage/types/#custom-data-types) instead of using ConstantAxes, so that we can provide the constraint via a type (:), instead of via =?
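For context, pydantic v1 custom data types are classes exposing a `__get_validators__` classmethod that yields validator callables. Below is a pure-Python sketch of the axes check such a validator could perform; the `Axis`/`Array` classes and the `check_axes` helper are toy stand-ins for larray's, not real API:

```python
class Axis:
    """Toy axis: a name and a list of labels."""
    def __init__(self, name, labels):
        self.name = name
        self.labels = list(labels)

class Array:
    """Toy array: only carries its axes."""
    def __init__(self, axes):
        self.axes = list(axes)

def check_axes(declared_axes, value):
    """Validator body a custom pydantic type could yield from __get_validators__:
    accept the value only if its axes match the declared ones."""
    if not isinstance(value, Array):
        raise TypeError(f"expected an Array, got {type(value).__name__}")
    declared = [(ax.name, ax.labels) for ax in declared_axes]
    actual = [(ax.name, ax.labels) for ax in value.axes]
    if declared != actual:
        raise ValueError(f"incompatible axes: {actual} was declared as {declared}")
    return value

AGE = Axis('age', range(121))
GENDER = Axis('gender', ['male', 'female'])
check_axes([AGE, GENDER], Array([AGE, GENDER]))  # passes
try:
    check_axes([AGE, GENDER], Array([AGE]))      # wrong axes -> rejected
except ValueError as error:
    print(error)
```

Wrapping this check in a custom type would indeed let the constraint sit on the `:` side of the declaration instead of the `=` side.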
With the string syntax of LArray we already allow users to write code less compatible with the broader Python ecosystem. If we choose the = operator to specify types (or pseudo-types), we move further away from the broader Python ecosystem. Besides, using : in functions and = in constrained classes will be even more confusing.
That's why I have always pushed against using it outside of an interactive console.
...     # declare variables with a value to be defined after instantiation
...     # TO BE USED WITH EXTREME CAUTION
...     later: str = TO_SET_LATER

Huh? I don't understand the difference between this and the variables with only a type like above (variant_name, etc.)??

pydantic raises an error if not all declared attributes are set with a value. After instantiation, an object is supposed to be fully set up with pydantic. TO_SET_LATER is just a trick to bypass this behavior.
Could you explain why a user would need to set values after instantiation?
Having to define yet an extra method? Isn't this getting too complex for our users?
__post_init__ actually comes from the Python 3.7 dataclass. A dataclass has a __post_init__ method called at the end of the generated __init__ by default (see here).
This does not change anything to my statement. I would rather not explain that to our users if we can avoid it.
From pydantic front page example:
class User(BaseModel):
    id: int
    name = 'John Doe'
then:
name is inferred as a string from the provided default; because it has a default, it is not required.
So I assume it is.
Yes it is. You're totally right. I completely missed that.
I never said a word against that... But isn't it possible to define a custom data type (https://pydantic-docs.helpmanual.io/usage/types/#custom-data-types ) instead of using ConstantAxes, so that we can provide the constraint via a type (:), instead of via =?
I made a quick test with pydantic during the weekend. I didn't go that far into the pydantic documentation. I will take a look.
pydantic raises an error if not all declared attributes are set with a value. After instantiation, an object is supposed to be fully set up with pydantic. TO_SET_LATER is just a trick to bypass this behavior.
Could you explain why we need that?
pydantic requires all fields to be set at instantiation. This means our users will have to define all fields at instantiation. This may be a bit too restrictive for our users. Besides, I don't know what would happen if a user class is loaded from several files. I just renamed NOT_LOADED to TO_SET_LATER.
This does not change anything to my statement. I would rather not explain that to our users if we can avoid it.
In that case, we need to define the TO_SET_LATER (or NOT_LOADED) value.
Although, I can understand that we want to make the life of our users easier. I see LArray as something somewhere between a new "modelling language" and a Python library. But this requires us to make much more effort to develop LArray than to develop any true Python lib (all the work you did for the string syntax was amazing). I would like LArray to be more used internally but that's another discussion.
Maybe the risk of adding pydantic as a new dependency is too high if we just use it for the ConstrainedSession (I am thinking of "Dependency Hell"). It is OK if you think it is.
Even so, I would like to see if there is a possibility to use the : notation instead of the = notation, because I would like to encourage users (those having Python 3.6+ installed) to use annotations (in functions and classes).
I never said a word against that... But isn't it possible to define a custom data type (https://pydantic-docs.helpmanual.io/usage/types/#custom-data-types ) instead of using ConstantAxes, so that we can provide the constraint via a type (:), instead of via =?
I made a quick test with pydantic during the weekend. I didn't go that far into the pydantic documentation. I will take a look.
For me this question is crucial. If pydantic allows us to do "the right thing" (i.e. annotate classes -- and possible other code -- with arrays with defined axes) it is worth adding it as a dependency. If it does not, I am unsure it is worth the added trouble.
pydantic raises an error if not all declared attributes are set with a value. After instantiation, an object is supposed to be fully set up with pydantic. TO_SET_LATER is just a trick to bypass this behavior.
Could you explain why we need that?
pydantic requires all fields to be set at instantiation. This means our users will have to define all fields at instantiation. This may be a bit too restrictive for our users.

I don't think so, as long as the behavior is limited to ConstrainedSession, not to all sessions.
Besides, I don't know what would happen if a user class is loaded from several files.
That's indeed going to be problematic, but I think that in that case we would need to provide a solution, not our users. E.g. provide a flag when loading such a file to tell it is partial/incomplete and at that point set all other fields to TO_SET_LATER_OR_WHATEVER_WE_CALL_IT 😉.
This does not change anything to my statement. I would rather not explain that to our users if we can avoid it.
In that case, we need to define the TO_SET_LATER (or NOT_LOADED) value.
Although, I can understand that we want to make the life of our users easier. I see LArray as something somewhere between a new "modelling language" and a Python library. But this requires us to make much more effort to develop LArray than to develop any true Python lib (all the work you did for the string syntax was amazing). I would like LArray to be more used internally but that's another discussion.
Maybe the risk of adding pydantic as a new dependency is too high if we just use it for the ConstrainedSession (I am thinking of the "Dependency Hell"). It is OK if you think it is. Even so, I would like to see if there is a possibility to use the : notation instead of the = notation because I would like to encourage users (those having Python 3.6+ installed) to use annotations (in functions and classes).
A new dependency is always a tradeoff. If we can add a custom type which can validate axes AND work in PyCharm et al. that would be awesome and more than worth it. Otherwise, I am unsure it would be worth it.
Alright. Whether we choose to use pydantic or not, I have a few more suggestions and questions:
Suggestions:

1. I would like to add a boolean keyword argument allow_extra_vars controlling whether or not extra variables are allowed during and after model initialization. Defaults to True.
2. I would like to add a boolean keyword argument allow_mutation telling whether the constrained class is faux-immutable, i.e. whether __setattr__ and __setitem__ are allowed. Defaults to True. This argument could be interesting to create a "Parameters" class such as the one defined in the DEMO model.

Questions:

1. The filter(), compact() and apply() methods of ConstrainedSession will return a normal Session object. Is putting a Warning section in the docstring of ConstrainedSession enough or should I print a warning when one of these methods is called? I guess a Warning section is enough but do you think our users will read that section?
2. What to do with binary operations (__eq__ and __ne__ excluded)?
   a) Always return a normal Session object silently
   b) Always return a normal Session object and print a warning
   c) Always return a ConstrainedSession object and raise an error in case of any incompatibility (extra arrays, ...)
   d) Return a ConstrainedSession object if possible. If not, return a normal Session object and print a warning.
3. Same question for unary operations.

Personally, I would go for option d) or b) for questions 2 and 3. I think binary operations are rarely used by our users but IMO returning a normal Session object should not be done silently in this case.
For me this question is crucial. If pydantic allows us to do "the right thing" (i.e. annotate classes -- and possible other code -- with arrays with defined axes) it is worth adding it as a dependency. If it does not, I am unsure it is worth the added trouble.
What you are saying is that you want something like:
class MyDemo(ConstrainedSession):
    pop: ArrayDef("age=0..120; gender=female,male; time=1991..2020")

?
It sounds like it is not possible with the typing module nor with pydantic.
But:

1. There is a little problem with the current implementation of ArrayDef. This class has only an __init__() method taking axes as argument. So when instantiating the MyDemo class as demo = MyDemo(pop=zeros("age=0..120; gender=female,male; time=1991..2020")), writing pop. later will not pop up the methods of Array. This is because PyCharm thinks that pop is still an ArrayDef object. This is not an argument against the var = type notation, but the problem is still there and needs to be fixed one way or another.
2. The operator = is commonly used to define a default value and : to define a type. That is what is used in functions (and traditional dataclasses) and that is what PyCharm relies on to propose auto-completion. So, I would prefer if we stick to that (after all). What we may do is to introduce a subclass ArrayDef implemented as something like:
class ArrayDef(Array):
    def __init__(self, data=None, axes=None, dtype=None, meta=None):
        if axes is not None and not isinstance(axes, AxisCollection):
            axes = AxisCollection(axes)
        if data is None:
            data = np.empty(axes.shape, dtype)
        Array.__init__(self, data, axes, dtype, meta)
and then:
import pydantic
from pydantic import BaseModel

class ConstrainedSession(Session, BaseModel):
    ...

    @pydantic.validator('*')
    def validate_variables(cls, v, values, field):
        key = field.name
        # --- array with constant axes
        if isinstance(field.type_, ArrayDef):
            # no axes given in the class definition
            if not field.value:
                # first initialization
                if key not in values:
                    return v
                else:
                    # compare with the current value
                    try:
                        values[key].axes.check_compatible(v.axes)
                    except ValueError as error:
                        msg = str(error).replace("incompatible axes:", f"incompatible axes for array '{key}':") \
                                        .replace("vs", "was declared as")
                        raise ValueError(msg)
                    return v
            # axes given in the class definition
            else:
                try:
                    field.value.axes.check_compatible(v.axes)
                except ValueError as error:
                    msg = str(error).replace("incompatible axes:", f"incompatible axes for array '{key}':") \
                                    .replace("vs", "was declared as")
                    raise ValueError(msg)
                return v
        return v
and then:
class MyDemo(ConstrainedSession):
    # axes given in class definition
    pop = ArrayDef("age=0..120; gender=female,male; time=1991..2020")
    # axes given at instantiation
    births: ArrayDef

demo = MyDemo(pop=zeros("age=0..120; gender=female,male; time=1991..2020"),
              births=zeros("age=0..120; gender=female,male; time=1991..2020"))
For me this question is crucial. If pydantic allows us to do "the right thing" (i.e. annotate classes -- and possible other code -- with arrays with defined axes) it is worth adding it as a dependency. If it does not, I am unsure it is worth the added trouble.
What you are saying is that you want something like:
class MyDemo(ConstrainedSession):
    pop: ArrayDef("age=0..120; gender=female,male; time=1991..2020")
Indeed. Or, in fact, I would even prefer this (the spelling does not matter that much but the structure does):
class MyDemo(ConstrainedSession):
    gender = Axis("gender=female,male")
    time: Axis
    pop: ArrayDef([gender, time])

def init_demo():
    time = Axis('time=1991..2020')
    return MyDemo(time=time, pop=zeros([MyDemo.gender, time]))
It sounds like it is not possible with the typing module nor with pydantic.
I knew about typing but I had hopes for pydantic... Too bad.
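As an aside (this postdates the discussion): Python 3.9+ has `typing.Annotated`, which attaches arbitrary metadata to a type while IDEs keep seeing the plain type, and later pydantic versions understand it. A sketch with a hypothetical `AxesSpec` marker and a toy `Array` stand-in:

```python
from typing import Annotated, get_type_hints

class AxesSpec:
    """Hypothetical marker carrying an axes declaration as metadata."""
    def __init__(self, axes):
        self.axes = axes

class Array:
    """Toy stand-in for larray's Array."""

class MyDemo:
    # IDEs and type checkers see `pop` as an Array;
    # the axes spec rides along as inspectable metadata
    pop: Annotated[Array, AxesSpec("age=0..120; gender=female,male; time=1991..2020")]

hints = get_type_hints(MyDemo, include_extras=True)
spec = hints['pop'].__metadata__[0]
print(spec.axes)
```

This keeps the constraint entirely on the `:` side of the declaration, which is what both of you seem to want.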
But:
1. There is a little problem with the current implementation of `ArrayDef`. This class has only an `__init__()` method taking axes as argument. So when instantiating the `MyDemo` class as `demo = MyDemo(pop=zeros("age=0..120; gender=female,male; time=1991..2020"))`, writing `pop.` later will not pop up the methods of `Array`. This is because PyCharm thinks that `pop` is still an ArrayDef object. This is not an argument against the `var = type` notation, but the problem is still there and needs to be fixed one way or another.
2. The operator `=` is commonly used to define a default and `:` to define a type. That is what is used in functions (and traditional dataclasses) and that is what PyCharm relies on to propose auto-completion. So, I would prefer if we stick to that (after all).
Indeed. And I would prefer that too (that's the whole point of my comments), but then what you show does not respect that, so I am confused 😕 ...
So, what we may do is to introduce a subclass ArrayDef implemented as something like:

class ArrayDef(Array):
    def __init__(self, data=None, axes=None, dtype=None, meta=None):
        if axes is not None and not isinstance(axes, AxisCollection):
            axes = AxisCollection(axes)
        if data is None:
            data = np.empty(axes.shape, dtype)
Making ArrayDef a subclass of Array should be enough. You do not actually need any data. To trick PyCharm/static analyzers, we do not need the object to be functional. We will not use them anyway.
Array.__init__(self, data, axes, dtype, meta)
This is not needed either AFAICT.
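A data-less declaration class along those lines might look like this; the Array class below is a toy stand-in, purely illustrative of the "satisfy the static analyzer without building real data" idea:

```python
class Array:
    """Toy stand-in for larray's Array."""
    def __init__(self, data=None, axes=None):
        self.data = data
        self.axes = axes

class ArrayDef(Array):
    """Declaration-only subclass: stores the expected axes and nothing else.
    Because it *is* an Array for type checkers, annotating a variable with it
    still gives Array auto-completion in IDEs."""
    def __init__(self, axes=None):
        # deliberately skip Array.__init__: a declaration needs no data
        self.axes = axes

decl = ArrayDef(axes=('age', 'gender'))
print(isinstance(decl, Array))  # True
```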
and then:
class MyDemo(ConstrainedSession):
    # axes given in class definition
    pop = ArrayDef("age=0..120; gender=female,male; time=1991..2020")

Here you use = for a type => I am confused...
Alright.
Whether we choose to use pydantic or not, I have a few more suggestions and questions:

Suggestions:
1. I would like to add a boolean keyword argument `allow_extra_vars` controlling whether or not extra variables are allowed during and after model initialization. Defaults to True.
I fear an option nobody will use. The option could come later if we have demand for it.
2. I would like to add a boolean keyword argument `allow_mutation` telling whether the constrained class is faux-immutable, i.e. whether `__setattr__` and `__setitem__` are allowed. Defaults to True. This argument could be interesting to create a "Parameters" class such as the one defined in the DEMO model.
I fear an option nobody will use. The option could come later if we have demand for it OR we could provide a built-in subclass (e.g. Parameters) with that option enabled.
Questions:
1. The `filter()`, `compact()` and `apply()` methods of `ConstrainedSession` will return a normal `Session` object. Is putting a Warning section in the docstring of ConstrainedSession enough or should I print a warning when one of these methods is called? I guess a Warning section is enough but do you think our users will read that section?
They will not read that section, but a dynamic warning would be annoying, so I would only warn in the doc and wait for users to discover it (or we might show one such example of Constrained -> normal session in the tutorial section about ConstrainedSession).
2. What to do with binary operations (`__eq__` and `__ne__` excluded)?
   * a) Always return a normal `Session` object silently
   * b) Always return a normal `Session` object and print a warning
   * c) Always return a `ConstrainedSession` object and raise an error in case of any incompatibility (extra arrays, ...)
   * d) Return a `ConstrainedSession` object if possible. If not, return a normal `Session` object and print a warning.
I don't really know. I don't like option c) because this severely limits what you can do, but I am unsure about the other options.
3. Same question for unary operations
Personally, I would go for option d) or b) for questions 2 and 3. I think binary operations are rarely used by our users, but IMO returning a normal Session object should not be done silently in this case.
The PandaSDMX (there is no second s) API is awful. I just hope you will not mimic it.
No, that's not what I have in mind. I only wanted to be able to import the model.py script from pandaSDMX (which implements the so-called SDMX Information Model, https://sdmx.org/?page_id=5008 > SDMX 2.1 > Section 2 - Information Model), but on second thought I think it is silly to add a dependency just for one script. I will write my own implementation of the SDMX-IM, also using pydantic because it has added value in this case.
I have started to read the official technical documentation of the SDMX-IM. The document is messy. There are discrepancies between the class diagrams and the definitions sections. Testing the code is gonna be fun...
Providing a built-in subclass Parameters sounds like a good idea to me. In this case, Parameters should simulate a faux-immutable class and also forbid adding undeclared variables (and deleting declared variables) IMO.
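The "forbid undeclared variables and deletion" part could work by checking constructor arguments against the class annotations. Again a hypothetical sketch, with a toy `DemoParams` subclass standing in for a real model's parameters class:

```python
# Hypothetical sketch: reject undeclared variables and forbid deletion.

class Parameters:
    def __init__(self, **kwargs):
        # declared variables are those annotated on the subclass
        declared = set(type(self).__annotations__)
        extra = set(kwargs) - declared
        if extra:
            raise TypeError(f"undeclared variables: {sorted(extra)}")
        self.__dict__.update(kwargs)

    def __delattr__(self, name):
        raise TypeError("deleting declared variables is forbidden")

class DemoParams(Parameters):
    discount_rate: float

params = DemoParams(discount_rate=0.03)

try:
    DemoParams(growth=0.01)       # not declared above
    extra_rejected = False
except TypeError:
    extra_rejected = True

try:
    del params.discount_rate
    deletion_blocked = False
except TypeError:
    deletion_blocked = True
```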
1. The `filter()`, `compact()` and `apply()` methods of `ConstrainedSession` will return a normal `Session` object. Is putting a Warning section in the docstring of ConstrainedSession enough, or should I print a warning when one of these methods is called? I guess a Warning section is enough, but do you think our users will read that section?
They will not read that section, but a dynamic warning would be annoying, so I would only warn in the doc and wait for users to discover it (or we might show one such example of Constrained -> normal session in the tutorial section about ConstrainedSession)
OK
What to do with binary operations (`__eq__` and `__ne__` excluded)?

- a) Always return a normal `Session` object silently
- b) Always return a normal `Session` object and print a warning
- c) Always return a `ConstrainedSession` object and raise an error in case of any incompatibility (extra arrays, ...)
- d) Return a `ConstrainedSession` object if possible. If not, return a normal `Session` object and print a warning.

I don't really know. I don't like option c) because this severely limits what you can do, but I am unsure about the other options.
OK to exclude option c) which is indeed too restrictive. I'll go for option d) (but without warnings?).
In fact, even option c might make sense as long as we have both 1) a way to easily convert a ConstrainedSession to a normal Session and 2) we support binary ops between a ConstrainedSession and a normal Session without error even if they differ (either returning a ConstrainedSession or Session).
In other words, one more option (I am not saying it is the best option -- I don't know) is to behave differently depending on if we are doing ConstrainedSession + Session or ConstrainedSession + ConstrainedSession. We could be more restrictive in the second case.
In short, I still don't know what to do... Sorry, I really don't have a clear mind these days... I guess you will have to pick one an see how it goes in practice. I usually don't like working like that but...
What I have implemented now (but didn't push yet) is:
def opmethod(self, other):
    (...)
    res.append((name, res_item))
    try:
        # XXX: print a warning?
        ses = self.__class__(res)
    except Exception:
        ses = Session(res)
    return ses
in both _binaryop and _unaryop.
I really don't have a clear mind these days...
Me neither...
Seems good enough for now. We can always refine this later.
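The try-then-fall-back pattern sketched above can be demonstrated end to end with stand-in classes. The constraint below (values must stay <= 10) is a toy one, chosen only to trigger both branches; real ConstrainedSession constraints would check variable names and types instead:

```python
# Runnable sketch of option d): keep the constrained type when possible,
# otherwise silently fall back to a plain Session.

class Session:
    def __init__(self, items):
        self._objects = dict(items)

    def __add__(self, other):
        res = [(name, value + other) for name, value in self._objects.items()]
        try:
            # keep the constrained type when the result still satisfies it...
            return self.__class__(res)
        except Exception:
            # ...otherwise fall back to a plain Session
            return Session(res)

class ConstrainedSession(Session):
    def __init__(self, items):
        items = dict(items)
        if any(value > 10 for value in items.values()):
            # toy constraint, just to trigger the fallback in the demo
            raise ValueError("value out of bounds")
        super().__init__(items)

cs = ConstrainedSession({"pop": 5})
ok = cs + 1         # 6 <= 10: result stays a ConstrainedSession
fallback = cs + 10  # 15 > 10: result degrades to a plain Session
```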
Or, in fact, I would even prefer this (the spelling does not matter that much but the structure does):
class MyDemo(ConstrainedSession):
    gender = Axis("gender=female,male")
    time: Axis
    pop: ArrayDef([gender, time])

def init_demo():
    time = Axis('time=1991..2020')
    return MyDemo(time=time, pop=zeros([MyDemo.gender, time]))
It sounds like it is not possible with the typing module nor with pydantic.
I knew about typing but I had hopes for pydantic... Too bad.
I spoke too fast. I finally found some time to take a deeper look at the pydantic library. I made some quick tests, and it seems that implementing:
AGE = Axis("age=0..120")
GENDER = Axis("gender=female,male")

class MyDemo(ConstrainedSession):
    pop: ArrayDef((AGE, GENDER))

def init_demo():
    return MyDemo(pop=zeros((AGE, GENDER)))
is actually possible with pydantic.
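Independently of pydantic, the key mechanism is that `pop: ArrayDef((AGE, GENDER))` stores an ArrayDef *instance* in the class's `__annotations__`, which `__init__` can then walk to validate actual values. A hypothetical pure-Python sketch (with `FakeArray` standing in for a real larray Array):

```python
# Hypothetical sketch: validate values against instance annotations.

class Axis:
    def __init__(self, spec):
        self.spec = spec

class ArrayDef:
    def __init__(self, axes):
        self.axes = tuple(axes)

AGE = Axis("age=0..120")
GENDER = Axis("gender=female,male")

class ConstrainedSession:
    def __init__(self, **kwargs):
        for name, value in kwargs.items():
            decl = type(self).__annotations__.get(name)
            if isinstance(decl, ArrayDef) and getattr(value, "axes", None) != decl.axes:
                raise ValueError(f"{name}: axes do not match declaration")
            setattr(self, name, value)

class MyDemo(ConstrainedSession):
    pop: ArrayDef((AGE, GENDER))   # annotation holds an ArrayDef instance

class FakeArray:
    """Stand-in for larray's Array: only carries axes."""
    def __init__(self, axes):
        self.axes = tuple(axes)

demo = MyDemo(pop=FakeArray((AGE, GENDER)))   # matches the declaration

try:
    MyDemo(pop=FakeArray((GENDER,)))          # wrong axes
    mismatch_caught = False
except ValueError:
    mismatch_caught = True
```

pydantic essentially automates this annotation-walking plus error reporting.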
Maybe implementing something like pop: ArrayDef((GENDER, 'time')), where the axis time is defined at runtime, is possible too.
🎉 Yippee! 🎉 Now pydantic looks a lot more interesting to me. How does PyCharm behave with such annotations?
How does PyCharm behave with such annotations?
Quite well actually, but I have to say that I recently installed the 2020.1 version of PyCharm.
The main idea of the present issue is to benefit both from PyCharm's auto-completion (by defining user classes) and from all the (nice) features offered by the Session class.
This implies developing a mechanism to keep the user-defined class attributes in sync with the self._objects attribute of Session.

Another (big) advantage of using user-defined classes is to significantly reduce the number of global variables and function parameters in the models that are developed in house.