xarray-contrib / xarray-simlab

Xarray extension and framework for computer model simulations
http://xarray-simlab.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Simple xarray-simlab code fails to set attrs attributes #51

Closed luca-penasa closed 6 years ago

luca-penasa commented 6 years ago

trying to run the following code:

import xsimlab as xs
import numpy as np

@xs.process
class TestProcess(object):
    spacing = xs.variable(description='grid spacing', default=2)

mod = xs.Model({"proc1": TestProcess})

fails with the following error from the attrs-generated `__init__`:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-8-600a28a8e410> in <module>()
      6     spacing = xs.variable(description='grid spacing', default=2)
      7 
----> 8 mod = xs.Model({"proc1": TestProcess})

~/.local/lib/python3.7/site-packages/xsimlab/model.py in __init__(self, processes)
    357             ensure_process_decorated(cls)
    358 
--> 359         builder = _ModelBuilder(processes)
    360 
    361         builder.bind_processes(self)

~/.local/lib/python3.7/site-packages/xsimlab/model.py in __init__(self, processes_cls)
     42     def __init__(self, processes_cls):
     43         self._processes_cls = processes_cls
---> 44         self._processes_obj = {k: cls() for k, cls in processes_cls.items()}
     45 
     46         self._reverse_lookup = {cls: k for k, cls in processes_cls.items()}

~/.local/lib/python3.7/site-packages/xsimlab/model.py in <dictcomp>(.0)
     42     def __init__(self, processes_cls):
     43         self._processes_cls = processes_cls
---> 44         self._processes_obj = {k: cls() for k, cls in processes_cls.items()}
     45 
     46         self._reverse_lookup = {cls: k for k, cls in processes_cls.items()}

<attrs generated init c3bfe1a9beca995748daa500465cd5be1c8546d3> in __init__(self)
      1 def __init__(self, ):
----> 2     self.spacing = attr_dict['spacing'].default
      3     self.__attrs_post_init__()

AttributeError: can't set attribute
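The conflict shown in the traceback can be sketched without xarray-simlab or attrs at all: the generated `__init__` assigns the default value with a plain attribute assignment, but by then the variable has been replaced with a property that has no setter. A minimal stdlib-only reproduction (the class and store names are illustrative, not xsimlab's actual code):

```python
class FakeProcess:
    # Stand-in for the read-only property created for each variable:
    # there is no fset, so any assignment to it raises AttributeError.
    spacing = property(fget=lambda self: self.__xsimlab_store__['spacing'])

    def __init__(self):
        # What the attrs-generated __init__ effectively does with a default:
        self.spacing = 2

try:
    FakeProcess()
except AttributeError as exc:
    print("AttributeError:", exc)  # same failure mode as in the traceback above
```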

I tested various tagged versions of both xarray-simlab and attrs. Any idea what might be wrong?

BTW, I am working on setting up a large-scale simulation, so I was considering using this package for initializing and keeping track of all the parameters, although the actual modelling in the time domain will be executed by external code. Thus I'd like to use xarray-simlab mostly for the smart variable mechanism and to create the experimental setup in discrete steps. Is there any problem I could run into by only calling the model.initialize() method, or e.g. not passing clocks to the builder?

Thank you very much, this seems a very useful project. I hope I can contribute something. luca p

luca-penasa commented 6 years ago

Brief update after some research, maybe it helps. It seems the following code:

import xsimlab as xs
from xsimlab.variable import _as_dim_tuple, VarIntent, VarType
import attr

def variable(dims=(), intent='in', group=None, default=attr.NOTHING,
             validator=None, description='', attrs=None):

    metadata = {'var_type': VarType.VARIABLE,
                'dims': _as_dim_tuple(dims),
                'intent': VarIntent(intent),
                'group': group,
                'attrs': attrs or {},
                'description': description}

    return attr.attrib(metadata=metadata, default=default, validator=validator,
                       init=False, cmp=False, repr=False)

@xs.process
class test(object):
    spacing = variable(description='grid spacing', default=2)

t = test()

does not work, but this does (only a key of the metadata dict is changed):

import xsimlab as xs
from xsimlab.variable import _as_dim_tuple, VarIntent, VarType
import attr

def variable(dims=(), intent='in', group=None, default=attr.NOTHING,
             validator=None, description='', attrs=None):

    metadata = {'var_type2': VarType.VARIABLE,
                'dims': _as_dim_tuple(dims),
                'intent': VarIntent(intent),
                'group': group,
                'attrs': attrs or {},
                'description': description}

    return attr.attrib(metadata=metadata, default=default, validator=validator,
                       init=False, cmp=False, repr=False)

@xs.process
class test(object):
    spacing = variable(description='grid spacing', default=2)

t = test()
luca-penasa commented 6 years ago

My last comment was not very useful, but I've been trying to investigate the issue further. Here is what I discovered (btw I am on the current git version); I'd really appreciate some input from the devs:

property(fget=get_from_store, fset=put_in_store, doc=var_doc)

and thus those attributes need to exist before setting and getting the values. But when put_in_store is called (e.g. to set the default value), self.__xsimlab_store__ and self.__xsimlab_store_keys__ do not exist yet.

I'd really appreciate some input from the developers... thanks luca

luca-penasa commented 6 years ago

After a lot of trial and error I ended up with these considerations:

benbovy commented 6 years ago

Many thanks for this detailed report @luca-penasa ! And sorry for the delay.

default values support in variables is broken. in fact there is no test using a default value

Actually, support for default values is not yet fully implemented! I admit it wasn't a good idea to already add the default argument in xs.variable()! I didn't expect that this would cause issues like this, sorry! For now we should really leave default=attr.NOTHING.

I would suggest to move the default value to the metadata associated to the variable

Do you mean something like xs.variable(attrs={'default': 2})? The attrs argument is rather meant for holding user custom metadata. If we support default values, they should still be set using a reserved keyword argument IMO.

Or maybe you mean moving the default value to the metadata attribute of the underlying attr.attrib? Like this:

def variable(dims=(), intent='in', group=None, default=attr.NOTHING,
             validator=None, description='', attrs=None):

    metadata = {'default_value': default, ...}

    return attr.attrib(metadata=metadata, validator=validator,
                       init=False, cmp=False, repr=False)

Unless there is a way to handle attr.attrib(default) so that we can avoid the issue here, I think the solution above is the best one.

xarray-simlab uses attrs in a rather uncommon way. For example, xs.process decorated classes turn all declared variables into properties, so it is not always possible to just reuse the attrs features related to instance initialization.

But in this case, there is no need to do anything special with the default value in a process class __init__. Process classes are used here just as holders of some useful metadata about their declared parameters/variables. The logic of setting the default value of a variable, e.g., in a xarray.Dataset or in __xsimlab_store__, is implemented elsewhere and executed after a xs.Model object has been created (i.e., after process classes have been instantiated).
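That deferred logic can be sketched in a few lines (a hypothetical helper, not xsimlab's API): defaults are carried in each variable's metadata, as in the `'default_value'` snippet above, and applied to the store only after the model has been built, so the attrs-generated `__init__` never touches the properties:

```python
def apply_defaults(store, variables):
    """Fill missing store entries from 'default_value' metadata.

    variables: mapping of variable name -> metadata dict, mimicking
    the attr.Attribute.metadata sketched above.
    """
    for name, meta in variables.items():
        if name not in store and 'default_value' in meta:
            store[name] = meta['default_value']
    return store

variables = {'spacing': {'default_value': 2},
             'length': {'default_value': 10}}

store = apply_defaults({'length': 5}, variables)  # user already set 'length'
print(store)  # {'length': 5, 'spacing': 2}
```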

benbovy commented 6 years ago

BTW, I am working on setting up a large-scale simulation, so I was considering using this package for initializing and keeping track of all the parameters, although the actual modelling in the time domain will be executed by external code. Thus I'd like to use xarray-simlab mostly for the smart variable mechanism and to create the experimental setup in discrete steps. Is there any problem I could run into by only calling the model.initialize() method, or e.g. not passing clocks to the builder?

I don't see anything that would prevent you from doing that, but I might be wrong. However, I think that you will still need to set a (dummy) clock with create_setup. It is required, but in your case it would just run an empty loop during the simulation.

Actually, I've been thinking a bit about the possibility of setting up custom execution workflows, i.e., not tied to the rigid initialize->clock*(run_step->finalize_step)->finalize. This would require a fair amount of work, though.

this seems a very useful project. Hope I could contribute anything.

This would be greatly appreciated! :-)

luca-penasa commented 6 years ago

Thank you @benbovy. Now I understand the architecture a little better... and I can see it was meant to work like that.

I am actually trying to replicate a similar structure for my project. One of the major deal breakers for me was the impossibility of dealing with "nested" variables: a process defines a new variable in the form of a complex object, and that object might in turn contain attributes that should be treated as variables themselves, i.e., in pseudo-Python code:

@process
class Process1:
    var1 = variable (...) [instance of Var1]

with Var1, something like

@attr.attrs [or process]
class Var1:
    v1 = attr.ib [or variable](...)
    v2 = attr.ib [or variable](...)
    v3 = attr.ib [or variable](...)
    v4 = attr.ib [or variable](...)

In a different process I might need to read or write one of these nested variables. As it is structured now, I think it is only possible to access the whole var1 object defined in the other process, like this:

@process
class Process2:
    var1_imported = foreign variable from Process1(...) [instance of Var1]

In this case I can import var1 within Process2 as a foreign variable. But then I cannot keep a record of which attribute of var1 is actually changed - e.g. v1? or v2? all of them?

It would be more useful to be able to import nested variables, e.g. something like:

@process
class Process2:
    v1_imported = "Process1.var1.v1" as foreign variable from Process1(...) 

depending on how I define the variable I can track whether the variable is read/write etc.

The main problem I see is that the store is currently implemented as a non-nested dict, so I am not sure how I could improve xsimlab to deal with this... Maybe you have thought about this? Any idea?

My interest is basically in keeping track of where a variable is set/updated/read in the model, and by which module, etc., which is the same idea behind xsimlab as far as I understand...

btw nice to see such a good earth-scientist pythonist! thank you very much for the help

best, luca

benbovy commented 6 years ago

Using flat collections of variables and processes was a design choice. Nested structures would greatly complicate the implementation and don't play well with the xarray data model (I initially started this project with the goal of using xarray objects as simulation I/O). For example, how would one define intent for a complex object having multiple attributes (or for a foreign variable linked to that complex object)?

Looking at your example, would it be a problem to have three process classes like below?

@xs.process
class Var1:
    v1 = xs.variable()
    v2 = xs.variable()
    v3 = xs.variable()
    v4 = xs.variable()

@xs.process
class Process1:
    ref_v1 = xs.foreign(Var1, 'v1')
    ref_v2 = xs.foreign(Var1, 'v2')
    ref_v3 = xs.foreign(Var1, 'v3')
    ref_v4 = xs.foreign(Var1, 'v4')

@xs.process
class Process2:
    ref_v1 = xs.foreign(Var1, 'v1')

You can actually create a process class as a "variable bundle", i.e., just for declaring some variables (no methods), like Var1 above. The downside is that if you want to use those variables in other process classes, you need to declare a foreign variable for each of them (like in Process1), which is a bit tedious. #37 solves this, but only partially (it is not possible to track which specific variable is read, and it shouldn't be used to set values).

btw nice to see such a good earth-scientist pythonist!

Thanks :-)

luca-penasa commented 6 years ago

Just to update you: after some thinking, what about using a nested variable scheme? Like this:

@xs.process
class Process1:
    ref_v1 = xs.foreign(Var1, 'v1.v2.v3')

A super-simplified working implementation is here (I am not using a store like in xsimlab): https://gist.github.com/luca-penasa/89e3ad12ca80d597e0cf14a0a0cff06c

but I don't know how well it would fit the current implementation, or how much work it would take to make it work.

Variables imported with this nested scheme could just live where they are... in fact, they would already be in the store, given that the object containing them is. [I also don't know how xsimlab would react to storing objects in variables...]

btw I think the store is good, but it would definitely be better to leave it as an option. Thanks for your help, luca

benbovy commented 6 years ago

What about something like:

@xs.process
class Process1:
    ref_v1 = xs.foreign(Var1, 'v1')
    ref_v2 = xs.foreign(Var1, 'v2')
    ref_v3 = xs.foreign(Var1, 'v3')

Am I missing any real advantage of using a nested scheme over explicit declaration of each attribute (apart from maybe being a little bit more succinct)?

I'm quite reluctant to add support for nested variables for several reasons:

* This would add too much complexity to the (already quite complex) current implementation in many places, which is not worth the benefits IMHO.

* We would need to extend API support for many things, e.g., `xs.foreign(Var1, ('v1', 'v2', 'v3'), intent={'v1': 'in', 'v2': 'in', 'v3': 'out'})`. For `xs.variable`, the same kind of dictionary would be needed for specifying `dims` or, later, default values, validators, etc. In the end the code is not really shorter; it's even less readable.

* I'm slightly concerned by the overhead this would add when getting/setting the values of the variables in process classes through their properties.

* Nested variables don't play well with the xarray data model. The current one-to-one relationship between a `xs.variable` and a `xarray.Variable` is easy to deal with; maintaining both one-to-one and one-to-many relationships in the case of nested variables would be hard. We could argue the same for other kinds of automated I/O interfaces that we could later support, like command-line (options or input file entries) or UI (widgets). Storing complex objects (i.e., other than arrays or primitive types) as `xs.variable` values wouldn't play well with those I/O interfaces either.

Without taking statements from "The Zen of Python" too religiously, I think that the "flat is better than nested" principle applies rather well here.

benbovy commented 6 years ago

btw I think the store is good but it would definitively better to leave that as an option.

Storing values of process variables directly as class attributes would certainly avoid issues like the one you've reported here. However, using properties has another advantage besides supporting a centralized, abstract store for simulation data: it allows some safeguards, e.g., read-only properties for variables declared with intent='in'.
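That safeguard can be sketched in plain Python (illustrative names, not xsimlab's actual implementation): a variable declared with intent='in' gets a property with no setter, so process code can read it from the store but cannot accidentally overwrite it:

```python
def make_readonly(name):
    """Build a read-only property that reads 'name' from the shared store."""
    def fget(self):
        return self.__xsimlab_store__[name]
    return property(fget)  # no fset -> assignment raises AttributeError

class Grid:
    spacing = make_readonly('spacing')  # stand-in for an intent='in' variable

    def __init__(self, store):
        self.__xsimlab_store__ = store

g = Grid({'spacing': 2})
print(g.spacing)  # 2
try:
    g.spacing = 5  # blocked by the missing setter
except AttributeError:
    print("read-only")
```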

luca-penasa commented 6 years ago

Thank you for your reply @benbovy; btw, mine was just a way of discussing this with somebody who has thought a lot about the problem, since I am facing the issue of implementing something very similar. So I was not suggesting adopting this for xsimlab; I just wanted to know your opinion. Thank you very much for your detailed reply. I agree with the keep-it-simple principle.

Let me describe my problem, in case you are curious why I need to go with something slightly different and cannot reuse xsimlab easily.

What about something like:

@xs.process
class Process1:
    ref_v1 = xs.foreign(Var1, 'v1')
    ref_v2 = xs.foreign(Var1, 'v2')
    ref_v3 = xs.foreign(Var1, 'v3')

Am I missing any real advantage of using a nested scheme over explicit declaration of each attribute (apart from maybe being a little bit more succinct)?

Yes, because in this case Var1 MUST be a process and cannot be a variable. In my case, the instances of Var1 whose members I want to import as "foreign" variables are regular but complex classes that I really don't want to rewrite as processes. This is because they are generic classes I already wrote, and I don't think it would make sense to re-wrap them within "processes". BTW, this would be a common case when one uses external libraries in the modelling process.

In my case I am trying to put together a class-instance builder. The step-like organization is perfect for using processes in this way, although the "time-based" stepping of the simulation is not very useful in my case [it would be cool to have it more "decoupled", imo]. The class instance I am creating is basically a 3D scene that is then used for performing a dynamic simulation, possibly on a cluster [thus using xsimlab as the "time stepper" is maybe not the best choice].

The elements of the scene, or "Props", are analytical representations of 3D objects (spheres, cubes and so on) that are then transformed into "Mesh" objects by a "mesher" class. Each Prop thus has some parameters, and its mesh representation is kept as an attribute of the Prop (I think this makes sense). Each Prop has additional attributes such as the material, the initial velocity, etc., again kept as attributes of the "Prop" class. I think this all makes sense as an object-oriented framework.

I am already using attr.attrs to create slotted attributes for my classes (the Prop class, the Mesh class and so on), so moving towards "variable" would be pretty easy. It would also allow me to document the meaning of the attributes and so on. The only deal breaker here is that I cannot use a unique store for those classes: that would require a lot of rethinking of a codebase that is already up and running, plus problems with testing, etc.

When it comes to the "process" functionality, I think it is very cool, but what it does not allow me to do (apart from the store) is to assign a class instance as a variable. In my case this makes sense because, e.g., the Prop class might be the result of a single process "CreateProps", while the Mesh representation (which must live as an attribute of Props for the code to work) would be the result of a "MeshProps" process. Then I would have other processes like "ComputeInternalMeshing", "AssignInitialVelocities" and so on...

E.g., if each mesh is composed of a list of vertices and indices, I could create many different processes (as many as the number of meshes I need), like "CreateMesh1" and "CreateMesh2", each with the [output] variables "indices" [for the faces] and "vertices" and so on, but that would not be very flexible if I need to create 100 meshes. This also somehow breaks the goodness of using objects in the modelling, which is an important part of most codes out there.

I've been looking for quite some time at packages like param or pypet, and I found them somehow unsatisfying for one reason or another, while xsimlab seems to have all the features that make sense for dealing with scientific parameters within Python objects/models (I am 100% in with your very good idea).

Thus, in my opinion, this is what would make xsimlab a more generic framework that could be applied more easily in more circumstances:

This would render xsimlab more decoupled from the store and the xarray structure, and make it a great "pipelining" tool [I recognize that was not the intent] for replicable scientific computations [which is a big deal now], while still allowing its use in the store/xarray context, or for solving other problems that can be described as "steps" with dependencies. Basically, splitting the "x" from the "sim" and the "lab" into different modules :+1:; the "lab" could be used with or without the "x" and the "sim", depending on what one is doing.

So, long story short: what you did here is great! But it is basically a pipelining framework for science that could be super useful in hundreds of other contexts; it would just require separating the "pipelining" from the "modelling" a bit more. I would keep that in mind, because it could also become a good proposal for enabling scientific replicability and, overall, for describing the "passages" one makes to compute a scientific result, which is of primary importance everywhere. I really liked your take on this problem very, very much :1st_place_medal:.

Furthermore, the dependency resolution would also allow some intelligent "caching" of the results, so that one does not need to recompute processes whose controlling parameters have not changed.
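The caching idea above can be sketched quickly (a hypothetical helper, not part of xsimlab): hash a step's input parameters and rerun the step only when the hash changes:

```python
import hashlib
import json

class CachedStep:
    """Rerun a step function only when its parameters change."""

    def __init__(self, func):
        self.func = func
        self._last_key = None
        self._last_result = None
        self.calls = 0  # how many times func was actually executed

    def run(self, **params):
        # Key the cache on a stable hash of the parameter values.
        key = hashlib.sha256(
            json.dumps(params, sort_keys=True).encode()).hexdigest()
        if key != self._last_key:
            self.calls += 1
            self._last_result = self.func(**params)
            self._last_key = key
        return self._last_result

step = CachedStep(lambda spacing: [i * spacing for i in range(3)])
print(step.run(spacing=2))  # [0, 2, 4]  (computed)
print(step.run(spacing=2))  # [0, 2, 4]  (cached, func not called again)
print(step.calls)           # 1
```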

To be clear, these are just ideas and suggestions; I don't want to convince you of any of this or to modify xsimlab in any way, I just really wanted to bring in my case. A more generic framework could attract more users and contributors, although I recognize your goal was very specific, perfectly executed and well reached!

best, luca. PS: are you going to be at EGU next year?

I'm quite reluctant to add support for nested variables for several reasons:

* This would add too much complexity to the (already quite complex) current implementation in many places, which is not worth the benefits IMHO.

* We would need to extend API support for many things, e.g., `xs.foreign(Var1, ('v1', 'v2', 'v3'), intent={'v1': 'in', 'v2': 'in', 'v3': 'out'})`. For `xs.variable`, the same kind of dictionary would be needed for specifying `dims` or, later, default values, validators, etc. In the end the code is not really shorter; it's even less readable.

* I'm slightly concerned by the overhead this would add when getting/setting the values of the variables in process classes through their properties.

* Nested variables don't play well with the xarray data model. The current one-to-one relationship between a `xs.variable` and a `xarray.Variable` is easy to deal with; maintaining both one-to-one and one-to-many relationships in the case of nested variables would be hard. We could argue the same for other kinds of automated I/O interfaces that we could later support, like command-line (options or input file entries) or UI (widgets). Storing complex objects (i.e., other than arrays or primitive types) as `xs.variable` values wouldn't play well with those I/O interfaces either.

Without taking statements from "The Zen of Python" too religiously, I think that the "flat is better than nested" principle applies rather well here.

benbovy commented 6 years ago

First of all, it is always interesting to see which problems people would like to use xarray-simlab for. This is still a young project with much room for improvement, and feedback is very welcome. So thanks for your feedback and for sharing your use case!

I agree that it is currently not straightforward to wrap existing code with xarray-simlab, or at least it is not very clear (and still undocumented) how this could be done.

That said, I think that your use case is not incompatible with xarray-simlab as it currently works. You don't need to refactor your object-oriented code, but you'd need to write some wrapper code in xs.process classes, using composition.

I've uploaded an example similar to what you describe, but very simplified. Hopefully you'll find it helpful:

https://gist.github.com/benbovy/2bb908c745657712e1da213c0a9badd9

As you can see, it's actually possible to declare a xs.variable for complex objects (or object collections) and make references to those objects in other process classes with xs.foreign. You also need to declare some other xs.variable attributes that expose one or more attributes of those objects, because (1) you want to use them as I/O of your processing and/or (2) they determine the workflow (the order of processes). This can all be done without nested variables.
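The composition pattern can be sketched in plain Python, without xarray-simlab (Sphere and MeshSphere are hypothetical stand-ins, not names from the gist or the xsimlab API): the wrapper "process" holds the untouched domain object as one variable and exposes the attributes needed as I/O as separate variables:

```python
class Sphere:
    """Existing domain class, untouched by the framework."""
    def __init__(self, radius):
        self.radius = radius
        self.mesh = None

class MeshSphere:
    """Wrapper 'process': holds the complex object, exposes chosen attributes."""
    def __init__(self, sphere):
        self.sphere = sphere      # complex object held as a single variable
        self.n_vertices = None    # exposed scalar attribute (an output)

    def run_step(self):
        # Toy meshing: vertex count grows with the radius.
        self.sphere.mesh = [(0, 0, i) for i in range(int(self.sphere.radius) * 4)]
        self.n_vertices = len(self.sphere.mesh)

p = MeshSphere(Sphere(radius=2.0))
p.run_step()
print(p.n_vertices)  # 8
```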

Note that the term simulation "store" is a bit misleading here. It is not a container that holds all simulation data; rather, it gathers all references to the data that are useful to share between process classes and/or as I/O. Currently it is a simple dictionary accepting any kind of value. We could later support on-disk or distributed stores, which will still work as long as your objects are serializable.

luca-penasa commented 6 years ago

Super! Thank you very much. I also did not investigate much the use of on_demand variables; I'll have a look at them.

thank you again!