Closed benbovy closed 3 years ago
I think I have a use-case for this: I am trying to model a plant-growth model in simlab. So at the core (oversimplified) would be:
import xsimlab as xs
import numpy as np
from math import sin, cos, tan, acos, pi, sqrt, exp, radians
@xs.process
class dBiomass:
"""calculates biomass change based on light intensity
"""
#assuming that fraction intercepted light is 1
eff_light = xs.variable(description='light use efficiency', default=3)
latitude = xs.variable(default = 59)
maxrad = xs.variable(dims='time',description='extraterrestial maximum radiation [mm/day]',intent='out')
biomass = xs.variable(
dims='time',intent='out',description='grown biomass'
)
@xs.runtime(args='master_clock_values')
def initialize(self, datetime):
#assumes the datetime to be a datetime object
#https://stackoverflow.com/questions/620305/convert-year-month-day-to-day-of-year-in-python
#calculate maximum extraterrestial radiation in mm/day
#from FAO 56 page 9
self.latitude = radians(self.latitude)
self._day = datetime.timetuple().tm_yday
self._Gsc = 0.0820 #MJ/m**2/min solar constant
self._solar_declination = 0.409*sin(2*pi/365*self._day-1.39)
self._dr = 1+0.033*cos(2*pi/365*self.day) #inverse relative distance
self._omegas = acos(-tan(self.latitude)*tan(delta)) #sunset hour angle
self.maxrad=0.408*24*60/pi*self._Gsc*self._dr*(self._omegas*sin(self.latitude)*sin(self._solar_declination)+cos(self.latitude)*cos(self._solar_declination)*sin(self._omegas))
@xs.runtime(args='step_delta')
def run_step(self,dt):
self._d_biomass = self.eff_light*self.maxrad*dt
def finalize_step(self):
self.biomass += self._d_biomass
biomass_model_very_raw = xs.Model({'Biomass':dBiomass})
# %create_setup biomass_model_very_raw --default --verbose
import xsimlab as xs
from datetime import datetime, timedelta
ds_in = xs.create_setup(
model=biomass_model_very_raw,
clocks={
'time':np.arange(datetime(2010,7,1), datetime(2015,7,1), timedelta(days=1)).astype(datetime)
},
master_clock='time',
input_vars={
# light use efficiency
'Biomass__eff_light': 3,
# ---
'Biomass__latitude': 59,
},
output_vars={}
)
ds_in
out_ds = ds_in.xsimlab.run(biomass_model_very_raw)
But I get the following error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-15-cb0a1185e7ff> in <module>
1 ds_in
----> 2 out_ds = ds_in.xsimlab.run(biomass_model_very_raw)
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/xr_accessor.py in run(self, model, batch_dim, check_dims, validate, store, encoding, decoding, hooks, parallel, scheduler, safe_mode)
821 )
822
--> 823 driver.run_model()
824
825 return driver.get_results()
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/drivers.py in run_model(self)
475 *args,
476 parallel=self.parallel,
--> 477 scheduler=self.scheduler,
478 )
479
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/drivers.py in _run(dataset, model, store, hooks, validate, batch, batch_size, parallel, scheduler)
334
335 try:
--> 336 model.execute("initialize", rt_context, **execute_kwargs)
337
338 for step, (_, ds_step) in enumerate(ds_gby_steps):
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/model.py in execute(self, stage, runtime_context, hooks, validate, parallel, scheduler)
1021 else:
1022 for p_obj in self._processes.values():
-> 1023 _, (_, signal_process) = self._execute_process(p_obj, *execute_args)
1024
1025 if signal_process == RuntimeSignal.BREAK:
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/model.py in _execute_process(self, p_obj, stage, runtime_context, hooks, validate, state)
841
842 state_out, signal_out = executor.execute(
--> 843 p_obj, stage, runtime_context, state=state
844 )
845
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/process.py in execute(self, p_obj, stage, runtime_context, state)
528 return {}, RuntimeSignal.NONE
529 else:
--> 530 signal_out = executor.execute(p_obj, runtime_context, state=state)
531
532 skeys = [p_obj.__xsimlab_state_keys__[k] for k in self.out_vars]
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/process.py in execute(self, p_obj, runtime_context, state)
390 p_obj.__xsimlab_state__ = state
391
--> 392 args = [runtime_context[k] for k in self.args]
393
394 signal = self.meth(p_obj, *args)
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/process.py in <listcomp>(.0)
390 p_obj.__xsimlab_state__ = state
391
--> 392 args = [runtime_context[k] for k in self.args]
393
394 signal = self.meth(p_obj, *args)
~/miniconda3/envs/EHydrology/lib/python3.7/site-packages/xsimlab/drivers.py in __getitem__(self, key)
46
47 def __getitem__(self, key: str) -> Any:
---> 48 return self._context[key]
49
50 def __setitem__(self, key: str, value: Any):
KeyError: 'master_clock_values'
A minimal example: Ideally I would like to also read precipitation data from an array that is also on the same time as the clock. I cannot find any documentation on this, I could maybe work out this example further and make a PR for the docs?
a (very pedantic) comment: Since Github moves from master to main, (to avoid racism) maybe master clock could be renamed to main clock? Easy to do while the project is still young?
Ok. so apparently it is not implemented yet... It is possible to access time data at runtime though: using the
@xs.runtime(args='step_start')
def run_step(self,dt,date):
#do stuff with date here such as calculate maxrad
now there is a problem with accessing the datetime as an actual datetime :/
Ideally I would like to also read precipitation data from an array that is also on the same time as the clock.
It is possible to access time data at runtime though
You could also set time-varying input values in xs.create_setup
, see #166.
now there is a problem with accessing the datetime as an actual datetime :/
Although I haven't tried it yet, I don't think anything would prevent you using datetime-friendly data types for the clock values, like you can use it with xarray.
Since Github moves from master to main, (to avoid racism) maybe master clock could be renamed to main clock? Easy to do while the project is still young?
I'm very open to that idea. We could go through some depreciation cycle to ensure smooth transition.
Ideally I would like to also read precipitation data from an array that is also on the same time as the clock.
It is possible to access time data at runtime though
You could also set time-varying input values in
xs.create_setup
, see #166.
This is indeed a valid workaround, but has some drawbacks:
biomass_model_very_raw = xs.Model({'biomass':dBiomass})
ds_in = xs.create_setup( model=biomass_model_very_raw, clocks={ 'day':pd.date_range('2000-01-01',periods=365) }, master_clock='day', input_vars={
'biomass__eff_light': 3,
},
)
def maxrad(day,latitude = 59.):
latitude=radians(latitude)
Gsc = 0.0820 #MJ/m**2/min solar constant
solar_declination = 0.409*sin(2*pi/365*day-1.39)
dr = 1+0.033*cos(2*pi/365*day) #inverse relative distance
omegas = acos(-tan(latitude)*tan(solar_declination)) #sunset hour angle
return 0.408*24*60/pi*Gsc*dr*(omegas*sin(latitude)*sin(solar_declination)+cos(latitude)*cos(solar_declination)*sin(omegas))
ds_in['biomass__maxrad']=xr.DataArray(dims='day',data=np.vectorize(maxrad)(ds_in['day.dayofyear'])) out_ds = ds_in.xsimlab.run(biomass_model_very_raw)
However, I get the error:
```python
ValueError: Missing master clock dimension / coordinate
even though I set it in the setup.
Although I haven't tried it yet, I don't think anything would prevent you using datetime-friendly data types for the clock values, like you can use it with xarray.
So there is a problem here: datetime in xarray coordinates is datetime64[ns]
format, where .astype(datetime)
converts to int in stead of datetime bug report. Xarray smartness like day['time.dayofyear']
is not possible, since we get the data passed, in stead of DataArray
or something.
So: all modification of datetime (at runtime) is based on numpy datetime64[ns]
, which ATM has horrible conversion.
e.g. conversion to datetime format,
I'm very open to that idea. We could go through some depreciation cycle to ensure smooth transition.
sounds great! I have absolutely no experience with going through deprecation cycles, however :/
Actually the only sensible place to access the entire array of time values would be in initialization step right? and then index those variables in other timesteps.
The ValueError: Missing master clock dimension / coordinate
is because for some reason of the line below removes the attributes of the day
coordinate (xarray-simlab relies on some of those attributes to determine the clock coordinates among the coordinates in the dataset):
ds_in['biomass__maxrad']=xr.DataArray(dims='day',data=np.vectorize(maxrad)(ds_in['day.dayofyear']))
Instead of directly assigning variables and/or values to the dataset, you should use instead xs.create_setup
or Dataset.xsimlab.update_vars
, e.g.,
ds_in = ds_in.xsimlab.update_vars(
model=biomass_model_very_raw,
input_vars={'biomass__maxrad': xr.DataArray(dims='day',data=np.vectorize(func)(ds_in['day.dayofyear']))}
)
The latter solution has the great advantage of validating (and/or converting) the new given input values so that it is consistent with the model.
Xarray smartness like day['time.dayofyear'] is not possible, since we get the data passed
This should be possible if we address #141.
all modification of datetime (at runtime) is based on numpy datetime64[ns], which ATM has horrible conversion.
Is the conversion below that horrible? (from this SO question)
@xs.process
class A:
foo = xs.variable(default=1)
@xs.runtime(args='step_start')
def run_step(self, t):
datetime_obj = t.astype('datetime64[s]').item()
Actually the only sensible place to access the entire array of time values would be in initialization step right? and then index those variables in other timesteps.
Yes, but one tricky thing is that we don't know the name of the clock dimension (it is set when creating a model setup). So if we want to declare an output variable with values along this dimension, we have a problem: `xs.variables(dims=???, intent='out').
In the case of using update_vars(....)
, the actual dimension of maxrad = xs.variable(dims=['whatever',()])
is ignored and an array with the named dim is passed. So we could just as well create a placeholder for the clock name.
Conceivably, one would want to work in a model with multple timesteps as in this paper. Tehn, the %create_setup
magic command could just as well also specify time dimensions like so:
#define a clock in a process with dummy clock_dims='something' to which one can refer
@xs.process
class dBiomass:
maxrad = xs.variable(clock_dims='day',description='maximum radiation')
@xs.runtime(args='day') #or even another decorator
def initialize(self,day):
maxrad = 2*day
@xs.process
class OtherClock:
some_var = xs.variable(clock_dims='other_clock',description='some other variable')
monthly_var = xs.variable(clock_dims='monthly_clock',description='var that wants a monthly clock')
@xs.runtime(args='other_clock')
def initialize(self,other_clock):
ds_in = xs.create_setup(
model=biomass_model_very_raw,
clocks={
#clock for calculating maxrad, uses datetime format
'day':pd.date_range('2000-01-01',periods=365),
# clock for calculating some other variable
'other_clock': 'day'
#clock that wants to use its own clock for initialization
'monthly_clock': pd.date_range('2000-01-01','2001-01-01',periods=12)
},
master_clock='day',
input_vars={
# light use efficiency
'dBiomass__eff_light': 3,
},
output_vars={'dBiomass__biomass':'day'}
)
and then in the preprocessing, we could conceivably just pass the clocks in, and link to the correct clock in the case of a reference to an earlier clock, and pass that. Otherwise, maybe something like a global_clock_name
, where main_clock
/master_clock
is a reserved name could work?
In the case of using update_vars(....), the actual dimension of maxrad = xs.variable(dims=['whatever',()]) is ignored and an array with the named dim is passed.
We could actually check the input dimension labels when creating an input dataset with xs.create_setup
or xsimlab.update_vars
. Currently they are validated when calling xsimlab.run()
, as input datasets may have been loaded from file or created by other means.
So we could just as well create a placeholder for the clock name.
I guess it would work, yes, at least for the main clock. The current usage for other clocks is to save snapshots at different times/frequencies than the main clock. Proper support of multiple timesteps is another story.
We could actually check the input dimension labels when creating an input dataset with
xs.create_setup
orxsimlab.update_vars
. Currently they are validated when callingxsimlab.run()
, as input datasets may have been loaded from file or created by other means.
my point here was that we have the exact same problem here as with accessing the time clock: we initialize maxrad=xs.variable(dims=???)
without any care about the actual dims, since these are named inside the model class. Validation when updating therefore will give a negative, because we cannot know the clock dim name beforehand. since there is only one main clock, why not enforce a name of main_clock
? this question may go off-topic, since it questions quite some in-place implementation...
I guess it would work, yes, at least for the main clock. The current usage for other clocks is to save snapshots at different times/frequencies than the main clock. Proper support of multiple timesteps is another story.
I created a pull request #168 but I think it should also address #141 in order to work well, since we have to set some variables as a DataArray. The example I use is quite verbose, but I could include it as a test?
I got your point, hence the placeholder suggestion in my 1st comment, e.g., something like xs.MASTER_CLOCK
(let's address master vs. main in another issue/pr) that could be used when declaring variable dimensions, e.g., xs.variable(dims=xs.MASTER_CLOCK, intent='out')
, so that when creating the output dataset xarray-simlab knows it has to replace this placeholder with the actual master clock label defined by the user. Any variable declared with this placeholder should probably have static=True
set too.
I prefer this option over enforcing the name of the master clock.
(@jeanbraun's suggestion)
This could be useful, e.g., for vectorized computation of time-dependent values at the
initialize
stage, e.g.,One big problem, however, is that the
rate
variable in the example above should ideally have the same dimension label than the master clock dimension, but the latter is undefined (it is only known when running a model). We could define a specific placeholder, but it gets complicated...