xarray-contrib / xarray-simlab

Xarray extension and framework for computer model simulations
http://xarray-simlab.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Allow reading of time-related empirical data #166

Closed feefladder closed 3 years ago

feefladder commented 3 years ago

Maybe this is already possible, but I could not find it in the docs (I also thought I'd make it a separate issue): is there a way to add an xarray object that also has a time dimension and read from it?

An example is precipitation: this is very empirical data, so it should be read from a table.

jvail commented 3 years ago

Not sure if this is the solution you are looking for, but this is what I do:

We have a lot of weather data in a CSV file that we need to read and feed to the model at each step. I have an "environment" process that gets a file path as input and reads the data into a DataFrame during initialization. Then, at each run_step, I read from the DataFrame:

self.T = self.weather_df['T'][step_start].to_numpy()

and update the variables I need. In this case it is an array of hourly values for each day. The time step of the model is one day, and the clock is set up like this:

clocks={
    'day': pd.date_range(start='2002-09-02', end='2003-03-28', freq="1d")
}
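
The approach described above can be sketched roughly as follows. This is a plain-Python stand-in and not the xarray-simlab API: the `Environment` class, the inline `CSV_TEXT` table and its column names are all hypothetical. In a real model the class would be declared as an xarray-simlab process, and the framework would call `initialize` and `run_step` and supply the step start time.

```python
import io

import pandas as pd

# Hypothetical weather table; in practice this would be a CSV file on disk.
CSV_TEXT = """date,T,precip
2002-09-02,14.5,0.0
2002-09-03,13.0,2.4
2002-09-04,12.1,7.9
"""


class Environment:
    """Plain-Python stand-in for a process that serves weather data."""

    def __init__(self, source):
        self.source = source  # file path or file-like object

    def initialize(self):
        # Load the whole table once, indexed by timestamp.
        self.weather_df = pd.read_csv(
            self.source, index_col="date", parse_dates=True
        )

    def run_step(self, step_start):
        # Select the row that matches the start of the current step.
        row = self.weather_df.loc[pd.Timestamp(step_start)]
        self.T = float(row["T"])
        self.precip = float(row["precip"])


# Daily clock, mirroring the date_range used for the model clock.
clock = pd.date_range(start="2002-09-02", end="2002-09-04", freq="1D")

env = Environment(io.StringIO(CSV_TEXT))
env.initialize()

# Emulate the simulation loop: one lookup per time step.
precip_per_step = []
for step_start in clock:
    env.run_step(step_start)
    precip_per_step.append(env.precip)
```

Only the current step's values are exposed at any time, so the variables updated in `run_step` carry no time dimension themselves.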

benbovy commented 3 years ago

There are two ways to do that:

  1. Like the example that @jvail describes in his comment, import the data at simulation runtime, in a process.
  2. Set time-varying input values.

The advantage of 1 is that you can encapsulate all the logic (loading the data, etc.) in a process class. However, there's no way in xarray-simlab to access all clock values during simulation runtime (#155), so vectorized operations within the process class are not possible. Output variables that depend on time should be declared without a time dimension, be computed at every time step (or on demand), and then may be saved with a given clock so that the time dimension is added in the output Dataset.

With option 2, you need to read the data outside of the simulation, as a preprocessing step. Since you need to provide clock values in xs.create_setup() (e.g., as a numpy array or an xarray DataArray), you could reuse those values to do some additional computation in the preprocessing step if needed. Option 2 feels like a natural choice for any model forcing input variable, IMO.
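
A minimal sketch of such a preprocessing step, using only pandas and xarray, might look like this. The inline table, the `precip` column, the `day` clock name and the choice to fill missing days with zero are all assumptions made for illustration; none of them come from xarray-simlab itself.

```python
import io

import pandas as pd
import xarray as xr

# Hypothetical precipitation table with a gap (no record on 2002-09-04).
CSV_TEXT = """date,precip
2002-09-02,0.0
2002-09-03,2.4
2002-09-05,1.2
"""

# The same clock values that would later be passed to the simulation setup.
clock = pd.date_range(start="2002-09-02", end="2002-09-05", freq="1D")

table = pd.read_csv(io.StringIO(CSV_TEXT), index_col="date", parse_dates=True)

# Align the observations with the clock. Missing days are treated as dry
# here, which is an assumption of this sketch.
precip = table["precip"].reindex(clock, fill_value=0.0)

# Wrap the values in a DataArray whose dimension is named after the clock.
precip_da = xr.DataArray(
    precip.to_numpy(),
    coords={"day": clock},
    dims="day",
    name="precip",
)
```

An array like `precip_da` could then be given as the value of the corresponding input variable when building the setup, provided its dimension matches the clock name used there.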

So the difference between 1 and 2 is that for 1 the time variable is an output variable and for 2 it is an input variable.

feefladder commented 3 years ago

Sorry to come to this so late, but your remarks have been very helpful, @benbovy and @jvail!