csdms-compliant - Githubissues

DirkEilander commented 6 years ago

Hi guys,

After our last working day and discussions with Hiroaki and Paolo in the week thereafter I figured that the VU_nestedModelling branch had become to complicated. To make it more simple I thought of some changes which I've implemented in a new branch csdms-compliant:

keep functions as simple as possible. one taks per function.
no main script where all the model bmi need to fit; instead use a 'template' bmi class with abstractmethods and test scripts to ensure coherent functionality accross bmi's
separate individual model bmi functionality from combined/coupled bmi functionality.
separate spatial coupling methods from bmi methods
only linear (1D) index for all models

I've worked this out for PCR and CMF currently. You can now run this model by calling

python glofrim_runner.py run /path_to/glofrim.ini -e /path_to/environment.ini

where glofrim.ini looks like:

[models]
# reference to inifiles
PCR=PCR_Elbe/setup_PCR_30min_Elbe.ini
CMF=CMF_Elbe/input_flood.temp

[coupling]
# timestep for exchanges [sec]
dt=86400

[exchanges]
# From_model.var:multiplier = To_model.var@SpatialCoupling
# multiplier is optional
PCR.runoff=CMF.roffin@inpmat_2d

I think this format is quite generic. You now have full control over the coupling in the ini file and we can extend it for more variables and SpatialCoupling methods like:

[exchanges]
CMF.runoff=DFM.rain@all_1d
CMF.outflw=DFM.rain@upstream_1d

What still needs to be done - I will work on this later this week in the following sequence: (adding the new models should be even more straightforward now and a lot has already been figured out before)

[x] add DFM bmi
[x] add all_1d SpatialCoupling method
[x] add upstream_1d SpatialCoupling method
[x] add LFP bmi
[x] add WFL bmi
[x] add output testing (Jannis) - [ ] add documentation** ~~- [ ] add unit testing (Dirk)~~ ~~- [ ] add observation option based on geojson file with locations and variables~~

**for documentation check SPHINX http://www.sphinx-doc.org/en/stable/ and tutorial: https://matplotlib.org/sampledoc/index.html

DirkEilander commented 6 years ago

@hii600 @ChippChapp Let's test it for the different model combinations and post the ini settings and results here.

JannisHoch commented 6 years ago

@DirkEilander great work. I am happy to start testing and have time at hand, but maybe we should briefly have a chat how we want to do it beforehand. Also, I am happy to go through the code and start with the documentation, and maybe more importantly, try to pinpoint those parts which may be harder to understand for non-developers.

DirkEilander commented 6 years ago

For the testing part we agreed to work on unit test scripts and output test scripts. The unit test just make sure that after every new commit the code still works. The output test script wil run coupled models and plot the outputs to check if the coupling makes sense physically. We will work this out in separate branches and scripts in the test folder.

JannisHoch commented 6 years ago

@DirkEilander In the latest commit [https://github.com/openearth/glofrim/commit/c6fdde36fbfaf7bf37d786dc20e3fa8ce85d52c3] I am trying to use the index functionality but either I am applying it incorrectly or there is a bug... If I try cbmi.get_value_at_indices(CMF.rivinf, obs_ind_CMF) then I get the error "CMF" has no attribute rivinf. The same if I try with PCR and discharge. Any clue what's going wrong?

hii600 commented 6 years ago

@DirkEilander @ChippChapp Thank you, great work! But could you include a sample environment.ini file to the repository? Otherwise users cannot test running.

JannisHoch commented 6 years ago

@hii600 it should be environment.env (@DirkEilander sent one via mail). We changed the file type because ini was for some reason not recognized by gitignore. Hope this helps!

hii600 commented 6 years ago

@ChippChapp I see, and can you include a sample file to the repository like environment.env.sample? Otherwise I cannot know what it looks like...

DirkEilander commented 6 years ago

@hii600 the environment.env file is optional, but helps us to work with the same ini files on different environments as all absolute paths can be set in a different file from the relative paths. I've updated the glofrim.ini sample file in the root of the glofrim repository which include all options. You can set the engine section and the root_dir option in the models section in a separate environment.env file.

@ChippChapp Note that i have changed the naming of the directory with the ini_files. This is now called root_dir in the models section. Before this was test_dir in the general section. Please update your environment.env accordingly. I have already updated all scripts to work with this new naming.

DirkEilander commented 6 years ago

@ChippChapp regaring the get value at indices function. You wrote that this doens't work which is correct.

I am trying to use the index functionality but either I am applying it incorrectly or there is a bug... If I try cbmi.get_value_at_indices(CMF.rivinf, obs_ind_CMF) then I get the error "CMF" has no attribute rivinf.

Here you are trying to read an attribute of the CMF BMI object called rivinf. The BMI object does not store any variables as attributes. Instead use quotes around the "model.variable" argument to access a variable from that specif model. The syntacs is the same as in the ini file. Thus you should use

cbmi.get_value_at_indices("CMF.rivinf", obs_ind_CMF) (note the quotes)

PS. I think you want to use the "rivout" (river discharge) or "outflw" (total discharge including floodplain)

JannisHoch commented 6 years ago

@DirkEilander ah okay, did try a lot but not this one :) thanks! one more reason to have a good documentation....

DirkEilander commented 6 years ago

@ChippChapp in order to make sure we don't get annoying git issues it might be better if you could create a separate branch

JannisHoch commented 6 years ago

Yeah sorry for that, realized too late what I was doing... New branch is created and I tested some set-ups. I found the following problems to be fixed:

[x] CMF.rivout is recognized, only CMF.outflw
[x] CMF.outflw seems to be computed incorrectly, it's always 0. Also a plot shows that the distribution is wrong and cannot see how water is correctly flowing downstream (see pic)

~~- [ ] same is true for DFM, also here discharge (DFM.q1) is always 0~~

[x] DFM requires a string for cbmi.set_start_time() and not datetime.datetime format which again is needed for PCR
[x] same is true for LFP which also needs string
[x] LFP does not recognize set_var: lisflood-bmi-v5.9/liblisflood.so: undefined symbol: set_var. Can remember this is a bit more generic and due to the LFP BMI adapter. In GLOFRIM 1.0 I used get_var instead to overwrite variables (works too!).

download Fig 1. Plot of PCR discharge (left) and CMF outflw (right) for Elbe basin

Any clue what's going wrong?

DirkEilander commented 6 years ago

@ChippChapp Thanks for testing. I'll look into these issues later this afternoon.

Most seem quite easy to solve. About the CMF coupling. It seems like the inpmat is not set correctly. About the DFM coupling. This one will be a bit harder to figure out. is q1 the right variable?

I cannot make much of the figures. What do we see? It seems that the orientation right figure is also wrong/ different.

JannisHoch commented 6 years ago

@DirkEilander I added a caption to make it clearer. Also my guess that orientation (x- and y-axis swapped) is wrong since CMF resolution is half the one from PCR and therefore should have twice the number of cells per axis.

DirkEilander commented 6 years ago

@ChippChapp. Thanks! i had a look at the code and the previous glofirm verions and i noticed that indeed there is a translation from the fortran ordering of 2d arrays to the python ordering. Also we need to deel with missing values in the current version. I'll submit some changes to the Qtest branch soon.

DirkEilander commented 6 years ago

@ChippChapp: most issues are fixed in commit e9bc354 and i've created one testQ notebook for all combinations. Some more testing on the outputs for long runs is necessary. CMF seems to stop updating after a while and DFM still returns zeros...

CMF.rivout is recognized, only CMF.outflw

"rivout_avg" is exposed "rivout" is not. @hii600 Could you (again) explain the difference ? All exposed variables can be found in the CMF source code cama-flood_bmi_v3.6.2/src/external_bmi.F

CMF.outflw seems to be computed incorrectly, it's always 0. Also a plot shows that the distribution is wrong and cannot see how water is correctly flowing downstream (see pic)

This had to do with the dtype of the data which was used in set_value. This is fixed now for all models by parsing the data to the right dtype

same is true for DFM, also here discharge (DFM.q1) is always 0

still need to figure out what is happening. Could be the coupling or a bmi issue.

DFM requires a string for cbmi.set_start_time() and not datetime.datetime format which again is needed for PCR same is true for LFP which also needs string

fixed

LFP does not recognize set_var: lisflood-bmi-v5.9/liblisflood.so: undefined symbol: set_var. Can remember this is a bit more generic and due to the LFP BMI adapter. In GLOFRIM 1.0 I used get_var instead to overwrite variables (works too!).

fixed based on glofrim v1

JannisHoch commented 6 years ago

hi @DirkEilander , nice you could fix most of the issues!

It's indeed bit weird that CMF does run, but does not alter any variable anymore - heartbeat's gone! download

Regarding your comment whether q1 is the right variable, I don't know exactly, never extracted discharge before. I'll send Herman/Arthur an email.

DirkEilander commented 6 years ago

I had the same before. We need to figure out what is happening in March.

Regarding the DFM variables. I've linked a file from the source code which documents all exposed variables. I think q1 should be the right one.

JannisHoch commented 6 years ago

I have pushed a new par file with extended simulation period (1 year) to model_test_data bc other LFP cannot be tested.

What is interesting is that also then, the PCR-LFP model crashes around the same time when CMF stops updating... Maybe just coincidence but maybe a structural issue?

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-14-d0abf93f79f6> in <module>()
      4 while cbmi.get_current_time() < cbmi.get_end_time():
      5     t = cbmi.get_current_time()
----> 6     cbmi.update()
      7     for mod in models:
      8          df.loc[t, mod] = cbmi.get_value_at_indices(discharge_keys[mod], obs_ind[mod])[0]

/home/jannis/PhD/code/GLOFRIMplusCAMA/glofrim/glofrim-py/glofrim/glofrim_bmi.pyc in update(self)
    240                 self.bmimodels[item[1]].update_until(next_t)
    241             elif item[0] == 'exchange':
--> 242                 self.exchange(**item[1])
    243         self._t = next_t
    244 

/home/jannis/PhD/code/GLOFRIMplusCAMA/glofrim/glofrim-py/glofrim/glofrim_bmi.pyc in exchange(self, from_mod, to_mod, from_vars, to_vars, coupling, add, **kwargs)
    260         self.logger.debug("get data from {}.{}; set to {}.{}".format(from_mod, from_vars[0], to_mod, to_vars[0]))
    261         if coupling.at_indices():
--> 262             self.exchange_at_indices(from_mod, to_mod, from_vars, to_vars, coupling, add=add, **kwargs)
    263         else:
    264             self.exchange_same_grid(from_mod, to_mod, from_vars, to_vars, add=add, **kwargs)

/home/jannis/PhD/code/GLOFRIMplusCAMA/glofrim/glofrim-py/glofrim/glofrim_bmi.pyc in exchange_at_indices(self, from_mod, to_mod, from_vars, to_vars, coupling, add, **kwargs)
    305         else:
    306             vals_set = vals_get
--> 307         assert np.nansum(vals_get).round(5) == np.nansum(vals_set).round(5)
    308         self.logger.debug('total get {:3f}'.format(np.nansum(vals_get)))
    309         # get TO data & translate

AssertionError:

JannisHoch commented 6 years ago

Also, it seems as if the LFP 1D channels are not where they are supposed to be while the 2D DEM looks very much as it should.

download 1 Plot of PCR discharge and Qx discharge in LFP

download 2 Plot of PCR discharge and LFP DEM

DirkEilander commented 6 years ago

I'll look into the LFP 1D channels now. It could be that the set_value_at_indices is not working with the odd way that we need to use to set a variable in LFP.

In order to figure out what is happening at around March I'll extend the logger debug messages. Perhaps you @ChippChapp could try if the same happens in the PCR2CMF coupling if you start lets say March first and run for 2 months.

JannisHoch commented 6 years ago

Tried to run PCR2CMF from 2000-05-01 (yyyy-mm-dd) which revealed three things:

cbmi.set_start_time() does not work for PCR. It's still 2000-01-01 if trying 'cbmi.bmimodels['PCR'].get_start_time'. For CMF is 2000-05-01 however. The start time as reported with cbmi.get_start_time() is 2000-05-01 and seemingly depending on CMF (This in fact true also for the combination PCR2LFP and PCR2DFM!).
cbmi.get_current_time() gives 2000-01-01 at beginning of run which may be problematic as it's before model start time. Seems as if this is depending on PCR (or even all providing models?).
Despite these differences, you can run PCR2CMF with CMF starting from 2000-05-01. Even though it may receive PCR volumes from the wrong date, the runs also stop updating after 59 time steps.

With respect to current time I found that:

LFP current time is not advancing per daily update
DFM current time is advancing by less than 86400 per daily update which may be an issue if we talk about seconds??? At the end of one "PCR year" DFM current time is 349200 seconds which tranlates to around 4 days.... Maybe the update function updates only one computational time step but not for one entire day?

Testing LFP shows that:

the channel network seems to be correctly represented now
the values are not correct (yet)

DirkEilander commented 6 years ago

thanks for testing @ChippChapp !

There were indeed still some (large) bugs related to the model times and update time steps in the code. To much to explain here, but we need to make sure that in wflow 'runlengthdetermination' is set to 'intervals' and in all cases the model timestep fits in the GLOFRIM timestep. If this is not the case you will receive error messages.

The time issues should be resolved in the latest commit. This also resolved the issue with PCR2CMF, see fig. Other issues with LFP and DFM still remain..

ps. please make sure not to commit output files to the model_test_data repos or notebooks with outputs to any repository.

DirkEilander commented 6 years ago

@ChippChapp about LFP: what is the unit of the Qx and Qy variables and how should you interpret these?

JannisHoch commented 6 years ago

@DirkEilander as far as I know that's discharge in x- and y-direction, should be m3/s.

DirkEilander commented 6 years ago

The numbers I get are a few orders of magnitude larger than PCR discharge, that's why I was asking. So we should just sum Qx and Qy to get the total discharge leaving one cell? And just to double check, SGCQin is also in m3/s and this is the inflow to each cell I assume? And does it include the outflow from all neighboring cells from the previous time step or does it only account for lateral inflow? If it's not only lateral inflow we should add the PCR runoff to the current values instead of replacing these.

JannisHoch commented 6 years ago

Yes I realized that as well, wanted to check this today. Also the distrubtion of Qx/Qy are bit odd...

SGCQin is really only lateral inflow which is added to the general calculations within each cell.

Think there is still something wrong with time module and LFP. If you check LFP output after running the PCR2LFP for a month or so, no map output is written and also the mass balance check doesn't show anything, even with setting output intervals to a very small number. Maybe external and model internal times are not fully in sync? On the other hand, this somehow conflicts with the time series we can obtain for the entire coupled model period...

Interestnig also that I obtain completely different output for PCR2CMF... Maybe something to talk about on Monday. download

DirkEilander commented 6 years ago

@ChippChapp I have double checked, but the LFP time and GLOFRIM time seem to be synced fine - can't find any issues there. Diving further into it now.

Your CMF output looks really odd! Seems like a model problem, given that it works for me. Can you share your input_flood.nam and log files (via email)?

DirkEilander commented 6 years ago

To keep a central log of the testing activities some notes from today:

The LFP time issues are solved, but the current model seems unstable. The model bmi timestep changes with the adaptive model timestepping, and the python bmi-wrapper ignores the dt argument in the update function. This is now solved in the GLOFRIM lfp bmi-wrapper.
The coupling to DFM seems to work technically. The output his files and the hs variable show that there is water inputted to the model. However the q1 values at the node of interest remain very small. I'll check with DFM colleagues on Monday whether q1 is the right variable to look at and we need to double check if the units in the exchange make sense.

JannisHoch commented 6 years ago

@hii600 @DirkEilander I have updated the LFP model for the Elbe and it should work now. Have tested it with observed discharge as input and results further downstream are good. Resolution is 0.0125 degree and I have removed big chunks of the upstream part to keep sizes manageable.

To also make use of the updated model, you need to pull the latest commit from the model_test_data repos!

File and folder structure remained unchanged as well as the naming of the files - updating the ini/env-files should therefore not be required.

DirkEilander commented 6 years ago

@ChippChapp Thanks for updating the LFP model. Let me know once you've tested the coupled model if the model is still stable and results make sense.

DirkEilander commented 6 years ago

@ChippChapp @hii600 This new branch seems to work for all five models. @ChippChapp will continue to do some testing, but most bugs have been solved.

I've also added a convenience script to run the coupled models from command line, see glofrim-py/scripts/glofrim_runner.py You can now run a coupled model using: python glofrim_runner.py run /path/to/glofrim.ini –env /path/to/glofrim.env -s 2000-01-01 -e 2001-01-01 Check for more info: python glofrim_runner.py run –help & python glofrim_runner.py run_single –help

I suggest to keep this close this issue and report any bugs we may find in new issues. Some open issues like the observation points (should be a feature request instead of issue); unit testing and documentation can also better be tracked in separate issues.

openearth / glofrim

csdms-compliant #74