Multi-parameter runs - Githubissues

lumlauf commented 3 years ago

Hi all,

I don't know if this is actually an "issue": GOTM is often used for multi-parameter runs. In this case, it is called repeatedly from some external software (Matlab, Python, shell, ...) with slightly modified namelist parameters to cover a large parameter space step by step.

Perhaps we have this functionality already, and I just don't know? Perhaps it can be improved via the new yaml files?

Best, Lars

bolding commented 3 years ago

Hi Lars

The parsac python package does that - and more on top. This is a very complete package that is used extensively by the lake users of GOTM. It can do sensitivity analysis and auto-calibration as well as ensemble simulations.

parsac supports both namelist files and yaml-files and also runs on HPC - ~50000 simulations are not uncommon.

Karsten

lumlauf commented 3 years ago

Thanks, Karsten!

Is this tool easy to use even for someone with non-existing python knowledge (=me)? Is it easy to extend when specialized new parameters are added to the yaml-files for some advanced tailor-made problem that is only of interest for a particular study/paper?

Otherwise, would there be an easy alternative like in the old days where you could modify namelist parameters via environment variables? I guess what should always work is reading, modifying, and re-generating the yaml-files automatically from a host program like Matlab for every new run, right?

Thanks,

Lars


--

Lars Umlauf

Physical Oceanography and Instrumentation Leibniz-Institute for Baltic Sea Research

phone : ++49 381 5197 223 fax : ++49 381 5197 114 web : www.io-warnemuende.de/lars-umlauf-en.html

address: Leibniz-Institute for Baltic Sea Research Seestrasse 15 D-18119 Rostock-Warnemuende Germany

bolding commented 3 years ago

It requires some work to get to know the workflow, but once you have made your configuration file (in .xml), everything else is done by executing parsac with different command line options. Even non-technical biologists have cracked the code.

But depending on the question you have it might be easier to just create 20 different gotm.yaml files and make 20 runs from a script.

You can ask Peter or Marvin for more information as they used the tool when we had a workshop in DK.

Qing has used a Jupyter Notebook with Python to manipulate gotm.yaml files; you can do the same in R. But if you really need to make many runs, you need something that works in parallel.
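As a rough illustration of the "create N gotm.yaml files and run them from a script" approach, here is a minimal Python sketch using only the standard library. It is not how parsac works. The YAML keys in the template (`k_min`, `wind_factor`) are placeholders rather than a real GOTM schema, and the script assumes that an executable called `gotm` is on the PATH and reads `gotm.yaml` from its working directory.

```python
import itertools
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from string import Template

# Hypothetical template: these keys are placeholders, not a real GOTM schema.
TEMPLATE = Template("""\
turbulence:
  k_min: $k_min
surface:
  wind_factor: $wind_factor
""")


def write_configs(root, grid):
    """Create one run directory per parameter combination, each with a gotm.yaml."""
    names = sorted(grid)
    run_dirs = []
    combos = itertools.product(*(grid[name] for name in names))
    for i, values in enumerate(combos, start=1):
        run_dir = Path(root) / f"run_{i:04d}"
        run_dir.mkdir(parents=True, exist_ok=True)
        text = TEMPLATE.substitute(dict(zip(names, values)))
        (run_dir / "gotm.yaml").write_text(text)
        run_dirs.append(run_dir)
    return run_dirs


def run_all(run_dirs, command=("gotm",), workers=4):
    """Run `command` inside every run directory, a few at a time; return exit codes."""
    def run_one(run_dir):
        return subprocess.run(list(command), cwd=run_dir).returncode

    # Threads are enough here: each one only waits for an external process.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_one, run_dirs))
```

For example, `write_configs("runs", {"k_min": [1e-8, 1e-6], "wind_factor": [0.8, 1.0, 1.2]})` would create six run directories, and `run_all(...)` on the returned list would then start the model in each of them, four at a time by default.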

bolding commented 3 years ago

Picking this up again

Jorn and I have been working on a tool that will do ensemble simulations and data assimilation using GOTM as the model. It will be a standalone tool using GOTM as a submodule, i.e. it will run with an unmodified vanilla version of GOTM. To facilitate that, a few changes to GOTM would be nice to have (not need to have):

1) GOTM writes to unit 0 (stderr), and flexout writes to stdout. When running only one instance, GOTM just writes to the screen. Running in parallel, this becomes a mess because all members write to the screen in no particular order, so redirecting to a separate file for each member makes a lot of sense. This can be done in two ways: (a) as part of the GOTM core, i.e. provide filenames in gotm.yaml; if a filename has length 0, write to the screen, otherwise open the file; (b) open the files outside the GOTM core.

2) The progress variable in GOTM must be a module level variable set in init_gotm() instead of only in time_loop()

3) jul2 and secs2 are made public in time.F90, so that a data-assimilation tool can know about the stop time.

4) Are there any objections against renaming like: init_gotm() -> initialize_gotm(), time_loop() -> integrate_gotm() and clean_up() -> finalize_gotm()?

jornbr commented 3 years ago

Hi Karsten,

I expect that the ensemble/DA functionality would be implemented by wrapping GOTM? (since it would require MPI, and GOTM itself should not depend on MPI). If so, I imagine that some of what you propose could be implemented in that wrapper, without touching GOTM itself. Point by point:

1) I'd imagine that redirecting can be done with minimal changes to GOTM, as long as it picks up the unit numbers to write to from some publicly accessible shared module (e.g. util). I don't see why the names would have to go into gotm.yaml, though. Why not use command line switches and/or some logic in the wrapper to change the unit numbers, pointing them to newly opened files? flexout already allows the host to provide custom fatal_error/log_message routines, so GOTM can make those use the same unit numbers (by implementing those routines as part of type_gotm_host).

2) In principle, I guess you could manipulate MinN and MaxN from the wrapper (like GOTM-GUI does too), and then only the wrapper might want to keep track of the original MinN and MaxN (i.e., no changes to GOTM core needed)?

3) That's already the case, right? jul2 and secs2 are public.

4) That seems a good idea to me. There could be a benefit in additionally splitting init_gotm into a configure_gotm and an initialize_gotm.

Cheers,

Jorn

bolding commented 3 years ago

Re 1): the way redirection to a file is done now is by opening a file on e.g. unit=0 (or, in modern Fortran, error_unit). At the moment the redirection is done outside GOTM, in the worker wrapper. The question is whether it should be moved into GOTM.

    write(outputid, "(A,I0.4)") '', member
    write(strbuf, "(A,I0.4)") 'gotm_', member
    yaml_file = TRIM(strbuf) // '.yaml'
    fname = TRIM(strbuf) // '.stderr'
    open(error_unit,file=fname)
    fname = TRIM(strbuf) // '.stdout'
    open(output_unit,file=fname)
    call init_gotm()

Diagnostics and standard output could be handled similarly to yaml_file, i.e. by directly setting an internal GOTM variable and letting GOTM do the opening.

That is, introduce stderr and stdout as variables and assign them error_unit and output_unit if no filenames are specified; otherwise get stderr and stdout via newunit in the open call.

One advantage of re-directing directly to error_unit is that flushing is done.

So not 100% sure what is best.

Re 2): that is what is being done. The server does the date calculations, calculates MinN and MaxN, and sends them to the workers, where they are used directly in time_loop().

Re 3): it was jul1 and secs2, as that will allow observation_handler to skip observations before the simulation start.

Re 4): I'll do that in master directly, as I think only you (Jorn) will see side effects :-)
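To make the date arithmetic in point 2 concrete, here is a small Python sketch of how a server might turn a start time, stop time, time step and a list of observation times into (MinN, MaxN) windows for the workers. This is only an illustration under assumptions: the function name `step_windows`, the step numbering starting at 1, and the rule of ending a window at each observation are not taken from the actual tool.

```python
from datetime import datetime


def step_windows(start, stop, dt_seconds, obs_times):
    """Return (MinN, MaxN) pairs, one per window, with steps numbered from 1."""
    total = round((stop - start).total_seconds() / dt_seconds)
    # Step index at which each observation falls.
    breaks = sorted({round((t - start).total_seconds() / dt_seconds) for t in obs_times})
    # Skip observations outside the simulation period; always end at the last step.
    breaks = [n for n in breaks if 0 < n < total] + [total]
    windows, first = [], 1
    for last in breaks:
        windows.append((first, last))
        first = last + 1
    return windows


start, stop = datetime(2021, 1, 1), datetime(2021, 1, 2)
obs = [datetime(2020, 12, 31, 12), datetime(2021, 1, 1, 6), datetime(2021, 1, 1, 18)]
print(step_windows(start, stop, 3600, obs))  # [(1, 6), (7, 18), (19, 24)]
```

The observation before the simulation start is dropped, which is the kind of filtering that knowing the start time makes possible.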

jornbr commented 3 years ago

Re 1, I'd go for the "Introducing stderr and stdout as variables and [initializing those] to error_unit and output_unit". And the wrapper could open files instead and assign their units to stdout and stderr (instead of reopening error_unit and output_unit). It keeps the GOTM core simpler. And redirect to file for a serial run can just be done on the command line as usual (e.g., gotm &> output.log) - no need for built-in support.

Regarding buffering - that seems compiler specific, with ifort for instance line buffering output_unit (good enough for us) - https://community.intel.com/t5/Intel-Fortran-Compiler/Enabling-buffered-I-O-to-stdout-with-Intel-ifort-compiler/td-p/993203. I'd write everything to stdout except error messages, like I think most tools would - https://en.wikipedia.org/wiki/Standard_stream.
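For anyone driving GOTM from a script rather than from a Fortran wrapper, the same "open the files outside GOTM" idea can be sketched in Python. This is a hedged illustration, not part of GOTM or the ensemble tool: the function `run_member` is hypothetical, and only the gotm_NNNN.stdout / gotm_NNNN.stderr naming is borrowed from the wrapper snippet earlier in this thread.

```python
import subprocess
import sys
from pathlib import Path


def run_member(member, command, workdir="."):
    """Run one member with its output sent to gotm_NNNN.stdout and gotm_NNNN.stderr."""
    stem = f"gotm_{member:04d}"
    out_path = Path(workdir) / f"{stem}.stdout"
    err_path = Path(workdir) / f"{stem}.stderr"
    # The host opens the files; the model itself just writes to its usual streams.
    with open(out_path, "w") as out, open(err_path, "w") as err:
        result = subprocess.run(list(command), stdout=out, stderr=err, cwd=workdir)
    return result.returncode, out_path, err_path
```

A call such as `run_member(3, ["gotm"])` would then leave gotm_0003.stdout and gotm_0003.stderr in the working directory, assuming a `gotm` executable is on the PATH.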

bolding commented 3 years ago

Let's see if others have comments - otherwise I'll make the changes to GOTM soon.

lumlauf commented 3 years ago

Hans and I discussed some planned applications with multiple instances of GOTM that may be related to this topic. We are collaborating with two groups that run atmospheric models. Looking for a simple representation of two-way atmosphere-ocean coupling, we thought about running an instance of GOTM underneath each grid point of the atmospheric model. GOTM and the atmospheric model would then feed back via the atmosphere-ocean fluxes on each time step (or perhaps, if this turns out to be more efficient, only every couple of time steps). We are talking about order 10^4-10^5 grid points (= GOTM instances) to start with.

In view of the changes discussed above, are there any things worth considering already at this point? What are your thoughts about this?

Thanks!

bolding commented 3 years ago

I think that requires a different concept. You don't really want to create and open 10^5 gotm.yaml files. An ensemble runs the same setup in different configurations, and only on the order of 10^2 of them. You want different setups (e.g. lat, lon), but only one incarnation of each. I see two options:

1) add necessary routines to the meteo model and let it handle 'everything'.

2) Create a dedicated program that calls GOTM in a grid - let it interact with the meteo model via MPI.

I would personally go for 2.
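One small piece of option 2 that can be sketched without any MPI code is how the 10^4-10^5 water columns might be divided among the processes of such a dedicated program. The helper below is purely illustrative: the name `columns_for_rank` and the contiguous block layout are assumptions, and a real program would also have to match the decomposition used by the atmospheric model.

```python
def columns_for_rank(n_columns, n_ranks, rank):
    """Return the contiguous range of column indices owned by `rank`."""
    base, extra = divmod(n_columns, n_ranks)
    # The first `extra` ranks get one additional column each.
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return range(start, stop)


# 10 columns over 3 ranks: sizes 4, 3, 3
print([list(columns_for_rank(10, 3, r)) for r in range(3)])
```

With 10^5 columns on 64 ranks, every rank would own either 1562 or 1563 columns, so the load stays balanced as long as each column costs about the same.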

bolding commented 2 years ago

Just stumbled on this

As part of the rewrite of GETM, Jorn has actually done exactly what you ask for, i.e. running a (large) number of GOTM models on a grid. The 2-way coupling with the atmospheric model is still to be done.

bolding commented 8 months ago

Back to the original question: EAT (https://github.com/BoldingBruggeman/eat/wiki) can do exactly what you asked for initially, i.e. run a large number of differently configured GOTM simulations.

gotm-model / code

Multi-parameter runs #19
