rom-py / rompy

Relocatable Ocean Modelling in PYthon (rompy) combines templated cookie-cutter model configuration with various xarray extensions to assist in the setup and evaluation of coastal ocean model
https://rom-py.github.io/rompy/
BSD 3-Clause "New" or "Revised" License

Match up code design #12

Closed ghost closed 1 year ago

ghost commented 3 years ago

I have completed the first version of the match up code (see gist here) and here are some of the decisions I made around the output format.

From what I could see there were 2 viable dataset formats for the matchup code.

  1. Create a 1D dimension that tracks the ID of each observation platform (i.e. each waverider buoy has a corresponding index), and reference everything to this index. In the case of the waveriders, lat, lon, time, model outputs and field observations can all be indexed by waverider. The challenge is that the measurements don't share timestamps between different instruments, so we need to create a separate time variable and separate model/measurement outputs for each platform. For small datasets this is no problem, but it will get out of hand as the number of measurements increases.

  2. Create universal timestamp, lat and lon dimensions. This is useful for slicing the matchup output with xarray's .sel method, but adds a bit of complexity when slicing the model outputs to match up against the field observations. With universal lat/lon/time dimensions, the field obs arrays are padded with NaNs to match the array shape, so I also saved another variable holding the lat/lon indices that contain field observations. I think the end user could bypass this with the .where method, but I will need to investigate it.
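To make the trade-off in option 2 concrete, here is a minimal sketch, not the code from the gist. The buoy names, coordinates and the `hs_obs` variable are made up for illustration. It builds universal time/lat/lon dimensions from a small observation table, shows the NaN padding that results, and recovers only the filled cells by stacking and dropping NaNs.

```python
import pandas as pd
import xarray as xr

# Hypothetical observations from two fixed buoys with non-matching timestamps.
obs = pd.DataFrame(
    {
        "platform": ["buoy_a", "buoy_a", "buoy_b"],
        "time": pd.to_datetime(
            ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 00:30"]
        ),
        "lat": [-32.0, -32.0, -31.5],
        "lon": [115.5, 115.5, 115.0],
        "hs_obs": [1.2, 1.4, 0.9],
    }
)

# Option 2: universal time/lat/lon dimensions.
# Every (time, lat, lon) cell without an observation is filled with NaN.
gridded = obs.set_index(["time", "lat", "lon"])[["hs_obs"]].to_xarray()

# Label-based slicing works directly on the universal dimensions.
buoy_a_series = gridded["hs_obs"].sel(lat=-32.0, lon=115.5)

# Recover only the cells that actually hold observations.
valid = gridded["hs_obs"].stack(point=("time", "lat", "lon")).dropna("point")

print(gridded["hs_obs"].shape)  # 3 times x 2 lats x 2 lons
print(valid.size)               # number of real observations
```

With only three observations the padded array already has twelve cells, which is the cost of the universal dimensions; the benefit is that .sel works without any per-platform bookkeeping.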

I ended up going with the latter because it will scale better with other field observation sources. For the matchup code to work on observations with time-varying locations it will need some adjusting, probably just querying the KDTree inside the time loop and changing how the lat/lon dims are created.
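A rough sketch of the KDTree lookup for a platform with time-varying positions, assuming a regular model grid and using scipy's cKDTree. The grid, the track and the helper name `nearest_grid_index` are hypothetical and are not taken from the gist. Distances are computed in degrees, which is only an approximation of true distance. Rather than querying inside a time loop, this version passes all positions to a single query call, which gives the same result.

```python
import numpy as np
from scipy.spatial import cKDTree

# Hypothetical regular model grid (degrees).
model_lon = np.linspace(114.0, 116.0, 5)  # 114.0, 114.5, ..., 116.0
model_lat = np.linspace(-33.0, -31.0, 5)  # -33.0, -32.5, ..., -31.0
lon2d, lat2d = np.meshgrid(model_lon, model_lat)  # both shaped (nlat, nlon)

# Build the tree once from the flattened grid node positions.
tree = cKDTree(np.column_stack([lon2d.ravel(), lat2d.ravel()]))


def nearest_grid_index(obs_lon, obs_lat):
    """Return (ilat, ilon) index arrays of the grid node nearest each position."""
    _, flat_index = tree.query(np.column_stack([obs_lon, obs_lat]))
    return np.unravel_index(flat_index, lon2d.shape)


# Hypothetical moving platform: one position per timestamp.
track_lon = np.array([115.10, 115.30, 115.60])
track_lat = np.array([-32.05, -31.80, -31.40])

ilat, ilon = nearest_grid_index(track_lon, track_lat)
print(ilat, ilon)
```

For a fixed buoy the same function can be called once with a single position, so the fixed and moving cases share one code path.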

pbranson commented 3 years ago

Sorry only just saw this issue - thought it was all happening in the PR.

I think once you consider the moving platform case you will need to revert to 1., with the wavebuoys just being a special case of the moving platform code. Adding a variable for 'platform' makes sense, and repeated times also make sense. Ultimately we are converting to a 1D dataset of 'matchups' with several variables (time, grid references, obs, model values...).
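One way the 1D 'matchups' layout described here could look in xarray is sketched below. All variable names and values are illustrative assumptions, not the project's actual schema. Every variable shares a single `matchup` dimension, the platform is stored as an ordinary variable, and timestamps are allowed to repeat across platforms.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical matchups: a fixed buoy and a moving glider sharing timestamps.
times = pd.to_datetime(
    ["2021-01-01 00:00", "2021-01-01 01:00", "2021-01-01 00:00", "2021-01-01 01:00"]
).values

matchups = xr.Dataset(
    {
        "platform": ("matchup", np.array(["buoy_a", "buoy_a", "glider_1", "glider_1"])),
        "time": ("matchup", times),
        "lat": ("matchup", np.array([-32.0, -32.0, -31.9, -31.8])),
        "lon": ("matchup", np.array([115.5, 115.5, 115.1, 115.2])),
        "hs_obs": ("matchup", np.array([1.2, 1.4, 0.8, 1.3])),
        "hs_model": ("matchup", np.array([1.1, 1.5, 1.0, 1.5])),
    }
)

# Select one platform by positional index along the matchup dimension.
is_glider = matchups["platform"].values == "glider_1"
glider = matchups.isel(matchup=np.flatnonzero(is_glider))

# Simple evaluation statistic: mean model-minus-observation bias.
bias = float((glider["hs_model"] - glider["hs_obs"]).mean())
print(glider.sizes["matchup"], bias)
```

Because nothing is gridded, there is no NaN padding, and a moving platform simply contributes rows whose lat/lon change from one matchup to the next.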

ghost commented 3 years ago

Apologies, I hadn't figured out PRs at that time, so I thought I'd write out a blurb about some of the code design decisions I had made in the meantime. If there is an easy way to loop through them systematically, we could use netCDF groups to split up platforms and keep the dataset clean? I don't think xarray supports getting the keys of grouped nc files, so we would probably need to use netCDF4.

ghost commented 2 years ago

Just threw together a test for the match up code. Should I put the function in the "test_intake.py" file or start a new test file? If there are other utils that may be useful we could throw together a 'test_utils.py' file or similar.

pbranson commented 2 years ago

Thanks Chris,

Yeah start a new module file called test_util.py
