discuss met kyststasjoner_norge data

jerabaul29 commented 4 years ago

- which fields to use
- confirm understanding of how the data are set up
- caveats / things to know?

jerabaul29 commented 4 years ago

Hi @nilsmkMET ,

I have had a look at the data available on lustre. Many thanks for making these available, and for the useful self-documentation of the files; that was very helpful. :)

I have a few questions, both to ask for your advice / opinion about which factors may be important for performing predictions / corrections with ML, and to make sure that I understand things right:

We plan to assume that the prediction / modeling of the tidal effect is 'perfect' (or close to it), so that we can safely use the tide prediction field as ground truth when converting between data with and without the tide included. Do you think this is a reasonable approximation?
What is the difference between the plain and the _at_chartdatum fields?
We are a bit unsure about which data fields to use as the ground truth that we try to reproduce, and which as the comparable model output. So far I have used the following; do you think this is OK, or should I use other fields?
- tide as a 'perfect' description of the tidal effect (see the previous point).
- observed as the ground truth from field observations of tide + storm_surge, so that when considering the storm surge contribution alone, we use storm_surge_ground_truth = observed - tide.
- totalwater as the best model prediction of the observed quantity. Similarly, if we want to look only at the storm surge contribution, we use storm_surge_model_prediction = totalwater - tide.
Following this understanding, we are thinking about using ML to learn the mapping (predictors) -> (observed - totalwater), i.e. trying to learn the 'error' in the prediction. Do you think that this makes sense / is a reasonable use of the data?
Can you confirm that the storm surge model takes into account 1) wind and 2) air pressure effects, but not effects that may come from wave breaking / runup? Do you expect wave effects to be important locally on the coast? We are thinking about adding some wave activity predictors to the model; do you think this may be a good idea?
Is there any reference documentation about the storm surge model that I could read to get more details about both the model equations and the implementation?
When performing the ensemble predictions, which values differ between ensemble members, and how are these chosen?
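A minimal Python sketch of the field relations proposed above, assuming hourly series for a single station that are already aligned in time and given as plain lists in the same units. The field names (observed, tide, totalwater) are the ones from the data files; the function names are hypothetical and only illustrate the arithmetic.

```python
# Sketch of the proposed decomposition. Assumption: aligned hourly lists
# for one station, same units. Function names are hypothetical.


def surge_ground_truth(observed, tide):
    """Observed storm surge: observed water level minus the tide prediction."""
    return [o - t for o, t in zip(observed, tide)]


def surge_model_prediction(totalwater, tide):
    """Modelled storm surge: predicted total water level minus the tide."""
    return [w - t for w, t in zip(totalwater, tide)]


def ml_target(observed, totalwater):
    """The 'error' the ML model would learn: observed - totalwater."""
    return [o - w for o, w in zip(observed, totalwater)]


if __name__ == "__main__":
    # Toy values, not real station data.
    observed = [1.25, 0.75]
    tide = [1.0, 0.5]
    totalwater = [1.0, 1.0]
    print(surge_ground_truth(observed, tide))        # [0.25, 0.25]
    print(surge_model_prediction(totalwater, tide))  # [0.0, 0.5]
    print(ml_target(observed, totalwater))           # [0.25, -0.25]
```

Since the tide cancels out, observed - totalwater equals storm_surge_ground_truth - storm_surge_model_prediction, so under the 'perfect tide' assumption the learned error is the same whether it is expressed on total water level or on surge alone.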

nilsmkMET commented 4 years ago

Hi, I'll try to answer all the questions here:

Yes, I think we have to assume that the tidal predictions are "perfect". We could of course discuss this in more detail, since the predictions are based on harmonic analysis of long time series, and for a given "event" there will be non-linear tide-surge and surge-tide interactions. When we start modeling the total water level instead of the tides and storm surge separately, we can probably quantify this better, but I don't think the interactions are big enough to call for any drastic changes in your approach for now.
The at_chartdatum fields simply use the chart datum as reference instead of the model's mean water level, i.e. a constant offset is added. Note that this offset varies from station to station.
Yes, this approach (and the fields being used) is correct.
Trying to learn the 'error' sounds like a good idea. This is vaguely similar to what we are doing today, correcting the forecasts using the difference between observed and forecasted water levels over the last 5 days (with the latest differences given a larger weight than the oldest). If I understand correctly, the ML method would replace what we are currently doing? One thing to note: if this is the approach, you should use the variable stormsurge as training data for the storm surge, since totalwater is already corrected using our "simple method". Let's have a talk if this just adds confusion...
Yes, the storm surge model takes into account only wind stress and atmospheric pressure, and the only "wave effects" would be when the tidal/surge wave moves into shallow areas. No surface waves (swell or wind waves) are taken into account. I don't think these effects are important at a timescale of ~1 hour, but you can find evidence of some longer waves due to tidal or surge resonance in some harbours. I think this point is also best discussed "in person".
We unfortunately don't have any good documentation on our model setup, but we are using ROMS (2D) and some documentation can be found here: https://www.myroms.org/wiki/Documentation_Portal.
The ensemble members are all initialized from the same initial condition (coming from the deterministic model). The members are perturbed by different atmospheric forcing (using the entire ECMWF EPS system with 50+1 members). Our ensemble has 52 members; member 1 is the control, 2-51 are the perturbed EPS members, and the 52nd member is the deterministic model using the high resolution ECMWF atmospheric forcing.
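The current correction described above (observed minus forecast differences over the last 5 days, with recent differences weighted more) could look roughly like the sketch below. This is not the actual operational code: the linear weights, the hourly sampling and the function names are assumptions for illustration only.

```python
# Hypothetical sketch of a recency-weighted bias correction. Assumptions:
# hourly samples ordered oldest -> newest, a 5-day window, linear weights.

HOURS_PER_DAY = 24


def weighted_bias(observed, forecast, window=5 * HOURS_PER_DAY):
    """Weighted mean of (observed - forecast) over the last `window` samples.

    The weights grow linearly, so the newest difference counts the most.
    """
    diffs = [o - f for o, f in zip(observed, forecast)][-window:]
    if not diffs:
        raise ValueError("need at least one observed/forecast pair")
    weights = range(1, len(diffs) + 1)  # 1 = oldest, len(diffs) = newest
    return sum(w * d for w, d in zip(weights, diffs)) / sum(weights)


def corrected_forecast(raw_forecast, bias):
    """Shift every upcoming forecast value by the estimated bias."""
    return [x + bias for x in raw_forecast]


if __name__ == "__main__":
    # Toy values, not real station data.
    past_obs = [1.0, 1.0, 1.0]
    past_fc = [1.0, 0.5, 0.25]
    bias = weighted_bias(past_obs, past_fc)
    print(bias)
    print(corrected_forecast([0.5, 0.75], bias))
```

A learned correction would replace weighted_bias with a model of (predictors) -> error. This also shows why the uncorrected stormsurge variable, rather than totalwater, is the right training input: totalwater already contains a correction of this kind.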

To clarify more, let's perhaps have a coffee one day?

jerabaul29 commented 4 years ago

@nilsmkMET many thanks for your explanations, this makes things much clearer :)

I think the ML approach may be tested both instead of and in addition to the present correction approach. Many thanks for pointing this out.

Having a coffee one day to discuss this further sounds excellent. My calendar is very open, so feel free to suggest the time that suits you best. :)

jerabaul29 commented 4 years ago

@nilsmkMET I have been looking at / thinking about the data. Here is a new list of comments / thoughts / questions that we may discuss either here or over a coffee :)

It looks like the available time series are relatively short, going 'only' a few years back. Of course I may be looking in the wrong place :) Do you know if / where longer time series of the station measurements, the model output, or both may be available, or who I could talk to? Regarding model output, do you think it may be possible to perform a reanalysis or something similar, in order to have series over a longer time period? Ideally it would be great to have a really long time span (could 30+ years be realistic, or do you think data are not available that far back in time?), so that infrequent events are also well represented in the data.
I have heard of other measurement stations during some discussions. I can look for the data myself, but if you know of places where more measurement station data are available, that would be great. :)
A more fundamental question I am curious about is: "where does the difference between the surge model predictions and the measurements come from?". Does it come mostly from the storm surge model itself, mostly from the forcing applied to it (wind and pressure), or from a combination of both? Do you have an opinion on that? Do you think it may be possible to re-run the surge model on past times, using as both initialization and forcing the best estimates of water level, wind and air pressure based on data assimilation of direct measurements, or something like this? I expect that if the storm surge model predictions are "very good" in this setup, there may be little the ML approach can help with, as this would indicate that the main source of error is the error in the forcing data (we could still look at applying some form of ML to the forcing data generation, but this would start to be a quite distinct 'branch'). If, however, the storm surge model predictions still have some bias etc. in this case, then the ML approach is more likely to be able to help. Do you think this is a correct understanding?
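The reasoning above can be made concrete by splitting the forecast error with a hypothetical re-run ("hindcast") of the surge model using best-estimate initialization and forcing. The sketch assumes such a hindcast exists for the same times as the forecast and the observations; the names are illustrative and do not refer to any existing product.

```python
# Hypothetical error split, assuming aligned lists of observations, the
# operational forecast and a re-run ("hindcast") with best-estimate
# initialization and forcing.
import math


def rmse(errors):
    """Root-mean-square of a list of errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


def split_error(observed, forecast, hindcast):
    """Split forecast - observed into a forcing part and a model part.

    forecast - observed = (forecast - hindcast) + (hindcast - observed)
      forcing part: forecast - hindcast (forecast vs. best-estimate
                    forcing and initial state)
      model part:   hindcast - observed (error left with 'good' forcing)
    The RMSE of the parts does not add up to the total RMSE in general.
    """
    total = [f - o for f, o in zip(forecast, observed)]
    forcing = [f - h for f, h in zip(forecast, hindcast)]
    model = [h - o for h, o in zip(hindcast, observed)]
    return {
        "total_rmse": rmse(total),
        "forcing_rmse": rmse(forcing),
        "model_rmse": rmse(model),
        "model_bias": sum(model) / len(model),
    }


if __name__ == "__main__":
    # Toy values, not real station data.
    print(split_error([0.0, 0.0], [1.5, 1.5], [0.5, 0.5]))
```

If model_rmse and model_bias are small compared with forcing_rmse, most of the error comes from the forcing, and correcting the surge model output alone has little to gain; a clear remaining model_bias would instead point to room for an ML correction.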

By the way, regarding finding a time for a coffee, I am tagging @vkbo in this discussion, as she may be interested in joining :)

metno / MachineOcean-WP12

discuss met kyststasjoner_norge data #6