Open jerabaul29 opened 4 years ago
Hi @nilsmkMET ,
I have had a look at the data available on lustre. Many thanks for making these available and the useful self documentation of the files, that was very helpful. :)
I have a few questions, both to get your advice / opinions on which factors may be important for performing predictions / corrections with ML, and to make sure that I understand things correctly:
We plan on considering that the prediction / modeling of the tidal effect is 'perfect' (or close to it), and that we can safely use the tide prediction field as a ground truth, for converting to / from data with / without tide included. Do you think this is a reasonable approximation?
What is the difference between the plain and the `_at_chartdatum` fields?
We are a bit unsure about which data fields to use as the ground truth that we try to reproduce, vs. comparable output of the model. So far I have started to use the following, do you think this is ok or should I use other fields?
`tide` field as a 'perfect' description of the tide effect, see previous point.
`observed` as the ground truth from field observation of tide + storm_surge, so that when considering the storm surge contribution alone, we use the result from `storm_surge_ground_truth = observed - tide`.
`totalwater` as the best prediction provided by the model for the `observed` quantity. Similarly, if we want to look only at the storm surge contribution, we use the result from `storm_surge_model_prediction = totalwater - tide`.
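As a minimal sketch of the two subtractions above, assuming the three fields have already been loaded as equal-length series on a common time axis (the sample values below are made up for illustration and are not taken from the files):

```python
def remove_tide(total_level, tide):
    """Subtract the tide prediction from a water level series, element by element."""
    if len(total_level) != len(tide):
        raise ValueError("series must share the same time axis")
    return [level - t for level, t in zip(total_level, tide)]

# Hypothetical values in metres, for illustration only.
tide = [0.50, 0.20, -0.10]
observed = [0.75, 0.50, 0.00]
totalwater = [0.70, 0.40, 0.10]

# Storm surge contribution alone, from observations and from the model.
storm_surge_ground_truth = remove_tide(observed, tide)
storm_surge_model_prediction = remove_tide(totalwater, tide)
```

This only holds to the extent that the tide prediction can be treated as 'perfect', as discussed in the first question.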
Following this understanding, we are thinking about performing a mapping using ML that corresponds to `(predictors) -> (observed - totalwater)`, i.e. trying to learn the 'error' in the prediction. Do you think that this makes sense / is a reasonable use of the data?
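The error-learning idea can be sketched as follows. This is only an illustration under stated assumptions: a single hypothetical predictor (`wind_speed`), made-up numbers, and an ordinary least-squares line standing in for whatever ML model ends up being used.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Hypothetical training data, for illustration only.
wind_speed = [2.0, 4.0, 6.0, 8.0]      # predictor (assumed name)
observed = [0.62, 0.84, 1.06, 1.28]    # measured water level
totalwater = [0.60, 0.80, 1.00, 1.20]  # model prediction

# Target of the mapping: the error of the model, observed - totalwater.
error = [o - t for o, t in zip(observed, totalwater)]
slope, intercept = fit_line(wind_speed, error)

def corrected_prediction(totalwater_value, wind_speed_value):
    """Add the learned error estimate back onto the model output."""
    return totalwater_value + slope * wind_speed_value + intercept
```

Learning the error rather than the water level itself keeps the physical model in the loop: the learned part only has to represent what the model gets wrong.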
Can you confirm that the storm surge modeling takes into account 1) wind and 2) air pressure effects, but not the effects that may come from wave breaking / runup? Do you expect that the wave effect may be important locally on the coast? We are thinking about including some wave activity predictors in the model, do you think this may be a good idea?
Is there some reference documentation about the storm surge model that I could read to get more details about both the model equations and implementation?
When performing the ensemble predictions, which values are different between different ensemble members / how are these chosen?
Hi, I'll try to answer all the questions here:
The `at_chartdatum` fields are simply adjusted from using the model's mean water level to using the chart datum as reference instead. So it's adding a constant offset. Note that this offset varies from station to station.
[...] `stormsurge` as training data for storm surge (i.e. `totalwater` is already corrected using our "simple method"). Let's have a talk if this just adds confusion...
To clarify more, let's perhaps have a coffee one day?
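To illustrate the constant-offset point, here is a minimal sketch. The station names and offset values are hypothetical placeholders, and the sign convention (adding the offset to the mean-water-level value) is an assumption; the actual per-station offsets and convention would have to come from the files themselves.

```python
# Hypothetical per-station offsets in metres (placeholders, not real values).
CHART_DATUM_OFFSET = {"station_a": 0.9, "station_b": 1.4}

def to_chart_datum(levels, station):
    """Shift a series from the model's mean-water-level reference to chart datum."""
    offset = CHART_DATUM_OFFSET[station]
    return [level + offset for level in levels]

totalwater = [0.1, -0.2, 0.3]  # made-up values relative to mean water level
totalwater_at_chartdatum = to_chart_datum(totalwater, "station_a")
```

Because the offset is constant for a given station, a difference such as `observed - totalwater` is unchanged as long as both series use the same reference.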
@nilsmkMET many thanks for your explanations, this makes things much clearer :) .
I think that the ML approach may be tested both instead of and in addition to the present correction approach. Many thanks for pointing this out.
Having a coffee one day to discuss this further sounds excellent. My calendar is very open, feel free to suggest the time that is best for you. :)
@nilsmkMET I have been looking at / thinking about the data. Here is a new list of comments / thoughts / questions that we may discuss either here or over a coffee :)
It looks like the available time series are relatively short, going 'only' a few years back. Of course I may be looking in the wrong place :) Do you know if / where longer time series for the station measurements, the model output, or both may be available, or who I may talk to? Regarding model output, do you think it may be possible to perform some reanalysis or something like that, in order to have series over a longer time period? Ideally it would be great to have a really long time span (could 30+ years be realistic, or do you think data are not available so far back in time?), so that infrequent events are also well represented in the data.
I have heard of other measurement stations during some discussions. I can look for the data myself, but if you know of places where more measurement station data are available, that would be great. :)
A more fundamental question I am curious about is: where does the difference between the surge model predictions and the measurements come from? Does it come mostly from the storm surge model itself, mostly from the forcing applied to it (wind and pressure), or from a combination of both? Do you have an opinion on that? Do you think it may be possible to re-run the surge model on past times, using the best estimates of water level, wind and air pressure (based on data assimilation of direct measurements, or something like this) as both initialization and forcing for the model run? I expect that if the storm surge model predictions are "very good" in this setup, there may be little the ML approach can help with, as this would be a sign that the main source of error is the forcing data (we could still look at applying some form of ML to the forcing data generation, but this would start to be a quite distinct 'branch'). If, on the other hand, the storm surge model predictions still have some bias etc. in this case, then the ML approach is more likely to help. Do you think this is a correct understanding?
By the way, regarding finding a time for a coffee, tagging @vkbo on this discussion as she may be interested in joining :) .