inbo / crow

Online tool to visualize birds detected by weather radars
https://inbo.github.io/crow/
MIT License
15 stars 7 forks source link

Dutch radars: change in nominal timestamp causes visualisation failure #170

Closed mrtnrey closed 1 year ago

mrtnrey commented 2 years ago

The Dutch radars (Herwijnen=NLHRW, Den Helder=NLDHL) are unavailable in the visualisation since 2022-08-29. However, the source data are produced; see e.g. https://opendata.meteo.be/ftp/observations/radar/vbird/nlhrw/2022/

A quick check revealed the cause: the nominal timestamp of the Dutch radar shifted one minute, i.e.: HH:05 became HH:06 HH:10 became HH:11 See the source file https://opendata.meteo.be/ftp/observations/radar/vbird/nlhrw/2022/nlhrw_vpts_20220829.txt at around 11:10 UTC that day.

As a consequence, the CROW visualisation fails...

The timestamp is read from the source HDF radar file so a quick fix on the side of the vpts generation is not possible imo.

@peterdesmet any ideas to implement a fix? Tnx!

peterdesmet commented 2 years ago

@niconoe any idea what the cause might be? @stijnvanhoey and I can then attempt to fix it.

stijnvanhoey commented 2 years ago

I had a look and the 'origin' of the issue is how the 'automatic downsampling' (by filtering out) as defined in https://github.com/inbo/crow/blob/main/src/components/Home.vue#L602 is setup at the momen:

(Note, this makes the approach very effective int terms of performance as half of the data is directly dropped before any further processing).

For example, when the data has a 5min resolution, 00:00, 00:05, 00:10... and the temporal resolution of the app is 10min, the data used for the visualizations is 00:00, 00:10 (dropping 00:05). As dutch radars has data rows 00:01, 00:06, 00:11, no data is retained (and no error is raised).

So, who to fix

  1. Adjust the incoming timestamp on the incoming data-file adding a rounding https://github.com/inbo/crow/blob/main/src/helpers.ts#L163:
    function roundNearest(num: number, resolution: number) {
      "Round to nearest resolution"
      return Math.round(num / resolution) * resolution;
    }
    datetime: roundNearest(moment.utc(row.substring(0, 13), "YYYYMMDD HHmm").valueOf(), 5 * 60 * 1000),

    This is fine if we know that the data will never have a resolution lower than 5minutes(!). It probably will work on 1min data resolution with 10min app resolution, but multiple incoming data records will get the same (rounded) timestamp and the setter would just re-assign each of these which is not ideal...). See https://github.com/inbo/crow/pull/174 for the PR.

  2. refactor the storeDataRow, but not yet about the best approach. E.g. looking for 'closest timestamp' instead of match will have a huge performance loss. Other option is to refactor the whole concept of the prefilled object (with predefined datetimes), but this is a much larger refactoring which we need to check if ok time-wise.
niconoe commented 2 years ago

Just saw the notification... @stijnvanhoey's analysis seems perfect at first look. Indeed, I'm not sure what would be the best approach without spending time in it but once we think about it, this 5-minutes downsampling seems quite rigid (but it didn't bite us before now). I guess refactoring the storeDataRow would be ideal, but it's indeed a large change. You're also correct to be concerned about performance: CROW is quite heavy on the browser and I remember having to be careful about that from the beginning.

mrtnrey commented 2 years ago

"data will never have a resolution lower than 5minutes" → I don't think it'll be the case soon, so this solution seems fine to me.

You could alternatively consider a "Floor" function of "Round" to obtain a 5-min timestep. This avoids collision between (rare but possible) late radar file arrivals. See e..g. this hypothetical case With Round timestamp at 00:08 → 00:10 next timestamp at 00:11 → 00:10 With Floor timestamp at 00:08 → 00:05 next timestamp at 00:11 → 00:10 But in practise I don't think this will happen very often so the choice between Floor/Round is not really important.

stijnvanhoey commented 2 years ago

You could alternatively consider a "Floor" function of "Round" to obtain a 5-min timestep.

Indeed, valid point; I updated the PR, https://github.com/inbo/crow/pull/174

peterdesmet commented 2 years ago

@stijnvanhoey can you clarify with an example why a “that the data will never have a resolution lower than 5minutes” is a requirement?

Otherwise, fine with using/keeping the approach to keep client processing as light as possible.

stijnvanhoey commented 2 years ago

What currently happens for non-dutch radars:

What currently happens for the dutch radars:

What the PR with the 5min floor does for the dutch radars:

What the PR with the 5min floor does for the a 1-min interval radar

Hence - probably it will work - but not with the expected behavior.

peterdesmet commented 2 years ago

I'd like an approach that is generic: we want to reuse it for data that can come in at any resolution.

2 questions:

  1. Is the approach to then always floor? Or is flooring conditional?
  2. Why are we flooring to a resolution (5min) that is not the resolution of the app (10min)? Is that to avoid groups of duplicate timestamps that are too big (and thus many overwrites)?

I'm wondering if an approach would work like:

  1. Always floor
  2. Always take the first match (no overwrites)
stijnvanhoey commented 2 years ago

Is the approach to then always floor? Or is flooring conditional?

When applying this approach, floor is the best idea (cfr. comment Maarten); not conditional.

Why are we flooring to a resolution (5min) that is not the resolution of the app (10min)? Is that to avoid groups of duplicate timestamps that are too big (and thus many overwrites)?

Indeed, redundant overwrites when data comes in at 5min.

I'm wondering if an approach would work like:

  1. Always floor
  2. Always take the first match (no overwrites)

I propose the following: We do the rounding (floor) and in the check on existing datetime, we add an additional check if for that datetime the given vptsDataRow.height contains already data (noData:false). First value is filled in, others are skipped. Doing so, should be ok to have the same rounding as the application setting.

peterdesmet commented 2 years ago

Yes, I like that approach. Easy to explain too: data is floored to resolution of app, first row is selected.

stijnvanhoey commented 2 years ago

Yes, I like that approach. Easy to explain too: data is floored to resolution of app, first row is selected.

Draft implementation done; I'll just have to do some double checks (not sure if this properly handles all the heights) and try to translate this in unit test as well.

stijnvanhoey commented 2 years ago

@peterdesmet while running the app with the new implementation, one other issue in terms of implementation need to be tackled. Let's take the example of the 1min data:

With the (rather naive) current implementation, the value of 2600m will be filled in when passing the line with 2022-10-10 00:01 & 2600m, this is not the first line so with respect to

data is floored to resolution of app, first row is selected.

I should update the implementation so it can handle this situation and always takes the first row (even though this is a Nan)... But due to the early filtering ( cfr. https://github.com/inbo/crow/issues/122#issuecomment-784006559) those nan-records never reach the storeDataRow method, which is great for performance :-) but not to keep track which heights/timestamps have been handled (doing bookkeeping)) on the level of storeDataRow. I continue on the issue, my next approach/proposal is to work in group (based on timestamp) and filtering of the first 'group' on the level of populateDataFromCrowServer instead of tackling it row-by-row later on.

stijnvanhoey commented 2 years ago

@peterdesmet I landed with an implementation that fulfills the requirement:

data is floored to resolution of app, first row is selected.

(note, it is not really 1 row, but N-heights rows, as each row contains 1 height)

See https://github.com/inbo/crow/pull/174/commits/2efeedf2006385f10b15a9f25200bc3c11b67fdb The filterinf of the records is now on earlier level and combined with the Nan-filtering (which overcomes the comment in https://github.com/inbo/crow/issues/170#issuecomment-1281032917). It actually does hurt the performance a bit with the additional groupby, can you have a check if this is still ok?

peterdesmet commented 2 years ago

It actually does hurt the performance a bit with the additional groupby, can you have a check if this is still ok?

Not sure how to benchmark this... you mean locally vs the current online version?

stijnvanhoey commented 2 years ago

Not sure how to benchmark this... you mean locally vs the current online version?

I asked as I'm not sure how you benchmarked this before (I have no experience with vue profiling). Fact is, the operation groups/apply (cfr. map-reduce) the entire dayData object instead of direct filtering, so there will be 'some' slow-down.

peterdesmet commented 2 years ago

I'm guessing it will be ok ... and has the advantage that it is a flexible and correct implementation.

stijnvanhoey commented 2 years ago

@peterdesmet unit tests added, ready for review (& deployment)

peterdesmet commented 2 years ago

174 is now merged and new version of app should be deployed at: https://inbo.github.io/crow/

Todo:

mrtnrey commented 1 year ago

Hi @peterdesmet, any progress on you todo list? Concerning the Dutch radars, the issue seems to be solved, so the first item: :heavy_check_mark: The other things seem to be rather small. Please let me know when we can deploy at meteo.be. Tnx!

stijnvanhoey commented 1 year ago

@mrtnrey All ready to deploy version 1.3 to meteo.be

peterdesmet commented 1 year ago

@mrtnrey You can download the code to deploy at https://github.com/inbo/crow/releases/tag/v1.3

peterdesmet commented 1 year ago

@mrtnrey I notice this issue still remains at meteo.be/birddetection. Is it possible to deploy the latest version of crow there?

mrtnrey commented 1 year ago

Yes, you're right. I asked our IT-dept aleady a few months ago but after that I didn't think about it anymore. Sorry! I'll put it on their agenda again. The datafiles exist, so luckily we do not loose any information.

mrtnrey commented 1 year ago

@peterdesmet release v1.3 deployed this afternoon on meteo.be/birddetection 🥳

peterdesmet commented 1 year ago

Excellent! I'm sure @hvangasteren will be very happy to know that Dutch weather radars can be viewed again on CROW

hvangasteren commented 1 year ago

Hi great sw guys! Yes, this is awesome Thanks Verzonden vanaf mijn Galaxy -------- Oorspronkelijk bericht --------Van: Peter Desmet @.> Datum: 29-03-2023 15:49 (GMT+01:00) Aan: inbo/crow @.> Cc: Hans van Gasteren @.>, Mention @.> Onderwerp: Re: [inbo/crow] Dutch radars: change in nominal timestamp causes visualisation failure (Issue #170) Excellent! I'm sure @hvangasteren will be very happy to know that Dutch weather radars can be viewed again on CROW

—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>