Closed mrtnrey closed 1 year ago
@niconoe any idea what the cause might be? @stijnvanhoey and I can then attempt to fix it.
I had a look and the 'origin' of the issue is how the 'automatic downsampling' (by filtering out) as defined in https://github.com/inbo/crow/blob/main/src/components/Home.vue#L602 is setup at the momen:
(Note, this makes the approach very effective int terms of performance as half of the data is directly dropped before any further processing).
For example, when the data has a 5min resolution, 00:00, 00:05, 00:10... and the temporal resolution of the app is 10min, the data used for the visualizations is 00:00, 00:10 (dropping 00:05). As dutch radars has data rows 00:01, 00:06, 00:11, no data is retained (and no error is raised).
So, who to fix
function roundNearest(num: number, resolution: number) {
"Round to nearest resolution"
return Math.round(num / resolution) * resolution;
}
datetime: roundNearest(moment.utc(row.substring(0, 13), "YYYYMMDD HHmm").valueOf(), 5 * 60 * 1000),
This is fine if we know that the data will never have a resolution lower than 5minutes(!). It probably will work on 1min data resolution with 10min app resolution, but multiple incoming data records will get the same (rounded) timestamp and the setter would just re-assign each of these which is not ideal...). See https://github.com/inbo/crow/pull/174 for the PR.
storeDataRow
, but not yet about the best approach. E.g. looking for 'closest timestamp' instead of match will have a huge performance loss. Other option is to refactor the whole concept of the prefilled object (with predefined datetimes), but this is a much larger refactoring which we need to check if ok time-wise.Just saw the notification... @stijnvanhoey's analysis seems perfect at first look. Indeed, I'm not sure what would be the best approach without spending time in it but once we think about it, this 5-minutes downsampling seems quite rigid (but it didn't bite us before now). I guess refactoring the storeDataRow
would be ideal, but it's indeed a large change. You're also correct to be concerned about performance: CROW is quite heavy on the browser and I remember having to be careful about that from the beginning.
"data will never have a resolution lower than 5minutes" → I don't think it'll be the case soon, so this solution seems fine to me.
You could alternatively consider a "Floor" function of "Round" to obtain a 5-min timestep. This avoids collision between (rare but possible) late radar file arrivals. See e..g. this hypothetical case With Round timestamp at 00:08 → 00:10 next timestamp at 00:11 → 00:10 With Floor timestamp at 00:08 → 00:05 next timestamp at 00:11 → 00:10 But in practise I don't think this will happen very often so the choice between Floor/Round is not really important.
You could alternatively consider a "Floor" function of "Round" to obtain a 5-min timestep.
Indeed, valid point; I updated the PR, https://github.com/inbo/crow/pull/174
@stijnvanhoey can you clarify with an example why a “that the data will never have a resolution lower than 5minutes” is a requirement?
Otherwise, fine with using/keeping the approach to keep client processing as light as possible.
What currently happens for non-dutch radars:
What currently happens for the dutch radars:
What the PR with the 5min floor does for the dutch radars:
floor(2022-10-10 00:01, 5min)
=2022-10-10 00:00, floor(2022-10-10 00:11, 5min)
=2022-10-10 00:10 are retainedWhat the PR with the 5min floor does for the a 1-min interval radar
floor(2022-10-10 00:00, 5min)
=2022-10-10 00:00, floor(2022-10-10 00:01, 5min)
=2022-10-10 00:00, floor(2022-10-10 00:02, 5min)
=2022-10-10 00:00, floor(2022-10-10 00:03, 5min)
=2022-10-10 00:00, floor(2022-10-10 00:04, 5min)
=2022-10-10 00:00 all match with the timestamp 2022-10-10 00:00; hence the value in the prepared js object at 2022-10-10 00:00 is 4 times overwritten (reset via this loop). Actually timestamp 2022-10-10 00:04 will be used in the app. Expected behavior would be 2022-10-10 00:00 retained and others filtered out.Hence - probably it will work - but not with the expected behavior.
I'd like an approach that is generic: we want to reuse it for data that can come in at any resolution.
2 questions:
I'm wondering if an approach would work like:
Is the approach to then always floor? Or is flooring conditional?
When applying this approach, floor is the best idea (cfr. comment Maarten); not conditional.
Why are we flooring to a resolution (5min) that is not the resolution of the app (10min)? Is that to avoid groups of duplicate timestamps that are too big (and thus many overwrites)?
Indeed, redundant overwrites when data comes in at 5min.
I'm wondering if an approach would work like:
- Always floor
- Always take the first match (no overwrites)
I propose the following: We do the rounding (floor) and in the check on existing datetime, we add an additional check if for that datetime the given vptsDataRow.height
contains already data (noData:false). First value is filled in, others are skipped. Doing so, should be ok to have the same rounding as the application setting.
Yes, I like that approach. Easy to explain too: data is floored to resolution of app, first row is selected.
Yes, I like that approach. Easy to explain too: data is floored to resolution of app, first row is selected.
Draft implementation done; I'll just have to do some double checks (not sure if this properly handles all the heights) and try to translate this in unit test as well.
@peterdesmet while running the app with the new implementation, one other issue in terms of implementation need to be tackled. Let's take the example of the 1min data:
With the (rather naive) current implementation, the value of 2600m will be filled in when passing the line with 2022-10-10 00:01 & 2600m, this is not the first line so with respect to
data is floored to resolution of app, first row is selected.
I should update the implementation so it can handle this situation and always takes the first row (even though this is a Nan)... But due to the early filtering ( cfr. https://github.com/inbo/crow/issues/122#issuecomment-784006559) those nan
-records never reach the storeDataRow
method, which is great for performance :-) but not to keep track which heights/timestamps have been handled (doing bookkeeping)) on the level of storeDataRow
. I continue on the issue, my next approach/proposal is to work in group (based on timestamp) and filtering of the first 'group' on the level of populateDataFromCrowServer instead of tackling it row-by-row later on.
@peterdesmet I landed with an implementation that fulfills the requirement:
data is floored to resolution of app, first row is selected.
(note, it is not really 1 row, but N-heights rows, as each row contains 1 height)
See https://github.com/inbo/crow/pull/174/commits/2efeedf2006385f10b15a9f25200bc3c11b67fdb The filterinf of the records is now on earlier level and combined with the Nan-filtering (which overcomes the comment in https://github.com/inbo/crow/issues/170#issuecomment-1281032917). It actually does hurt the performance a bit with the additional groupby, can you have a check if this is still ok?
It actually does hurt the performance a bit with the additional groupby, can you have a check if this is still ok?
Not sure how to benchmark this... you mean locally vs the current online version?
Not sure how to benchmark this... you mean locally vs the current online version?
I asked as I'm not sure how you benchmarked this before (I have no experience with vue profiling). Fact is, the operation groups/apply (cfr. map-reduce) the entire dayData
object instead of direct filtering, so there will be 'some' slow-down.
I'm guessing it will be ok ... and has the advantage that it is a flexible and correct implementation.
@peterdesmet unit tests added, ready for review (& deployment)
Todo:
Albis
entirely (this was a test and should not be displayed on meteo.be/birddetection) see https://github.com/inbo/crow/blob/ab31a8cb1dedbea3802b2e069012f249e040d75a/src/config.ts#L53-L65Hi @peterdesmet, any progress on you todo list? Concerning the Dutch radars, the issue seems to be solved, so the first item: :heavy_check_mark: The other things seem to be rather small. Please let me know when we can deploy at meteo.be. Tnx!
@mrtnrey All ready to deploy version 1.3 to meteo.be
@mrtnrey You can download the code to deploy at https://github.com/inbo/crow/releases/tag/v1.3
@mrtnrey I notice this issue still remains at meteo.be/birddetection. Is it possible to deploy the latest version of crow there?
Yes, you're right. I asked our IT-dept aleady a few months ago but after that I didn't think about it anymore. Sorry! I'll put it on their agenda again. The datafiles exist, so luckily we do not loose any information.
@peterdesmet release v1.3 deployed this afternoon on meteo.be/birddetection 🥳
Excellent! I'm sure @hvangasteren will be very happy to know that Dutch weather radars can be viewed again on CROW
Hi great sw guys! Yes, this is awesome Thanks Verzonden vanaf mijn Galaxy -------- Oorspronkelijk bericht --------Van: Peter Desmet @.> Datum: 29-03-2023 15:49 (GMT+01:00) Aan: inbo/crow @.> Cc: Hans van Gasteren @.>, Mention @.> Onderwerp: Re: [inbo/crow] Dutch radars: change in nominal timestamp causes visualisation failure (Issue #170) Excellent! I'm sure @hvangasteren will be very happy to know that Dutch weather radars can be viewed again on CROW
—Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you were mentioned.Message ID: @.***>
The Dutch radars (Herwijnen=NLHRW, Den Helder=NLDHL) are unavailable in the visualisation since 2022-08-29. However, the source data are produced; see e.g. https://opendata.meteo.be/ftp/observations/radar/vbird/nlhrw/2022/
A quick check revealed the cause: the nominal timestamp of the Dutch radar shifted one minute, i.e.: HH:05 became HH:06 HH:10 became HH:11 See the source file https://opendata.meteo.be/ftp/observations/radar/vbird/nlhrw/2022/nlhrw_vpts_20220829.txt at around 11:10 UTC that day.
As a consequence, the CROW visualisation fails...
The timestamp is read from the source HDF radar file so a quick fix on the side of the vpts generation is not possible imo.
@peterdesmet any ideas to implement a fix? Tnx!