noi-techpark / bdp-elaborations

GNU Affero General Public License v3.0

As a traffic analyst I would like to understand if it is possible to have the Bluetooth elaborations (LinkStations) in a more real-time way #13

Closed rcavaliere closed 1 year ago

rcavaliere commented 1 year ago

It is related to this elaboration: https://github.com/noi-techpark/bdp-elaborations/tree/main/bluetooth-traffic

The task to be done is similar to https://github.com/noi-techpark/bdp-commons/issues/568. At present, the elaborations have a delay of hours, which makes this information unusable for real-time applications.

rcavaliere commented 1 year ago

@clezag here is the paper with some explanations of the algorithms: Paper.pdf

clezag commented 1 year ago

An update on the situation: The current elaboration is not set up in a way that allows for actual real-time data.

The job runs every hour at exactly xx:00 and then recalculates all the time frames (10min, 30min etc.) of the last 24 hours. This takes a good while, so much so that the job sometimes takes more than 1 hour and skips its next run, leading to even more out-of-date data.
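To illustrate the skipped-run effect, here is a minimal simulation. It is only a sketch: the function name, the dates and the run durations are made up for the example, and it assumes the scheduler simply drops a tick while the previous run is still busy.

```python
from datetime import datetime, timedelta


def simulate_hourly_job(first_tick, run_durations, period=timedelta(hours=1)):
    """Simulate a fixed-schedule job that skips a tick while the previous run is busy.

    run_durations: duration of each run that actually starts, in order.
    Returns (started, skipped): two lists of tick datetimes.
    """
    started, skipped = [], []
    remaining = list(run_durations)
    tick = first_tick
    busy_until = first_tick
    while remaining:
        if tick >= busy_until:
            # Previous run has finished: start a new one at this tick.
            started.append(tick)
            busy_until = tick + remaining.pop(0)
        else:
            # Previous run is still going: this tick is lost.
            skipped.append(tick)
        tick += period
    return started, skipped


# Hypothetical durations: the first run takes 70 minutes, the second 25.
started, skipped = simulate_hourly_job(
    datetime(2024, 1, 1, 8, 0),
    [timedelta(minutes=70), timedelta(minutes=25)],
)
print("started:", [t.strftime("%H:%M") for t in started])
print("skipped:", [t.strftime("%H:%M") for t in skipped])
```

With these example values the 09:00 tick is skipped, so the second run only starts at 10:00.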

We are currently testing some query performance optimizations that should bring elaboration time down to something more reasonable (~20 minutes).

While this will ameliorate the issue somewhat, it still doesn't produce real-time data, as the results will still always be out of date by 20-60 minutes. To get actual real-time, we would have to rework the elaboration in a major way. I would suggest discussing our options in person, if we want to go that route, as there are many different levels of compromise we can choose from.

rcavaliere commented 1 year ago

Next steps consolidated with @clezag and @dulvui:

  1. Scheduler setup: the schedule is no longer fixed at 1 hour, but aligned to the different scheduling tasks (e.g. elaborations on 600 seconds should be computed every 10 minutes).
  2. Interval window change: scheduling tasks with a short elaboration window should not go too far into the past. This value should be a parameter of the scheduler.
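
A rough sketch of how the two points could fit together, not the actual implementation. The function name `elaboration_window` and the lookback values in `LOOKBACK_PERIODS` are assumptions made for illustration; only the 600 second / 10 minute pairing comes from the list above.

```python
from datetime import datetime, timezone

# Hypothetical configuration: period in seconds -> how many periods to recompute.
LOOKBACK_PERIODS = {600: 6, 1800: 4, 3600: 3}


def elaboration_window(now, period_s, lookback_periods):
    """Return (start, end) of the interval to recompute for one period length.

    `now` must be timezone-aware. `end` is the latest period boundary at or
    before `now`; `start` lies `lookback_periods` full periods before `end`.
    """
    epoch = int(now.timestamp())
    end_epoch = epoch - epoch % period_s  # align down to a period boundary
    start_epoch = end_epoch - lookback_periods * period_s
    return (
        datetime.fromtimestamp(start_epoch, now.tzinfo),
        datetime.fromtimestamp(end_epoch, now.tzinfo),
    )


now = datetime(2024, 1, 1, 10, 7, 30, tzinfo=timezone.utc)  # example timestamp
for period_s, lookback in LOOKBACK_PERIODS.items():
    start, end = elaboration_window(now, period_s, lookback)
    print(period_s, start.strftime("%H:%M"), end.strftime("%H:%M"))
```

Each period would then get its own schedule (every `period_s` seconds) and only recompute its own short window instead of the full last 24 hours.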

clezag commented 1 year ago

@rcavaliere The index and query optimizations are now live in production.

clezag commented 1 year ago

Seems like we have a never ending story on our hands. Now that the performance fix is in production (and works - we went from 1h+ down to 25min), I've noticed that some stations are still very much out of date (like 2+h). Turns out that most of the time we don't get any data for a Bluetooth box for quite a long time, and are then sent the history up to that point all at once.

If the job doesn't find any data for a period, it doesn't generate the measurement. For example, if the last record is at 06:30, that node will be stuck on that time even though the elaboration ran without problems at 10:00.

Updates come in every 10 minutes, but they only include a few stations at a time (~10). I don't see any pattern as to which ones get updated more often or when. Maybe there is some issue at the data provider or with the network to the Bluetooth stations?
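
A small sketch of that behaviour, under assumptions: the record layout `(station, timestamp)`, the station names, and the convention of stamping a measurement with the start of its period are all made up for the example and are not taken from the real job.

```python
from datetime import datetime, timedelta


def latest_measurement(records, period):
    """Latest period bucket that contains data, per station.

    Buckets without records produce no measurement, so a station's latest
    measurement can never be newer than its last raw record.
    """
    latest = {}
    for station, ts in records:
        # Align the timestamp down to the start of its period.
        bucket = ts - (ts - datetime.min) % period
        if station not in latest or bucket > latest[station]:
            latest[station] = bucket
    return latest


def staleness(records, now, period):
    """How far behind `now` each station's latest measurement is."""
    return {s: now - b for s, b in latest_measurement(records, period).items()}


day = datetime(2024, 1, 1)  # hypothetical date
records = [
    ("station_a", day.replace(hour=6, minute=27)),
    ("station_a", day.replace(hour=6, minute=30)),
    ("station_b", day.replace(hour=9, minute=55)),
]
lag = staleness(records, day.replace(hour=10), timedelta(minutes=10))
for station, delta in lag.items():
    print(station, delta)
```

Here `station_a` stays at 06:30 (3.5 hours behind the 10:00 run) although the elaboration itself ran fine, while `station_b` is only one period behind.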
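
One way to look for a pattern in these updates would be to compare when a record was measured with when it arrived. The sketch below assumes we can get rows of `(station, measured_at, received_at)`; whether an arrival timestamp is actually stored is an assumption, and the function name and 30 minute threshold are made up.

```python
from datetime import datetime, timedelta


def backfilled_batches(rows, max_lag=timedelta(minutes=30)):
    """Find deliveries whose oldest record was already older than max_lag on arrival.

    rows: iterable of (station, measured_at, received_at).
    Returns {(station, received_at): lag of the oldest record in that delivery}.
    """
    oldest = {}
    for station, measured_at, received_at in rows:
        key = (station, received_at)
        if key not in oldest or measured_at < oldest[key]:
            oldest[key] = measured_at
    return {
        key: key[1] - first
        for key, first in oldest.items()
        if key[1] - first > max_lag
    }


day = datetime(2024, 1, 1)  # hypothetical date
arrival = day.replace(hour=10)
rows = [
    ("station_a", day.replace(hour=6, minute=30), arrival),
    ("station_a", day.replace(hour=9, minute=50), arrival),
    ("station_b", day.replace(hour=9, minute=55), arrival),
]
late = backfilled_batches(rows)
for (station, received_at), delta in late.items():
    print(station, received_at.strftime("%H:%M"), delta)
```

Collecting this per station over a few days might show whether the late history dumps cluster on certain boxes or times of day.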

@rcavaliere Since you know the project quite well, do you have a possible explanation from the data provider side?

I'd say if we don't get this sorted out first, there is little value in evaluating changes to the elaboration / scheduler.

rcavaliere commented 1 year ago

@clezag I suggest closing this issue. We have already made a relevant improvement; further developments should be evaluated outside this user story. Thanks a lot for your work!