noi-techpark / bdp-elaborations

Open Data Hub time series elaborations
GNU Affero General Public License v3.0

As an Open Data Hub maintainer, I would like to fix the excessive API calls of the pollution elaboration to lighten the load on our mobility API #25

Closed: clezag closed this issue 1 year ago

clezag commented 1 year ago

There has been a sharp increase (>10x) in measurement history requests against the Ninja API since the pollution elaboration was released in mid-February 2023. This is probably due to how the code behaves for stations that once had data but are no longer being updated. There is a static blacklist.txt file in the repo that could be used to exclude these stations, but it would require a manual update every time a station goes offline.

I suggest we implement a mechanism that detects these cases automatically and reduces the number of calls for them to the minimum necessary.
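A rough way to see why defunct stations are costly: assuming the previous behavior requested the range from the last pollution timestamp up to the current date in 7-day batches (the window size mentioned later in this thread), a station that stopped sending data N days ago costs about N/7 history requests on every run. Because no new pollution values are written for it, its start date never advances, so the cost repeats on each run and grows over time. The Python sketch below only does that arithmetic; the function name is hypothetical and not taken from the repository.

```python
import math
from datetime import datetime, timedelta

# Assumption: 7-day batch window, as described later in this thread.
BATCH_WINDOW = timedelta(days=7)


def history_requests_old(last_pollution: datetime, now: datetime,
                         window: timedelta = BATCH_WINDOW) -> int:
    """Hypothetical count of measurementhistory requests made in one run
    under the previous range [last_pollution, now]."""
    span = now - last_pollution
    if span <= timedelta(0):
        return 0
    # One request per (possibly partial) batch window.
    return math.ceil(span / window)
```

For example, a station that went silent 70 days ago would cost 10 requests per run under these assumptions, and one more for every further week without data.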

clezag commented 1 year ago

An attempt to fix this is now online:

If the elaboration is about to compute a time range for a station that is larger than one batch window (currently 7 days), it first requests the latest available data (measurement instead of measurementhistory). The timestamp of the latest available traffic data is then used as the end date of the traffic data request range.

In other words: if more than one history request would be necessary, the requested date range is now [last_pollution_data_timestamp, last_traffic_data_timestamp], whereas previously it was [last_pollution_data_timestamp, current_date].
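The range selection described above can be sketched in Python as follows. This is a minimal illustration, not the elaboration's actual code: request_range, latest_traffic and BATCH_WINDOW are hypothetical names, and latest_traffic stands in for whatever call fetches the newest timestamp from the measurement (latest value) endpoint. The point it shows is that the extra "latest" call is only made when more than one history request would otherwise be needed.

```python
from datetime import datetime, timedelta
from typing import Callable, Tuple

# Assumption: "currently 7 days", per the description above.
BATCH_WINDOW = timedelta(days=7)


def request_range(last_pollution: datetime, now: datetime,
                  latest_traffic: Callable[[], datetime],
                  window: timedelta = BATCH_WINDOW) -> Tuple[datetime, datetime]:
    """Return the hypothetical (start, end) range of traffic history to request.

    latest_traffic is a stand-in for a call to the latest-value endpoint;
    it is only invoked when the range exceeds one batch window.
    """
    if now - last_pollution <= window:
        # A single history request suffices: keep the old range.
        return last_pollution, now
    # Otherwise cap the range at the newest available traffic timestamp.
    return last_pollution, latest_traffic()
```

Deferring the latest-value lookup behind a callable keeps the common case (a station that is up to date) at one history request with no extra call.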

If a station is defunct and no longer receives any data, the end date is earlier than the start date, so no further data requests or computations are performed.
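To illustrate why a defunct station stops generating traffic, here is a minimal Python sketch, assuming the requested range is split into consecutive 7-day history requests. batch_windows and BATCH_WINDOW are hypothetical names, not taken from the repository. When the end date is not after the start date, the loop never runs and no requests are produced.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

# Assumption: 7-day batch window, as described above.
BATCH_WINDOW = timedelta(days=7)


def batch_windows(start: datetime, end: datetime,
                  window: timedelta = BATCH_WINDOW) -> List[Tuple[datetime, datetime]]:
    """Split [start, end] into consecutive history-request windows.

    Returns an empty list when end <= start, e.g. for a defunct station
    whose last traffic timestamp is not after its last pollution
    timestamp, so no requests would be issued for it.
    """
    batches = []
    cursor = start
    while cursor < end:
        upper = min(cursor + window, end)  # last window may be partial
        batches.append((cursor, upper))
        cursor = upper
    return batches
```

For instance, a range from 1 March to 20 March yields three requests (two full weeks and a 5-day remainder), while an inverted range yields none.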

clezag commented 1 year ago

@rcavaliere This fix is now in production (it can't really be tested in the testing environment because the data isn't available there). I've checked the logs a few times and it seems fine now, but I wanted to let you know that the elaboration has changed, in case you notice anything suspicious.

rcavaliere commented 1 year ago

@clezag thanks for the info. The calculation seems to work fine, as you can see here: https://analytics.opendatahub.com/#%7B%22active_tab%22:0,%22height%22:%22400px%22,%22auto_refresh%22:false,%22scale%22:%7B%22from%22:1680300000000,%22to%22:1684015200000%7D,%22graphs%22:%5B%7B%22category%22:%22Traffic%22,%22station%22:%22A22:5687:3%22,%22station_name%22:%22SEZIONE%20DI%20RILEVAMENTO%20KM.%20103+700%20-%20EGNA%20ORA%20(corsia%20di%20marcia%20sud,%20direzione%20sud)%22,%22data_type%22:%22LIGHT_VEHICLES-CO2-emissions%22,%22unit%22:%22g/km%22,%22period%22:%22600%22,%22yaxis%22:1,%22color%22:3%7D%5D%7D

On the other hand, could it be that there are some stations for which the elaboration is no longer updated? For example, station SEZIONE DI RILEVAMENTO KM. 107,0 - S.FLORIANO (corsia di sorpasso nord, direzione nord).

clezag commented 1 year ago

@rcavaliere good spot! I will look into this

clezag commented 1 year ago

@rcavaliere there was a bug that prevented stations with missing periods of data from loading correctly. I've pushed a fix for this to production, and the elaboration has retroactively calculated the missing time periods.

Thanks for reporting the issue!