noi-techpark / bdp-elaborations

Open Data Hub time series elaborations
GNU Affero General Public License v3.0
0 stars 4 forks source link

As University of Trento I would like that the non linear calibration formula + calibration values of the air quality sensors on the A22 are updated #32

Open rcavaliere opened 7 months ago

rcavaliere commented 7 months ago

Following elaboration should be improved: https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-non-linear-calibration

according to new calibration values + formula, as provided in this attachment

240402_UNITN_Airqino_coefficients_winter_2023-2024.xlsx

We should delete all "historical" elaborations, i.e. consider the installations that have been done recently (see attachment 240326_PosizioneSensori.xlsx. E.g. AQ12 - delete all elaborations after 11.03.2024

clezag commented 7 months ago

@rcavaliere I've just updated the parameter map in testing (it's not running yet for some other reason). From what I understood, it's just like #27, only that now all stations use the new formula for the NO2 calculations. All other calculations (including NO - which uses the old formula) stay as they were.

Regarding deleting the data, I assume we just delete the data type NO2_alphasense and leave the rest, right?

rcavaliere commented 7 months ago

@clezag exactly! Only those with period = 3600

clezag commented 7 months ago

@rcavaliere I've tried in testing, but the elaboration is now failing because the new stations (14 through 23) are missing the parameters for the other calculations (NO, O3 etc.).
Do we have those somewhere? Otherwise I would modify the elaboration so that it simply ignores datatypes it doesn't have parameters for.

Here's the script I've used to delete the old data, just for future reference:


delete from measurementhistory where id in (
select m.id from measurementhistory m
join type t on t.cname = 'NO2-Alphasense_processed' and m.type_id =t.id
join station s on s.id = m.station_id
where s.stationcode in
(
'AUGEG4_AIRQ01',
'AUGEG4_AIRQ02',
'AUGEG4_AIRQ03',
'AUGEG4_AIRQ04',
'AUGEG4_AIRQ05',
'AUGEG4_AIRQ06',
'AUGEG4_AIRQ07',
'AUGEG4_AIRQ08',
'AUGEG4_AIRQ09',
'AUGEG4_AIRQ10',
'AUGEG4_AIRQ11',
'AUGEG4_AIRQ12',
'AUGEG4_AIRQ13',
'AUGEG4_AIRQ14',
'AUGEG4_AIRQ15',
'AUGEG4_AIRQ16',
'AUGEG4_AIRQ17',
'AUGEG4_AIRQ18',
'AUGEG4_AIRQ19',
'AUGEG4_AIRQ20',
'AUGEG4_AIRQ21',
'AUGEG4_AIRQ22',
'AUGEG4_AIRQ23')
and m.period = 3600
and m.timestamp > to_date('202403110000', 'YYYYMMDDHH24MI'));

update measurement m
  set double_value = t.double_value, timestamp = t.timestamp 
from (
select distinct on (m.station_id, m.type_id, m.period)  m.station_id, m.type_id, m.period, m.double_value, m.timestamp
  from measurementhistory m
join type t on t.cname = 'NO2-Alphasense_processed' and m.type_id =t.id
join station s on s.id = m.station_id
where s.stationcode in
(
'AUGEG4_AIRQ01',
'AUGEG4_AIRQ02',
'AUGEG4_AIRQ03',
'AUGEG4_AIRQ04',
'AUGEG4_AIRQ05',
'AUGEG4_AIRQ06',
'AUGEG4_AIRQ07',
'AUGEG4_AIRQ08',
'AUGEG4_AIRQ09',
'AUGEG4_AIRQ10',
'AUGEG4_AIRQ11',
'AUGEG4_AIRQ12',
'AUGEG4_AIRQ13',
'AUGEG4_AIRQ14',
'AUGEG4_AIRQ15',
'AUGEG4_AIRQ16',
'AUGEG4_AIRQ17',
'AUGEG4_AIRQ18',
'AUGEG4_AIRQ19',
'AUGEG4_AIRQ20',
'AUGEG4_AIRQ21',
'AUGEG4_AIRQ22',
'AUGEG4_AIRQ23')
and m.period = 3600
order by m.station_id, m.type_id, m.period, timestamp desc) t
where m.station_id = t.station_id and m.type_id = t.type_id and m.period = t.period;
rcavaliere commented 7 months ago

@clezag thanks for your work! Yes, for these 14 stations the calibration curves for the other pollutants still need to be calculated, so please implement the "ignore" logic.

clezag commented 6 months ago

@rcavaliere done in testing now, seems good to me

rcavaliere commented 6 months ago

@clezag wonderful, I had a look, the values are absolutely reasonable. I made some comparison with reference averages made available by the reference air quality stations and are perfectly in line. Let's go in production!

clezag commented 6 months ago

@rcavaliere up in production

rcavaliere commented 6 months ago

@clezag the elaborations have been tested by the BrennerLEC project partners. Everything OK, but it has been observed that in the processed values there are many '0' caused by the fact the temperatures are already high, and in these conditions other calibration curves should be applied. In the code we should already have this implemented in that way. The request is therefore to add also the "summer calibration curves" as in the attached file. Please note that these are present only for a subset of the sensor, the other will remain as they are. I would suggest to delete the processed values for these specific sensors up to the point in which they were installed (see file PosizioneSensori in the main user story description.

240509_UNITN_Airqino_coefficients_summer_2023.xlsx

clezag commented 6 months ago

@rcavaliere done in testing

rcavaliere commented 6 months ago

Very good, let's go in production!

rcavaliere commented 6 months ago

@clezag sorry I didn't tag you here... do you want to do it today or let's wait for Tuesday next week?

clezag commented 6 months ago

@rcavaliere will do it next week

clezag commented 6 months ago

@rcavaliere moved into production

I've noticed that all the stations I checked where stuck on the 14.05.2024 and didn't get any new data (at the raw data mqtt side). We've had an internal issue around that time that affected most data collectors, and unfortunately this data collector seems to be written in a way where it wasn't able to recover, but also didn't report any errors or crash.

I've now restarted it and it seems to be receiving data again, so it will take a moment for the elaboration pipeline to catch up.

rcavaliere commented 6 months ago

OK, I leave the issue open in case we get some other comments from the BrennerLEC partners, internally we can consider the issue closed for the moment

rcavaliere commented 6 months ago

@clezag the calibration of the sensor AQ12 ("Stazione_KM012-200 (AUGEG4_AIRQ12)" seems to be wrong, the elaboration is nearly always zero. Can you check if you applied it correctly and let me know?

clezag commented 6 months ago

@rcavaliere I've checked the parameters themselves, and did a few test calculation by hand, in my opinion the elaboration does what it's supposed to do.

For the next update, would it be possible to have them submit a pull request directly on the csv? Or if they are not technically inclined, at least mail us the full updated csv. There is a lot of margin for error when copying stuff together from various excel files with different number formats etc.

rcavaliere commented 5 months ago

@clezag did this PR went into production? https://github.com/noi-techpark/bdp-elaborations/actions/runs/9357205755 The sensor is still returning incorrect values, let me know if we still need to fix something here.

Another request: can you reactivate please the function that provides our elaboration back to the MQTT server of A22?

clezag commented 5 months ago

@rcavaliere sorry I did not see this commit. I've merged it into production and re-elaborated the data. Now it looks plausible

rcavaliere commented 5 months ago

Perfect! Now we just miss to reactivate the service towards the MQTT server and we can close this ticket

clezag commented 5 months ago

@rcavaliere I don't think we can simply switch it on, all the changes we made to the formulas and calibration parameters were done on our elaboration, which works on the hourly average data and is not sending back to MQTT.

The MQTT connection however is in the data collector, and we will have to make all the same changes to formulas and parameters there too (because it does the same calculation, but on the real time data) before switching it back on.

I would also look at implementing this at the elaboration side, so in the future we can avoid maintaining two codebases who effectively do the same thing

Please correct me if I'm wrong, this is what I understood just from looking at the code.

rcavaliere commented 5 months ago

@clezag, yeah you are right, it is not so immediate. But can't you activate the connection to the MQTT broker from the elaboration module?

clezag commented 5 months ago

@rcavaliere I'm afraid no. The elaboration has been rewritten from the ground up (in python vs the original Java), so the MQTT part would have to be implemented from scratch. In addition, the elaboration runs once every hour and works on hourly aggregated data only. The data collector worked a 60 second period on raw data. So a good part of the logic would have to be changed.

Just updating the formula+parameters of the data collector and switching the MQTT back on will be easier, it's just that from now on, someone has to remember and update both applications every time something changes

rcavaliere commented 5 months ago

@clezag ok, let's discuss this together. We could think to ask our counterpart to foresee something different at the MQTT broker side, in order to receive hourly elaborations...

rcavaliere commented 5 months ago

@clezag I think that in relation to the topic on how to best provide the elaborations outputs to the MQTT server the best way to proceed would be to implement this logic at the level of the elaboration module. In this way we completely decouple the data collection and data elaboration tasks. I.e., when elaborations are computed they are sent back to the MQTT server. For the other side this approach would be OK, in the sense that we could completely differentiate the mechanisms of data reading / writing with the MQTT server

rcavaliere commented 2 months ago

@clezag can you test the interface with the MQTT server using the new IP addresses you shared me? A22 should have given us grant to connect. Please test and let me know

clezag commented 2 months ago

@rcavaliere It's still not connecting from kubernetes. I've checked from both test and production

clezag commented 2 months ago

@rcavaliere It's fixed now, and I'm receiving data. Thanks for your work!