noi-techpark / bdp-elaborations

GNU Affero General Public License v3.0

As an air quality expert I would like to update the automatic processing of the low-cost sensors of A22 #27

Closed rcavaliere closed 1 year ago

rcavaliere commented 1 year ago

The current processing formula for NO2 is:

`NO2 = a0 + a1*NO2raw^2 + a2*NO2raw + a3*O3raw^0.1 + a4*Tint^4`

The new formula is now:

`NO2 = a0 + a1*NO2raw^2 + a2*NO2raw + `**`a3*O3raw`**` + a4*Tint^4`
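As a rough sketch of the change, only the O3 term differs between the two formulas (the ^0.1 exponent becomes linear). Function names and coefficient values below are purely illustrative, not the actual elaboration code:

```python
def no2_old(a, no2_raw, o3_raw, t_int):
    """Old calibration: O3raw enters with exponent 0.1."""
    a0, a1, a2, a3, a4 = a
    return a0 + a1 * no2_raw**2 + a2 * no2_raw + a3 * o3_raw**0.1 + a4 * t_int**4

def no2_new(a, no2_raw, o3_raw, t_int):
    """New calibration: identical except O3raw is now linear."""
    a0, a1, a2, a3, a4 = a
    return a0 + a1 * no2_raw**2 + a2 * no2_raw + a3 * o3_raw + a4 * t_int**4
```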

This is because there are new sensors now, which perform better with this kind of calibration. To-do list:

clezag commented 1 year ago

@rcavaliere I've made the requested changes and fixed a few other things:

Additional changes:

These changes should prevent future issues when other, unrelated environment stations are added.

rcavaliere commented 1 year ago

@clezag wonderful! Are the elaborations already running and applied to the available measurements? To check that the new calculations are done correctly, we need to ensure that the other basic computations required before this non-linear calibration are reactivated as well; see the README at https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-non-linear-calibration. In other words, the raw values (period = 60) are first elaborated to compute the hourly averages (https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-averages), and only then can this elaboration start. It seems to me that the hourly averages computation is still not active; can you check this, please?
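The two-stage ordering described above (raw 60-second samples first aggregated into hourly averages, then fed to the calibration) can be sketched roughly like this; the function and variable names are illustrative, not the actual elaboration code:

```python
def pipeline(raw_60s_samples, calibrate):
    """raw_60s_samples: dict hour -> list of raw values.
    Stage 1: hourly averages; stage 2: non-linear calibration on the averages."""
    hourly = {h: sum(v) / len(v) for h, v in raw_60s_samples.items() if v}
    return {h: calibrate(avg) for h, avg in hourly.items()}
```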

clezag commented 1 year ago

@rcavaliere Still struggling a bit with this issue.

I've seen that the averages elaboration requires a certain number of data points to be present within each 1-hour time frame; currently it's 16 records/h.
If that number of records is not present, the elaboration previously failed. I've now changed it to skip that hour and proceed with averaging the next one, so it doesn't get stuck forever.
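The skip-instead-of-fail behavior described here can be sketched as follows (hypothetical names, not the actual elaboration code; the 16 records/h threshold is the one mentioned above):

```python
MIN_RECORDS_PER_HOUR = 16  # data-quality threshold discussed in this issue

def hourly_averages(records_by_hour):
    """records_by_hour: dict mapping hour -> list of raw values.
    Hours with too few records are skipped instead of aborting the run."""
    averages = {}
    for hour, values in sorted(records_by_hour.items()):
        if len(values) < MIN_RECORDS_PER_HOUR:
            continue  # low-quality hour: skip it and keep going
        averages[hour] = sum(values) / len(values)
    return averages
```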

Now, both in production and test, the data seems to come in quite irregularly, with 5-, 10- or even 20-minute intervals between points.
I'm not sure where the issue is exactly, but the data collector being configured to push data every 5 minutes is probably one piece of the puzzle. Stations that do receive data don't have enough records/h, which is why they don't update.

Just for testing purposes, I've lowered the number of records / hour to 5 in testing, and some stations now update correctly.

Is this records/hour parameter documented and intended? Do we have to look into the data collector / data provider side, since the period is supposed to be 60 s?

rcavaliere commented 1 year ago

@clezag yes, this check was intended to avoid elaborating intervals in which the instrument did not work well, and I would not change it. But the elaboration should be configured so that it proceeds up to the point at which new measurements are available. So it could happen that certain stations have more up-to-date elaborations, while others stop at the time of the last available measurement. We should ensure this behavior is guaranteed.

clezag commented 1 year ago

@rcavaliere Yes, I think I've fixed this now. It should skip periods without data or without sufficient data quality.

Still, none of the new stations have the necessary update frequency; it seems to be a systemic issue. They update every 5 minutes at best.

I think there is a mistake in the configuration: currently it's set up to sync station metadata every minute, but data only every 5 minutes. Maybe this mistake happened during the migration from Jenkins. Should I swap these around?
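If that suspicion is right, the fix is just swapping the two intervals: station metadata rarely changes and can sync slowly, while measurements should arrive every minute. A purely hypothetical illustration (these config keys and values are invented, not the real collector configuration):

```python
# Suspected current state: the two scheduling intervals are swapped.
CURRENT = {
    "station_sync_interval_min": 1,   # station metadata: every minute (too often)
    "data_push_interval_min": 5,      # measurements: every 5 minutes (too slow)
}

# Proposed state: swap the intervals so data flows at the expected rate.
PROPOSED = {
    "station_sync_interval_min": CURRENT["data_push_interval_min"],
    "data_push_interval_min": CURRENT["station_sync_interval_min"],
}
```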

rcavaliere commented 1 year ago

@clezag If I understand well, there are issues in how the Data Collector works, right? If that is the case, try to fix it so that all stations correctly provide real-time data. Please be aware that some sensors (i.e. the ones for which we have updated the calibration coefficients / equation) are currently offline.

clezag commented 1 year ago

@rcavaliere turns out both test and production were using the same credentials, clientID and topic.
This resulted in them snatching away messages from each other and constantly having their sessions closed.

I've configured a different clientId for testing, which means both test and production should now receive each message separately without interference.
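The failure mode above follows from how MQTT brokers handle client IDs: only one active session per client ID is allowed, so each new connection with a duplicate ID kicks the previous holder off. A minimal toy model of that rule (not the real broker, just a simulation of the takeover behavior):

```python
class Broker:
    """Toy broker: tracks which consumer currently holds each client ID."""

    def __init__(self):
        self.sessions = {}  # client_id -> consumer name

    def connect(self, client_id, consumer):
        """Register a session; returns the consumer that got kicked, if any."""
        kicked = self.sessions.get(client_id)
        self.sessions[client_id] = consumer  # new connection takes over the ID
        return kicked
```

With a shared ID, production and test keep evicting each other; with distinct IDs, both sessions coexist and each receives every message independently.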

I've noticed some other things though:

rcavaliere commented 1 year ago

@clezag ah, this could be the trigger of all the issues! Yes, the Data Collector also writes back elaborations... but I think they are handling this by throwing away one of the two identical messages. Can we change the Data Collector so that it reads our APIs and provides the elaborations that way, without computing them?

clezag commented 1 year ago

@rcavaliere Just writing down what we've decided in our meeting:

clezag commented 1 year ago

@rcavaliere I think most of the pipeline is working now. The processing elaboration returns 0 for most up-to-date stations, which happens when the calculated value is below 0. I'm not sure whether this is due to outdated parameters or a wrong formula; I think we have to verify on a case-by-case basis. Do you know a station that should be working correctly, so I can use it for my tests?
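The "returns 0 when the calculated value is below 0" behavior is a simple non-negative clamp applied after the calibration formula; sketched here with hypothetical names:

```python
def clamp_non_negative(calibrated_value):
    """The elaboration reports 0 whenever the calibrated value is negative,
    since a negative concentration is physically meaningless."""
    return max(calibrated_value, 0.0)
```

A constant 0 output is therefore consistent with either bad coefficients or a wrong formula pushing the raw result below zero.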

rcavaliere commented 1 year ago

@clezag thanks for your work in this sprint. Yes, the processing seems not to be stable yet, but this is something the BrennerLEC partners are still working on. Currently we do not have a stable situation: the active sensors do not have calibration coefficients calculated for the processing, while the other sensors, which are not active because they are being reinstalled on the highway, do have their coefficients computed. I would say let's close this issue and open another one if we still need to fix something.

clezag commented 1 year ago

Released in production