noi-techpark / bdp-elaborations

GNU Affero General Public License v3.0

As an air quality expert I would like to update the automatic processing of the low-cost sensors of A22 #27

Closed rcavaliere closed 1 year ago

rcavaliere commented 1 year ago

The current processing formula for NO2 is:

`NO2 = a0 + a1*NO2raw^2 + a2*NO2raw + a3*O3raw^0.1 + a4*Tint^4`

The new formula is now:

`NO2 = a0 + a1*NO2raw^2 + a2*NO2raw + `**`a3*O3raw`**` + a4*Tint^4`
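As a rough sketch of the change, only the O3 term differs between the two formulas (the ^0.1 exponent becomes linear). Function names and coefficient values below are purely illustrative, not the actual elaboration code:

```python
def no2_old(a, no2_raw, o3_raw, t_int):
    """Old calibration: O3raw enters with exponent 0.1."""
    a0, a1, a2, a3, a4 = a
    return a0 + a1 * no2_raw**2 + a2 * no2_raw + a3 * o3_raw**0.1 + a4 * t_int**4

def no2_new(a, no2_raw, o3_raw, t_int):
    """New calibration: identical except O3raw is now linear."""
    a0, a1, a2, a3, a4 = a
    return a0 + a1 * no2_raw**2 + a2 * no2_raw + a3 * o3_raw + a4 * t_int**4
```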

This is because there are new sensors now, which perform better with this kind of calibration. To-do list:

clezag commented 1 year ago

@rcavaliere I've made the requested changes and fixed a few other things:

Additional changes:

These changes should prevent future issues when other, unrelated environment stations are added.

rcavaliere commented 1 year ago

@clezag wonderful! Are the elaborations already running and applied to the available measurements? To check that the new calculations are done correctly, we need to ensure that the other basic computations required before this non-linear calibration are reactivated as well; see the README at https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-non-linear-calibration. In other words, the raw values (period = 60) are first elaborated to compute the hourly averages (https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-averages), and only then can this elaboration start. It seems to me that the hourly averages computation is still not active; can you check this, please?
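The two-stage ordering described above (raw 60-second samples first aggregated into hourly averages, then fed to the calibration) can be sketched roughly like this; the function and variable names are illustrative, not the actual elaboration code:

```python
def pipeline(raw_60s_samples, calibrate):
    """raw_60s_samples: dict hour -> list of raw values.
    Stage 1: hourly averages; stage 2: non-linear calibration on the averages."""
    hourly = {h: sum(v) / len(v) for h, v in raw_60s_samples.items() if v}
    return {h: calibrate(avg) for h, avg in hourly.items()}
```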

clezag commented 1 year ago

@rcavaliere Still struggling a bit with this issue.

I've seen that the averages elaboration requires a certain number of data points to be present within each 1-hour time frame; currently it's 16 records/h.
If that number of records is not present, the elaboration previously failed. I've now changed it to skip that hour and proceed with averaging the next one, so it doesn't get stuck forever.
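The skip-instead-of-fail behavior described here can be sketched as follows (hypothetical names, not the actual elaboration code; the 16 records/h threshold is the one mentioned above):

```python
MIN_RECORDS_PER_HOUR = 16  # data-quality threshold discussed in this issue

def hourly_averages(records_by_hour):
    """records_by_hour: dict mapping hour -> list of raw values.
    Hours with too few records are skipped instead of aborting the run."""
    averages = {}
    for hour, values in sorted(records_by_hour.items()):
        if len(values) < MIN_RECORDS_PER_HOUR:
            continue  # low-quality hour: skip it and keep going
        averages[hour] = sum(values) / len(values)
    return averages
```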

Now, both in production and test, the data seems to come in quite irregularly, with 5-, 10- or even 20-minute intervals between points.
I'm not sure where the issue is exactly, but the data collector being configured to push data every 5 minutes is probably one piece of the puzzle. Stations that do receive data don't have enough records/h, which is why they don't update.

Just for testing purposes, I've lowered the number of records / hour to 5 in testing, and some stations now update correctly.

Is this records/hour parameter documented and intended? Do we have to look into the data collector / data provider side, since the period is supposed to be 60 s?

rcavaliere commented 1 year ago

@clezag yes, this check was intended to avoid elaborating intervals in which the instrument did not work well, and I would not change it. But the elaboration should be configured so that it proceeds up to the point at which new measurements are available. So it could happen that certain stations have more up-to-date elaborations, while others stop at the time of the last available measurement. We should ensure this behavior is guaranteed.

clezag commented 1 year ago

@rcavaliere Yes, I think I've fixed this now. It should skip periods without data or without sufficient data quality.

Still, none of the new stations have the necessary update frequency; it seems to be a systemic issue. They update every 5 minutes at best.

I think there is a mistake in the configuration: currently it's set up to sync station metadata every minute, but data only every 5 minutes. Maybe this mistake happened during the migration from Jenkins. Should I swap these around?
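If that suspicion is right, the fix is just swapping the two intervals: station metadata rarely changes and can sync slowly, while measurements should arrive every minute. A purely hypothetical illustration (these config keys and values are invented, not the real collector configuration):

```python
# Suspected current state: the two scheduling intervals are swapped.
CURRENT = {
    "station_sync_interval_min": 1,   # station metadata: every minute (too often)
    "data_push_interval_min": 5,      # measurements: every 5 minutes (too slow)
}

# Proposed state: swap the intervals so data flows at the expected rate.
PROPOSED = {
    "station_sync_interval_min": CURRENT["data_push_interval_min"],
    "data_push_interval_min": CURRENT["station_sync_interval_min"],
}
```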

rcavaliere commented 1 year ago

@clezag If I understand well, there are issues in how the Data Collector works, right? If that is the case, try to fix it so that all stations correctly provide real-time data. Please be aware that some sensors (i.e. the ones for which we have updated the calibration coefficients / equation) are currently offline.

clezag commented 1 year ago

@rcavaliere turns out both test and production were using the same credentials, clientID and topic.
This resulted in them snatching away messages from each other and constantly having their sessions closed.

I've configured a different clientId for testing, which means both test and production should now receive each message separately without interference.
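The failure mode above follows from how MQTT brokers handle client IDs: only one active session per client ID is allowed, so each new connection with a duplicate ID kicks the previous holder off. A minimal toy model of that rule (not the real broker, just a simulation of the takeover behavior):

```python
class Broker:
    """Toy broker: tracks which consumer currently holds each client ID."""

    def __init__(self):
        self.sessions = {}  # client_id -> consumer name

    def connect(self, client_id, consumer):
        """Register a session; returns the consumer that got kicked, if any."""
        kicked = self.sessions.get(client_id)
        self.sessions[client_id] = consumer  # new connection takes over the ID
        return kicked
```

With a shared ID, production and test keep evicting each other; with distinct IDs, both sessions coexist and each receives every message independently.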

I've noticed some other things though:

rcavaliere commented 1 year ago

@clezag ah, this could be the trigger of all the issues! Yes, the Data Collector also writes back elaborations... but I think they are handling this by throwing away one of the two identical messages. Can we change the Data Collector so that it reads our APIs and provides the elaborations that way, without computing them?

clezag commented 1 year ago

@rcavaliere Just writing down what we've decided in our meeting:

clezag commented 1 year ago

@rcavaliere I think most of the pipeline is working now. The processing elaboration returns 0 for most up-to-date stations, which happens when the calculated value is below 0. I'm not sure whether this is due to outdated parameters or a wrong formula; I think we have to verify on a case-by-case basis. Do you know a station that should be working correctly, so I can use it for my tests?
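The "returns 0 when the calculated value is below 0" behavior is a simple non-negative clamp applied after the calibration formula; sketched here with hypothetical names:

```python
def clamp_non_negative(calibrated_value):
    """The elaboration reports 0 whenever the calibrated value is negative,
    since a negative concentration is physically meaningless."""
    return max(calibrated_value, 0.0)
```

A constant 0 output is therefore consistent with either bad coefficients or a wrong formula pushing the raw result below zero.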

rcavaliere commented 1 year ago

@clezag thanks for your work in this sprint. Yes, the processing seems not to be stable yet, but this is something the BrennerLEC partners are still working on. Currently we do not have a stable situation: the active sensors do not have calibration coefficients calculated for the processing, while the other sensors, which are not active because they are being reinstalled on the highway, do have their coefficients computed. I would say let's close this issue and open another one if we still need to fix something.

clezag commented 1 year ago

Released in production