@rcavaliere I had a look at all data collectors (DC) that produce the wrong PROCESSED measurements with period=600:
select p.data_collector, p.data_collector_version, p.lineage, cname
from measurementhistory m
join station s on m.station_id = s.id
join type t on t.id = m.type_id
join provenance p on p.id = m.provenance_id
where origin = 'a22-algorab'
and cname ~* 'processed'
and period = 600
group by 1, 2, 3, 4;
data_collector |data_collector_version|lineage |cname |
------------------+----------------------+-----------+------------------------+
dc-environment-a22|0.1.0 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22|0.1.0 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22|0.1.0 |a22-algorab|O3_processed |
dc-environment-a22|0.1.0 |a22-algorab|PM10_processed |
dc-environment-a22|0.1.0 |a22-algorab|PM2.5_processed |
dc-environment-a22|0.2.0 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22|0.2.0 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22|0.2.0 |a22-algorab|O3_processed |
dc-environment-a22|0.2.0 |a22-algorab|PM10_processed |
dc-environment-a22|0.2.0 |a22-algorab|PM2.5_processed |
dc-environment-a22|0.3.0 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22|0.3.0 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22|0.3.0 |a22-algorab|O3_processed |
dc-environment-a22|0.3.0 |a22-algorab|PM10_processed |
dc-environment-a22|0.3.0 |a22-algorab|PM2.5_processed |
It seems that only these 3 old DC versions (dc-environment-a22 0.1.0, 0.2.0 and 0.3.0) produced those outputs; all newer data collectors no longer do this. So the problem may already be solved, and all that is left is a cleanup: removing the wrongly inserted data from the history and current number measurement tables. For reference, these are all data collectors for a22-algorab:
data_collector |data_collector_version |lineage |
-------------------------------+----------------------------------------+-----------+
airquality-elaborations |0.1.0 |a22-algorab|
airquality-elaborations |0.2.0 |a22-algorab|
dc- |1.0.0-SNAPSHOT |a22-algorab|
dc-environment-a22 |0.1.0 |a22-algorab|
dc-environment-a22 |0.2.0 |a22-algorab|
dc-environment-a22 |0.3.0 |a22-algorab|
dc-environment-a22 |1.0.0 |a22-algorab|
odh-mobility-dc-environment-a22|3168ade515fae2bbdda98f72d63353a3b976bf50|a22-algorab|
odh-mobility-dc-environment-a22|67222ff18fc9b1defa9da107580b8459e3753ef6|a22-algorab|
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|
odh-mobility-dc-environment-a22|87bd3e088fa0af90d4a15857b6619883a5ce972f|a22-algorab|
odh-mobility-dc-environment-a22|a82656c7217fffba2135cce667448188983a9efc|a22-algorab|
odh-mobility-dc-environment-a22|ac677de872162035523fc501183741ca22f55432|a22-algorab|
odh-mobility-dc-environment-a22|cf203b707e96e0680526000b5f5c40f6a3742bfe|a22-algorab|
Here the odh-mobility-dc-environment-a22 data collectors are the modern ones. Their version is the Git SHA of the commit, for example https://github.com/noi-techpark/bdp-commons/commit/cf203b707e96e0680526000b5f5c40f6a3742bfe
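The modern collectors use a 40-character Git SHA as their version, while the older ones use release numbers such as 0.3.0 or 1.0.0-SNAPSHOT. This means the two kinds can be told apart mechanically, for example when filtering provenance rows. The following is a minimal Python sketch. It assumes that every SHA-versioned collector is built from the bdp-commons repository, as in the example link. The function names are hypothetical.

```python
import re

SEMVER = re.compile(r"\d+\.\d+\.\d+(?:-[0-9A-Za-z.-]+)?")  # e.g. 0.3.0, 1.0.0-SNAPSHOT
GIT_SHA = re.compile(r"[0-9a-f]{40}")                        # full SHA-1 commit hash

# Assumption: all SHA-versioned collectors are built from this repository.
REPO = "https://github.com/noi-techpark/bdp-commons"


def version_kind(version):
    """Classify a data_collector_version value."""
    if GIT_SHA.fullmatch(version):
        return "git-sha"   # modern odh-mobility-* collectors
    if SEMVER.fullmatch(version):
        return "semver"    # numbered releases, e.g. dc-environment-a22 0.3.0
    return "unknown"


def commit_url(version):
    """Return the commit URL for SHA versions, None for everything else."""
    if version_kind(version) == "git-sha":
        return f"{REPO}/commit/{version}"
    return None


print(commit_url("cf203b707e96e0680526000b5f5c40f6a3742bfe"))
```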
I think it is correct to search for DC with origin = a22-algorab to get all the elaboration and data collector services of the schema above, right?
Should we just clean up the database and get rid of these old records?
@Piiit yes, I agree. We should just keep the correct elaborations and clean up what was not computed correctly.
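The following is a minimal sketch of that cleanup. It assumes a heavily simplified schema (only provenance, type and the two measurement tables) with invented demo rows. It deletes processed values with period = 600 that were written by the legacy dc-environment-a22 versions 0.1.0 to 0.3.0. The deletion covers both the history table and the current table in a single transaction. It runs against an in-memory SQLite database, so the predicate would need adapting to the real PostgreSQL schema.

```python
import sqlite3

# Hypothetical, heavily simplified stand-in for the real schema: only the
# columns needed to identify the wrongly inserted rows are modelled.
SCHEMA = """
create table provenance (id integer primary key, data_collector text,
                         data_collector_version text);
create table type (id integer primary key, cname text);
create table measurementhistory (id integer primary key, type_id integer,
                                 provenance_id integer, period integer,
                                 double_value real);
create table measurement (id integer primary key, type_id integer,
                          provenance_id integer, period integer,
                          double_value real);
"""

# Legacy DC versions that wrote processed values with period = 600.
LEGACY_VERSIONS = ("0.1.0", "0.2.0", "0.3.0")

# Same predicate for both tables: processed type, period 600, legacy provenance.
WRONG_ROWS = """
period = 600
and type_id in (select id from type where cname like '%processed%')
and provenance_id in (
    select id from provenance
    where data_collector = 'dc-environment-a22'
      and data_collector_version in (?, ?, ?))
"""


def cleanup(conn):
    """Delete the wrong rows from the history and current tables atomically."""
    deleted = {}
    with conn:  # commits on success, rolls back if any statement fails
        for table in ("measurementhistory", "measurement"):
            cur = conn.execute(f"delete from {table} where {WRONG_ROWS}",
                               LEGACY_VERSIONS)
            deleted[table] = cur.rowcount
    return deleted


# Tiny demo dataset (invented for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("insert into provenance values (?, ?, ?)", [
    (1, "dc-environment-a22", "0.3.0"),
    (2, "odh-mobility-dc-environment-a22",
     "6c4d290a17ccf1e9ca1f970d073c162b7473a2ad"),
])
conn.executemany("insert into type values (?, ?)",
                 [(1, "O3_processed"), (2, "O3_raw")])
rows = [
    (1, 1, 600, 1.0),   # processed, period 600, legacy DC -> wrong
    (1, 1, 3600, 1.0),  # processed, period 3600           -> keep
    (2, 2, 600, 1.0),   # raw, period 600, modern DC       -> keep
]
for table in ("measurementhistory", "measurement"):
    conn.executemany(
        f"insert into {table} (type_id, provenance_id, period, double_value) "
        "values (?, ?, ?, ?)", rows)

result = cleanup(conn)
print(result)  # {'measurementhistory': 1, 'measurement': 1}
```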
I will have a look at that on Monday. The whole data collector list, including versions, lineage and data types, is this:
PERIOD=600
data_collector |data_collector_version |lineage |cname |
-------------------------------+----------------------------------------+-----------+------------------------+
airquality-elaborations |0.2.0 |a22-algorab|NO2-Alphasense_processed|
airquality-elaborations |0.2.0 |a22-algorab|NO-Alphasense_processed |
airquality-elaborations |0.2.0 |a22-algorab|O3_processed |
airquality-elaborations |0.2.0 |a22-algorab|PM10_processed |
airquality-elaborations |0.2.0 |a22-algorab|PM2.5_processed |
dataprocessing-a22-environment |0.1.0 |NOI |NO2-Alphasense_processed|
dataprocessing-a22-environment |0.1.0 |NOI |NO-Alphasense_processed |
dataprocessing-a22-environment |0.1.0 |NOI |O3_processed |
dataprocessing-a22-environment |0.1.0 |NOI |PM10_processed |
dataprocessing-a22-environment |0.1.0 |NOI |PM2.5_processed |
dc- |1.0.0-SNAPSHOT |a22-algorab|NO2_raw |
dc-environment-a22 |0.3.0 |a22-algorab|CO2_raw |
dc-environment-a22 |0.3.0 |a22-algorab|CO_raw |
dc-environment-a22 |0.3.0 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22 |0.3.0 |a22-algorab|NO2-Alphasense_raw |
dc-environment-a22 |0.3.0 |a22-algorab|NO2-Orion_raw |
dc-environment-a22 |0.3.0 |a22-algorab|NO2_raw |
dc-environment-a22 |0.3.0 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22 |0.3.0 |a22-algorab|NO-Alphasense_raw |
dc-environment-a22 |0.3.0 |a22-algorab|O3_processed |
dc-environment-a22 |0.3.0 |a22-algorab|O3_raw |
dc-environment-a22 |0.3.0 |a22-algorab|PM10_processed |
dc-environment-a22 |0.3.0 |a22-algorab|PM10_raw |
dc-environment-a22 |0.3.0 |a22-algorab|PM2.5_processed |
dc-environment-a22 |0.3.0 |a22-algorab|PM2.5_raw |
dc-environment-a22 |0.3.0 |a22-algorab|RH_raw |
dc-environment-a22 |0.3.0 |a22-algorab|temperature-external_raw|
dc-environment-a22 |0.3.0 |a22-algorab|temperature-internal_raw|
dc-environment-a22 |0.3.0 |a22-algorab|VOC_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|CO2_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|CO_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO2-Alphasense_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO2-Orion_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO2_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO-Alphasense_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|O3_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|PM10_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|PM2.5_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|RH_raw |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|temperature-external_raw|
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|temperature-internal_raw|
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|VOC_raw |
...and:
PERIOD=3600
data_collector |data_collector_version|lineage |cname |
------------------------------+----------------------+-----------+------------------------+
airquality-elaborations |0.2.0 |a22-algorab|CO2_raw |
airquality-elaborations |0.2.0 |a22-algorab|CO_raw |
airquality-elaborations |0.2.0 |a22-algorab|NO2-Alphasense_processed|
airquality-elaborations |0.2.0 |a22-algorab|NO2-Alphasense_raw |
airquality-elaborations |0.2.0 |a22-algorab|NO2-Orion_raw |
airquality-elaborations |0.2.0 |a22-algorab|NO2_raw |
airquality-elaborations |0.2.0 |a22-algorab|NO-Alphasense_processed |
airquality-elaborations |0.2.0 |a22-algorab|NO-Alphasense_raw |
airquality-elaborations |0.2.0 |a22-algorab|O3_processed |
airquality-elaborations |0.2.0 |a22-algorab|O3_raw |
airquality-elaborations |0.2.0 |a22-algorab|PM10_processed |
airquality-elaborations |0.2.0 |a22-algorab|PM10_raw |
airquality-elaborations |0.2.0 |a22-algorab|PM2.5_processed |
airquality-elaborations |0.2.0 |a22-algorab|PM2.5_raw |
airquality-elaborations |0.2.0 |a22-algorab|RH_raw |
airquality-elaborations |0.2.0 |a22-algorab|temperature-external_raw|
airquality-elaborations |0.2.0 |a22-algorab|temperature-internal_raw|
airquality-elaborations |0.2.0 |a22-algorab|VOC_raw |
dataprocessing-a22-environment|0.1.0 |NOI |NO2-Alphasense_processed|
dataprocessing-a22-environment|0.1.0 |NOI |NO-Alphasense_processed |
dataprocessing-a22-environment|0.1.0 |NOI |O3_processed |
dataprocessing-a22-environment|0.1.0 |NOI |PM10_processed |
dataprocessing-a22-environment|0.1.0 |NOI |PM2.5_processed |
dc-environment-a22 |0.3.0 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22 |0.3.0 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22 |0.3.0 |a22-algorab|O3_processed |
dc-environment-a22 |0.3.0 |a22-algorab|PM10_processed |
dc-environment-a22 |0.3.0 |a22-algorab|PM2.5_processed |
Computed by (with period = 3600 for the second list):
select p.data_collector, p.data_collector_version, p.lineage, cname
from measurement m
join station s on m.station_id = s.id
join type t on t.id = m.type_id
join provenance p on p.id = m.provenance_id
where origin = 'a22-algorab'
and cname ~* 'processed'
or cname ~* 'raw'
and period = 600
group by 1, 2, 3, 4
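Note that `and` binds more tightly than `or` in SQL, so the `where` clause above is evaluated as `(origin = 'a22-algorab' and cname ~* 'processed') or (cname ~* 'raw' and period = 600)`. This is consistent with the identical processed rows appearing in both the PERIOD=600 and the PERIOD=3600 lists. The following sketch reproduces the effect on a tiny, hypothetical in-memory SQLite table. It uses `like` in place of PostgreSQL's `~*` and shows the parenthesised variant that applies the period filter to both type families.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical flattened table: one row per (origin, cname, period) combination.
conn.execute("create table m (origin text, cname text, period integer)")
conn.executemany("insert into m values (?, ?, ?)", [
    ("a22-algorab", "O3_processed", 3600),
    ("a22-algorab", "O3_raw", 600),
    ("a22-algorab", "O3_raw", 3600),
])

# As written above: parsed as (origin AND processed) OR (raw AND period = 600).
AS_WRITTEN = """
select cname, period from m
where origin = 'a22-algorab'
  and cname like '%processed%'
   or cname like '%raw%'
  and period = 600
order by cname, period
"""

# Parenthesised: the origin and period filters apply to both type families.
PARENTHESISED = """
select cname, period from m
where origin = 'a22-algorab'
  and (cname like '%processed%' or cname like '%raw%')
  and period = 600
order by cname, period
"""

as_written = conn.execute(AS_WRITTEN).fetchall()
parenthesised = conn.execute(PARENTHESISED).fetchall()
print(as_written)     # [('O3_processed', 3600), ('O3_raw', 600)]
print(parenthesised)  # [('O3_raw', 600)]
```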
@Piiit thanks for the check. But could it be that the correct elaboration tools are the ones with lineage = NOI?
@rcavaliere I have no clue; I was hoping you could tell me... We need to check what is what and write some documentation about the whole data flow. Where should we put it?
Yes, please do. Describe in the README of each component exactly what it does in terms of functionality.
@Piiit I have checked a bit and understood what confused me: it is related to the old elaboration chain, which worked as follows:
raw measurements (about one measurement every minute) -> average over a 10-minute window -> non-linear calibration function (on the 10-minute averages) -> average over a 60-minute window
I confirm that the correct elaboration workflow to be implemented is the following:
raw measurements (about one measurement every minute) -> average over a 60-minute window -> non-linear calibration function (on the 60-minute averages)
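The calibration function is non-linear, so the order of averaging and calibrating matters. Calibrating six 10-minute averages and then averaging them generally gives a different hourly value from calibrating the hourly average. The following is a small Python illustration. It uses a made-up quadratic calibration, because the real calibration function is not shown in this thread, and one hour of one-minute readings.

```python
from statistics import mean


def calibrate(x):
    # Hypothetical non-linear calibration; the real function is not shown here.
    return 0.5 * x ** 2


def old_chain(raw):
    """Old chain: 10-minute averages -> calibration -> 60-minute average."""
    ten_minute = [mean(raw[i:i + 10]) for i in range(0, len(raw), 10)]
    return mean(calibrate(v) for v in ten_minute)


def new_chain(raw):
    """New chain: 60-minute average -> calibration of that single value."""
    return calibrate(mean(raw))


# One hour of one-minute raw values: 30 minutes at 10, 30 minutes at 20.
raw = [10] * 30 + [20] * 30
print(old_chain(raw))  # 125.0
print(new_chain(raw))  # 112.5
```

With a linear calibration, both chains would return the same value; any curvature makes them diverge.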
So we should ensure that the raw measurements provided by the data collector have a period other than 600; I would set period = 60 in this case. This also explains the check on the number of records when calculating the 60-minute averages: in one hour we expect about 60 records, and we defined 16 as the minimum number of records required to calculate the average. So I would keep this check.
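The following is a minimal sketch of the hourly average with that minimum-records check. It assumes that raw samples arrive as (timestamp, value) pairs and are grouped by clock hour. The function name and the demo timestamps are hypothetical, and only the threshold of 16 comes from the comment above.

```python
from datetime import datetime
from statistics import mean

MIN_RECORDS = 16  # minimum number of raw records per hour, as stated above


def hourly_averages(samples, min_records=MIN_RECORDS):
    """Average (timestamp, value) pairs per clock hour.

    Hours with fewer than `min_records` samples are skipped rather than
    producing an average from too little data.
    """
    buckets = {}
    for ts, value in samples:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets.setdefault(hour, []).append(value)
    return {
        hour: mean(values)
        for hour, values in sorted(buckets.items())
        if len(values) >= min_records
    }


# Illustrative input: a complete hour (60 records) and a sparse one (10 records).
samples = [(datetime(2021, 1, 1, 10, m), 20.0) for m in range(60)]
samples += [(datetime(2021, 1, 1, 11, m), 30.0) for m in range(10)]
print(hourly_averages(samples))
```

Only the 10:00 hour is averaged here; the 11:00 hour has 10 records and is dropped.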
@rcavaliere I have now changed the period to 60. Should we also change the old period=600 records to period=60 in the production DB?
@Piiit yes, that would be better. But of course only for the relevant data types (those with RAW in the description).
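A minimal sketch of that re-tagging follows. It assumes a simplified schema in which the type table has a description column, and it uses invented descriptions and demo rows. Only rows whose type description contains RAW are moved from period 600 to 60, in both the history and the current table, within one transaction. Filtering on the cname suffix `_raw` instead would be an equally plausible choice.

```python
import sqlite3

# Hypothetical simplified schema and demo rows; the real tables have more
# columns, and the type descriptions below are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
create table type (id integer primary key, cname text, description text);
create table measurementhistory (id integer primary key, type_id integer,
                                 period integer);
create table measurement (id integer primary key, type_id integer,
                          period integer);
""")
conn.executemany("insert into type values (?, ?, ?)", [
    (1, "NO2-Alphasense_raw", "NO2 Alphasense RAW"),
    (2, "NO2-Alphasense_processed", "NO2 Alphasense processed"),
])
for table in ("measurementhistory", "measurement"):
    conn.executemany(f"insert into {table} (type_id, period) values (?, ?)",
                     [(1, 600), (2, 600)])

# Re-tag only RAW types from period 600 to 60, in both tables, atomically.
with conn:
    for table in ("measurementhistory", "measurement"):
        conn.execute(f"""
            update {table}
            set period = 60
            where period = 600
              and type_id in (select id from type
                              where upper(description) like '%RAW%')
        """)

for table in ("measurementhistory", "measurement"):
    print(table, conn.execute(
        f"select type_id, period from {table} order by type_id").fetchall())
```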
@rcavaliere Done, everything is in production... please double-check that everything is OK like this. Maybe we should monitor the elaborations over the next few days to see whether everything works as expected...
The documentation is not perfect. We could eventually open separate issues for it, but I think you have a better understanding of these parts of the manuals:
1) https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-averages
2) https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-non-linear-calibration
3) https://github.com/noi-techpark/bdp-commons/tree/main/data-collectors/environment-a22
We also have AWS Lambda GitHub Actions now, but there is still a lot to do to integrate them properly into our infrastructure:
@Piiit great work! Now everything seems to work as expected. Currently only a few sensors are working properly, but they will be upgraded soon (i.e. this year).
We need to check the elaboration chain here. The raw data (e.g. NO2-ALPHASENSE_RAW [600]) should be processed in this way:
In other words, we shouldn't have processed values with period = 600 (e.g. NO2-ALPHASENSE_PROCESSED [600])
We should check this together: the pipeline was corrected to work like this, whereas before it was implemented to process the raw data with period = 600 and then compute the average over period = 3600.
UPDATE: We now use period = 60 for the raw data types, but only for the environment-a22 DC...
This elaboration pipeline should be this one: https://github.com/noi-techpark/bdp-elaborations/tree/master/Environment-A22-Processing
TODOS
- odh-mobility-dc-environment-a22 = dc-environment-a22: Connected to the MQTT broker of A22, pushes data from that
- airquality-elaborations
- dataprocessing-a22-environment
- processed period=600
- elaboration inside measurements: processed period=600 origin=a22-algorab
- number measurements from history and recent tables