noi-techpark / bdp-commons

GNU Affero General Public License v3.0

A22 Environment Data should be elaborated differently #508

Closed: rcavaliere closed this issue 2 years ago

rcavaliere commented 2 years ago

We need to check the elaboration chain here. The raw data (e.g. NO2-ALPHASENSE_RAW [600]) should be processed in this way:

In other words, we shouldn't have processed values with period = 600 (e.g. NO2-ALPHASENSE_PROCESSED [600])

We should check this together: the pipeline was later corrected to work like this, whereas it was originally implemented to process the raw data with period = 600 and then average the result over period = 3600.

UPDATE: We now use period = 60 for the raw data types, but only for the environment-a22 DC.

This elaboration pipeline should be this one: https://github.com/noi-techpark/bdp-elaborations/tree/master/Environment-A22-Processing

TODOS

Piiit commented 2 years ago


Piiit commented 2 years ago

@rcavaliere I had a look at all the DCs that produce the wrong PROCESSED measurements with period = 600:

select p.data_collector, p.data_collector_version, p.lineage, cname 
from measurementhistory m  
join station s on m.station_id = s.id
join type t on t.id = m.type_id 
join provenance p on p.id = m.provenance_id 
where origin = 'a22-algorab' 
and cname ~* 'processed'
and period = 600
group by 1, 2, 3, 4;

Result (processed types, period = 600):

data_collector    |data_collector_version|lineage    |cname                   |
------------------+----------------------+-----------+------------------------+
dc-environment-a22|0.1.0                 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22|0.1.0                 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22|0.1.0                 |a22-algorab|O3_processed            |
dc-environment-a22|0.1.0                 |a22-algorab|PM10_processed          |
dc-environment-a22|0.1.0                 |a22-algorab|PM2.5_processed         |

dc-environment-a22|0.2.0                 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22|0.2.0                 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22|0.2.0                 |a22-algorab|O3_processed            |
dc-environment-a22|0.2.0                 |a22-algorab|PM10_processed          |
dc-environment-a22|0.2.0                 |a22-algorab|PM2.5_processed         |

dc-environment-a22|0.3.0                 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22|0.3.0                 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22|0.3.0                 |a22-algorab|O3_processed            |
dc-environment-a22|0.3.0                 |a22-algorab|PM10_processed          |
dc-environment-a22|0.3.0                 |a22-algorab|PM2.5_processed         |

It seems that only these three old DC versions produced those outputs; all newer data collectors no longer do this. So the problem may already be solved, and what remains is cleanup: removing the wrongly inserted data from the measurement history table and the current (numeric) measurement table.
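A minimal sketch of that cleanup, run against a toy in-memory SQLite copy of the tables. The schema is a simplified assumption modeled on the query above (only the columns it uses), not the production bdp-core schema, and SQLite's `LIKE` stands in for PostgreSQL's `~*`. It deletes only `_processed` rows with period = 600 written by the three old dc-environment-a22 versions.

```python
import sqlite3

# Hypothetical, simplified schema modeled on the query above (NOT the real bdp-core schema).
SCHEMA = """
CREATE TABLE type (id INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE provenance (id INTEGER PRIMARY KEY, data_collector TEXT,
                         data_collector_version TEXT, lineage TEXT);
CREATE TABLE measurementhistory (id INTEGER PRIMARY KEY, type_id INTEGER,
                                 provenance_id INTEGER, period INTEGER);
"""

# Delete PROCESSED values with period = 600 written by the old DC versions only.
# SQLite's LIKE is case-insensitive for ASCII, standing in for PostgreSQL's ~*.
CLEANUP = """
DELETE FROM measurementhistory
WHERE period = 600
  AND type_id IN (SELECT id FROM type WHERE cname LIKE '%processed%')
  AND provenance_id IN (
      SELECT id FROM provenance
      WHERE data_collector = 'dc-environment-a22'
        AND data_collector_version IN ('0.1.0', '0.2.0', '0.3.0'))
"""


def cleanup(conn: sqlite3.Connection) -> int:
    """Run the cleanup statement and return the number of deleted rows."""
    cur = conn.execute(CLEANUP)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO type VALUES (?, ?)",
                     [(1, "NO2-Alphasense_processed"), (2, "NO2-Alphasense_raw")])
    conn.executemany("INSERT INTO provenance VALUES (?, ?, ?, ?)",
                     [(1, "dc-environment-a22", "0.3.0", "a22-algorab"),
                      (2, "dataprocessing-a22-environment", "0.1.0", "NOI")])
    conn.executemany("INSERT INTO measurementhistory VALUES (?, ?, ?, ?)",
                     [(1, 1, 1, 600),    # old DC, processed, period 600: deleted
                      (2, 2, 1, 600),    # old DC, raw: kept
                      (3, 1, 2, 3600)])  # hourly elaboration: kept
    print(cleanup(conn))  # 1
```

Filtering on provenance (DC name and version) rather than on type alone keeps any correctly computed processed values from newer services untouched.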

all data collectors with origin = a22-algorab

data_collector                 |data_collector_version                  |lineage    |
-------------------------------+----------------------------------------+-----------+
airquality-elaborations        |0.1.0                                   |a22-algorab|
airquality-elaborations        |0.2.0                                   |a22-algorab|
dc-                            |1.0.0-SNAPSHOT                          |a22-algorab|
dc-environment-a22             |0.1.0                                   |a22-algorab|
dc-environment-a22             |0.2.0                                   |a22-algorab|
dc-environment-a22             |0.3.0                                   |a22-algorab|
dc-environment-a22             |1.0.0                                   |a22-algorab|
odh-mobility-dc-environment-a22|3168ade515fae2bbdda98f72d63353a3b976bf50|a22-algorab|
odh-mobility-dc-environment-a22|67222ff18fc9b1defa9da107580b8459e3753ef6|a22-algorab|
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|
odh-mobility-dc-environment-a22|87bd3e088fa0af90d4a15857b6619883a5ce972f|a22-algorab|
odh-mobility-dc-environment-a22|a82656c7217fffba2135cce667448188983a9efc|a22-algorab|
odh-mobility-dc-environment-a22|ac677de872162035523fc501183741ca22f55432|a22-algorab|
odh-mobility-dc-environment-a22|cf203b707e96e0680526000b5f5c40f6a3742bfe|a22-algorab|

Here, the odh-mobility-dc-environment-a22 data collectors are the modern ones. Their version is the Git SHA of the corresponding commit, for example https://github.com/noi-techpark/bdp-commons/commit/cf203b707e96e0680526000b5f5c40f6a3742bfe

I think searching for DCs with origin = a22-algorab is the correct way to find all the elaboration and data collector services of the schema above, right?


Should we just clean up the database and get rid of these old records?

rcavaliere commented 2 years ago

@Piiit yes, I agree. We should just keep the correct elaborations and clean up what was not computed correctly.

Piiit commented 2 years ago

I will have a look at that on Monday. The complete list of data collectors, including versions, lineage and data types, is:

PERIOD=600

data_collector                 |data_collector_version                  |lineage    |cname                   |
-------------------------------+----------------------------------------+-----------+------------------------+
airquality-elaborations        |0.2.0                                   |a22-algorab|NO2-Alphasense_processed|
airquality-elaborations        |0.2.0                                   |a22-algorab|NO-Alphasense_processed |
airquality-elaborations        |0.2.0                                   |a22-algorab|O3_processed            |
airquality-elaborations        |0.2.0                                   |a22-algorab|PM10_processed          |
airquality-elaborations        |0.2.0                                   |a22-algorab|PM2.5_processed         |
dataprocessing-a22-environment |0.1.0                                   |NOI        |NO2-Alphasense_processed|
dataprocessing-a22-environment |0.1.0                                   |NOI        |NO-Alphasense_processed |
dataprocessing-a22-environment |0.1.0                                   |NOI        |O3_processed            |
dataprocessing-a22-environment |0.1.0                                   |NOI        |PM10_processed          |
dataprocessing-a22-environment |0.1.0                                   |NOI        |PM2.5_processed         |
dc-                            |1.0.0-SNAPSHOT                          |a22-algorab|NO2_raw                 |
dc-environment-a22             |0.3.0                                   |a22-algorab|CO2_raw                 |
dc-environment-a22             |0.3.0                                   |a22-algorab|CO_raw                  |
dc-environment-a22             |0.3.0                                   |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22             |0.3.0                                   |a22-algorab|NO2-Alphasense_raw      |
dc-environment-a22             |0.3.0                                   |a22-algorab|NO2-Orion_raw           |
dc-environment-a22             |0.3.0                                   |a22-algorab|NO2_raw                 |
dc-environment-a22             |0.3.0                                   |a22-algorab|NO-Alphasense_processed |
dc-environment-a22             |0.3.0                                   |a22-algorab|NO-Alphasense_raw       |
dc-environment-a22             |0.3.0                                   |a22-algorab|O3_processed            |
dc-environment-a22             |0.3.0                                   |a22-algorab|O3_raw                  |
dc-environment-a22             |0.3.0                                   |a22-algorab|PM10_processed          |
dc-environment-a22             |0.3.0                                   |a22-algorab|PM10_raw                |
dc-environment-a22             |0.3.0                                   |a22-algorab|PM2.5_processed         |
dc-environment-a22             |0.3.0                                   |a22-algorab|PM2.5_raw               |
dc-environment-a22             |0.3.0                                   |a22-algorab|RH_raw                  |
dc-environment-a22             |0.3.0                                   |a22-algorab|temperature-external_raw|
dc-environment-a22             |0.3.0                                   |a22-algorab|temperature-internal_raw|
dc-environment-a22             |0.3.0                                   |a22-algorab|VOC_raw                 |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|CO2_raw                 |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|CO_raw                  |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO2-Alphasense_raw      |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO2-Orion_raw           |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO2_raw                 |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|NO-Alphasense_raw       |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|O3_raw                  |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|PM10_raw                |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|PM2.5_raw               |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|RH_raw                  |
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|temperature-external_raw|
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|temperature-internal_raw|
odh-mobility-dc-environment-a22|6c4d290a17ccf1e9ca1f970d073c162b7473a2ad|a22-algorab|VOC_raw                 |

...and:

PERIOD=3600

data_collector                |data_collector_version|lineage    |cname                   |
------------------------------+----------------------+-----------+------------------------+
airquality-elaborations       |0.2.0                 |a22-algorab|CO2_raw                 |
airquality-elaborations       |0.2.0                 |a22-algorab|CO_raw                  |
airquality-elaborations       |0.2.0                 |a22-algorab|NO2-Alphasense_processed|
airquality-elaborations       |0.2.0                 |a22-algorab|NO2-Alphasense_raw      |
airquality-elaborations       |0.2.0                 |a22-algorab|NO2-Orion_raw           |
airquality-elaborations       |0.2.0                 |a22-algorab|NO2_raw                 |
airquality-elaborations       |0.2.0                 |a22-algorab|NO-Alphasense_processed |
airquality-elaborations       |0.2.0                 |a22-algorab|NO-Alphasense_raw       |
airquality-elaborations       |0.2.0                 |a22-algorab|O3_processed            |
airquality-elaborations       |0.2.0                 |a22-algorab|O3_raw                  |
airquality-elaborations       |0.2.0                 |a22-algorab|PM10_processed          |
airquality-elaborations       |0.2.0                 |a22-algorab|PM10_raw                |
airquality-elaborations       |0.2.0                 |a22-algorab|PM2.5_processed         |
airquality-elaborations       |0.2.0                 |a22-algorab|PM2.5_raw               |
airquality-elaborations       |0.2.0                 |a22-algorab|RH_raw                  |
airquality-elaborations       |0.2.0                 |a22-algorab|temperature-external_raw|
airquality-elaborations       |0.2.0                 |a22-algorab|temperature-internal_raw|
airquality-elaborations       |0.2.0                 |a22-algorab|VOC_raw                 |
dataprocessing-a22-environment|0.1.0                 |NOI        |NO2-Alphasense_processed|
dataprocessing-a22-environment|0.1.0                 |NOI        |NO-Alphasense_processed |
dataprocessing-a22-environment|0.1.0                 |NOI        |O3_processed            |
dataprocessing-a22-environment|0.1.0                 |NOI        |PM10_processed          |
dataprocessing-a22-environment|0.1.0                 |NOI        |PM2.5_processed         |
dc-environment-a22            |0.3.0                 |a22-algorab|NO2-Alphasense_processed|
dc-environment-a22            |0.3.0                 |a22-algorab|NO-Alphasense_processed |
dc-environment-a22            |0.3.0                 |a22-algorab|O3_processed            |
dc-environment-a22            |0.3.0                 |a22-algorab|PM10_processed          |
dc-environment-a22            |0.3.0                 |a22-algorab|PM2.5_processed         |

Computed by:

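select p.data_collector, p.data_collector_version, p.lineage, cname
from measurement m
join station s on m.station_id = s.id
join type t on t.id = m.type_id
join provenance p on p.id = m.provenance_id
-- Note: AND binds tighter than OR, so this WHERE clause is evaluated as
-- (origin = 'a22-algorab' and cname ~* 'processed') or (cname ~* 'raw' and period = 600)
where origin = 'a22-algorab'
and cname ~* 'processed'
or cname ~* 'raw'
and period = 600
group by 1, 2, 3, 4;
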
rcavaliere commented 2 years ago

@Piiit thanks for the check. But could it be that the correct elaboration tools are the ones with lineage = NOI?

Piiit commented 2 years ago

@rcavaliere I have no idea; I was hoping you could tell me. We need to check what is what and write some documentation about the whole data flow. Where should we put it?

rcavaliere commented 2 years ago

Yes, please do. Describe in the README of each component exactly what it does in terms of functionality.

rcavaliere commented 2 years ago

@Piiit I have checked a bit and understood what confused me: it is related to the old elaboration chain, which worked as follows:

raw measurements (about one measurement every minute) -> average over 10-minute windows -> non-linear calibration function (on the 10-minute averages) -> average over a 60-minute window

I confirm that the correct elaboration workflow to be implemented is the following:

raw measurements (about one measurement every minute) -> average over a 60-minute window -> non-linear calibration function (on the 60-minute averages)
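Why the order matters: a non-linear calibration function does not commute with averaging, so calibrating 10-minute averages and then averaging them (old chain) generally gives a different hourly value than averaging first and calibrating once (new chain). The sketch below assumes exactly one reading per minute; the quadratic `calibrate` function and the sample values are invented placeholders, not the real A22 calibration.

```python
from statistics import mean
from typing import Callable, Sequence


def calibrate(x: float) -> float:
    """Hypothetical non-linear calibration (a placeholder, NOT the real A22 function)."""
    return 0.5 * x * x + 2.0


def old_chain(raw: Sequence[float], f: Callable[[float], float]) -> float:
    """Old chain: 10-minute averages -> calibrate each average -> hourly mean."""
    # Assumes exactly one raw value per minute, i.e. 10 values per 10-minute window.
    windows = [raw[i:i + 10] for i in range(0, len(raw), 10)]
    return mean(f(mean(w)) for w in windows)


def new_chain(raw: Sequence[float], f: Callable[[float], float]) -> float:
    """New chain: hourly (60-minute) average -> calibrate once."""
    return f(mean(raw))


if __name__ == "__main__":
    # 60 invented per-minute readings: 30 low values followed by 30 high values.
    raw = [1.0] * 30 + [3.0] * 30
    print(old_chain(raw, calibrate))  # 4.5
    print(new_chain(raw, calibrate))  # 4.0
```

With a linear calibration and equal-sized windows both chains return the same value; the discrepancy appears only because the calibration is non-linear.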

So we should ensure that the raw measurements provided by the data collector have a period other than 600; I would use period = 60 in this case. This also explains the check on the number of records when computing the 60-minute averages: in one hour we expect about 60 records, and we defined 16 as the minimum number of records required to compute the average. So I would keep this check.
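A minimal sketch of that minimum-records check, assuming raw records arrive as (timestamp, value) pairs and that windows are aligned to full clock hours (both assumptions; the actual averaging elaboration may window differently). Hours with fewer than 16 records produce no average at all.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import mean
from typing import Dict, Iterable, List, Tuple

MIN_RECORDS = 16  # minimum number of raw records per hour, as agreed in the thread


def hourly_averages(records: Iterable[Tuple[datetime, float]],
                    min_records: int = MIN_RECORDS) -> Dict[datetime, float]:
    """Average raw (period = 60) records over clock-hour windows.

    Hours with fewer than `min_records` values are skipped, not averaged.
    """
    buckets: Dict[datetime, List[float]] = defaultdict(list)
    for ts, value in records:
        # Truncate each timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(value)
    return {hour: mean(values)
            for hour, values in sorted(buckets.items())
            if len(values) >= min_records}


if __name__ == "__main__":
    start = datetime(2023, 1, 1, 10, 0)
    # 60 records in the 10:00 hour, only 10 in the 11:00 hour.
    records = [(start + timedelta(minutes=i), 2.0) for i in range(60)]
    records += [(start + timedelta(hours=1, minutes=i), 5.0) for i in range(10)]
    print(hourly_averages(records))  # only the 10:00 hour is averaged
```

Skipping sparse hours, rather than averaging whatever arrived, avoids publishing an hourly value that rests on a handful of readings.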

Piiit commented 2 years ago

@rcavaliere I have now changed the period to 60. Should we also change the old period = 600 records to period = 60 in the production DB?

rcavaliere commented 2 years ago

@Piiit yes, that would be better. But of course only the relevant data types (those with RAW in the description).
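A minimal sketch of that migration on a toy in-memory SQLite database. The two-table schema is a simplified assumption, not the production schema, and a case-insensitive `LIKE '%raw%'` on the type name stands in for PostgreSQL's `~* 'raw'` as a proxy for "RAW in the description". Only raw types with period = 600 are relabeled; processed types keep their period. The same pattern would presumably apply to the current measurement table as well.

```python
import sqlite3

# Hypothetical, simplified schema (NOT the real production schema).
SCHEMA = """
CREATE TABLE type (id INTEGER PRIMARY KEY, cname TEXT);
CREATE TABLE measurementhistory (id INTEGER PRIMARY KEY, type_id INTEGER,
                                 period INTEGER);
"""

# Relabel only RAW types from period 600 to 60; processed types are left untouched.
MIGRATE = """
UPDATE measurementhistory
SET period = 60
WHERE period = 600
  AND type_id IN (SELECT id FROM type WHERE cname LIKE '%raw%')
"""


def migrate_raw_periods(conn: sqlite3.Connection) -> int:
    """Run the migration and return the number of updated rows."""
    cur = conn.execute(MIGRATE)
    conn.commit()
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO type VALUES (?, ?)",
                     [(1, "O3_raw"), (2, "O3_processed")])
    conn.executemany("INSERT INTO measurementhistory VALUES (?, ?, ?)",
                     [(1, 1, 600), (2, 2, 600), (3, 1, 3600)])
    print(migrate_raw_periods(conn))  # 1
```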

Piiit commented 2 years ago

@rcavaliere Done, everything is in production. Please double-check that everything is OK like this. Maybe we should monitor the elaborations over the next few days to make sure everything works as expected.

The documentation is not perfect yet. We could open separate issues for it, but I think you have a better understanding of these parts of the manuals:
1) https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-averages
2) https://github.com/noi-techpark/bdp-elaborations/tree/main/environment-a22-non-linear-calibration
3) https://github.com/noi-techpark/bdp-commons/tree/main/data-collectors/environment-a22

We now also have AWS Lambda GitHub Actions, but a lot still needs to be done to integrate them properly into our infrastructure:

rcavaliere commented 2 years ago

@Piiit super work! Now everything seems to work as expected. Currently only a few sensors are working properly, but they will be upgraded soon (i.e. this year).