As BrennerLEC technical expert I would like that the Open Data Hub can manage "physical" and "virtual" station locations associated to the low-cost air quality sensors used

rcavaliere commented 2 months ago

We need to handle a particular case for the air quality dataset provided by A22 through their AUGE platform, supplied by algorab.

The low-cost sensors in use will be periodically uninstalled from their physical locations on the highway, brought to an intercalibration site in Trento, and then reinstalled at a physical location on the highway that may differ from the previous one.

What we need is a model of the physical locations on the highway, including the intercalibration site, and an association with the sensor (identified by the identifier AQxx) that is currently installed there. This should ideally be organized through this mapping table: https://github.com/noi-techpark/bdp-commons/blob/main/data-collectors/environment-a22/src/main/resources/mappings/stationMappings.csv

What we have to ensure is:

- physical stations are always visible on the Open Data Hub
- physical stations always provide the details of the sensor that is currently installed there
- physical stations always provide the history of which sensor was installed at that physical location and when (this should be possible with the metadata history)

clezag commented 5 days ago

@rcavaliere my proposal would be this kind of format for the csv:

station_id	station_name	latitude	longitude	sensor_id	sensor_start
Stazione_KM140-605	Stazione_KM140-605	46.04227080945	11.11604421025	AIRQ10	01.01.2024
Stazione_KM140-605	Stazione_KM140-605	46.04227080945	11.11604421025	AIRQ15	01.05.2024
calibration_1	calibration_1	46.104338	11.110227	AIRQ10	01.05.2024
calibration_2	calibration_2	46.104338	11.110227	AIRQ15	05.03.2023

Since the new architecture will let us replay history, we also need to track the history of which sensor was where, so that reimported data is not associated with the wrong sensor.

When a sensor gets moved, just add a row to the csv with the new sensor and starting date. You then also have to "remove" the sensor at its previous location, by associating another sensor or an empty sensor with the old location.

In this example, we start with AIRQ15 in calibration, and AIRQ10 set up at KM 140. AIRQ10 is then moved to calibration, and AIRQ15 is moved to KM 140 in its place. The intercalibration station where AIRQ15 was located is set to inactive because it no longer has a sensor.
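To make the replay logic concrete, here is a minimal Python sketch of how a collector could resolve which sensor sits at which station on a given day. It is only an illustration under several assumptions: the file is comma-separated, dates are DD.MM.YYYY, an empty `sensor_id` means "no sensor", and the last sample row (emptying `calibration_2`) is added here to show the "remove" step. The function names `parse_rows`, `sensor_at` and `station_of` are hypothetical.

```python
import csv
import io
from datetime import date, datetime

# Sample rows from the proposal. The last row (empty sensor_id) is an
# assumption added to show how a location would be "emptied".
SAMPLE = """station_id,station_name,latitude,longitude,sensor_id,sensor_start
Stazione_KM140-605,Stazione_KM140-605,46.04227080945,11.11604421025,AIRQ10,01.01.2024
Stazione_KM140-605,Stazione_KM140-605,46.04227080945,11.11604421025,AIRQ15,01.05.2024
calibration_1,calibration_1,46.104338,11.110227,AIRQ10,01.05.2024
calibration_2,calibration_2,46.104338,11.110227,AIRQ15,05.03.2023
calibration_2,calibration_2,46.104338,11.110227,,01.05.2024
"""


def parse_rows(text):
    """Parse the mapping CSV into dicts with a real date and None for 'no sensor'."""
    rows = []
    for r in csv.DictReader(io.StringIO(text)):
        rows.append({
            "station_id": r["station_id"],
            "sensor_id": r["sensor_id"] or None,
            # Assumption: dates are written as DD.MM.YYYY
            "start": datetime.strptime(r["sensor_start"], "%d.%m.%Y").date(),
        })
    return rows


def sensor_at(rows, station_id, day):
    """Return the sensor installed at station_id on `day`, or None."""
    candidates = [r for r in rows
                  if r["station_id"] == station_id and r["start"] <= day]
    if not candidates:
        return None
    # The most recent row that has already started wins.
    return max(candidates, key=lambda r: r["start"])["sensor_id"]


def station_of(rows, sensor_id, day):
    """Return the station where sensor_id is installed on `day`, or None."""
    for station_id in sorted({r["station_id"] for r in rows}):
        if sensor_at(rows, station_id, day) == sensor_id:
            return station_id
    return None
```

With this, a replayed data point from sensor AIRQ10 dated March 2024 would be attached to `Stazione_KM140-605`, while one dated June 2024 would go to `calibration_1`.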

An incidental advantage of this logic is that we can add changes in advance; they don't have to be synchronized with the actual moving of the sensor.

I will also implement a small verification script, so that when we update this file, our CI/CD will first check its validity to avoid overlapping dates or multiple sensors associated with the same physical station.
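As a rough idea of what such a check could look like (a sketch only, not the actual script), the Python below validates rows that are already parsed into `(station_id, sensor_id_or_None, start_date)` tuples. The two rules are one interpretation of "overlapping dates" and "multiple sensors": a station may not have two rows starting on the same day, and a sensor may not be installed at two stations on the same day. The function name `validate` and the error messages are hypothetical.

```python
from collections import Counter
from datetime import date


def validate(rows):
    """Check (station_id, sensor_id_or_None, start_date) rows; return error strings."""
    errors = []

    # Rule 1: a station cannot have two rows starting on the same day,
    # since that would attach several sensors to it at once.
    starts = Counter((station, start) for station, _, start in rows)
    for (station, start), n in sorted(starts.items()):
        if n > 1:
            errors.append(f"{station}: {n} rows start on {start}")

    # Rule 2: at every change date, each sensor may sit at one station only.
    ordered = sorted(rows, key=lambda r: r[2])
    for day in sorted({start for _, _, start in rows}):
        current = {}
        for station, sensor, start in ordered:
            if start <= day:
                current[station] = sensor  # later rows override earlier ones
        placed = Counter(s for s in current.values() if s is not None)
        for sensor, n in sorted(placed.items()):
            if n > 1:
                errors.append(f"{sensor}: installed at {n} stations on {day}")

    return errors


# Example rows; the final "empty sensor" row is an assumption that frees
# calibration_2 when AIRQ15 moves to the highway.
GOOD = [
    ("Stazione_KM140-605", "AIRQ10", date(2024, 1, 1)),
    ("Stazione_KM140-605", "AIRQ15", date(2024, 5, 1)),
    ("calibration_1", "AIRQ10", date(2024, 5, 1)),
    ("calibration_2", "AIRQ15", date(2023, 3, 5)),
    ("calibration_2", None, date(2024, 5, 1)),
]
```

Under these rules, moving a sensor without also adding the "empty" row at its old location would be reported, because the sensor would appear at two stations from that date on. In CI, a non-empty result from `validate` could simply fail the build.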

What do you think?

clezag commented 5 days ago

@rcavaliere Will we deprecate the existing dataset in favor of completely new stations/names here?

rcavaliere commented 5 days ago

@clezag if I got it right, the stations with a station_id will always be flagged as active and available = TRUE as soon as they are in this CSV file; the information about the associated sensor and sensor_start is in the metadata. Right? If yes, then absolutely OK for me. Once you are ready for the switch, then we deprecate the old file. What about historical data? I would suggest to also put in the new CSV file the information of the "old movements", if possible.

clezag commented 5 days ago

@rcavaliere

> if I got it right, the stations with a station_id will always be flagged as active and available = TRUE as soon as they are in this CSV file

I think it would be more correct to set stations that don't have any sensor attached to active=false, but that is something we can easily change. In general you are right, though: the Open Data Hub stations will be based on the CSV list, and we then just attach the data points we receive to the station according to the sensor mapping.

> the information about the associated sensor and sensor_start is in the metadata. Right?

I would set the currently attached sensor_ID as a single metadata field so that people can easily filter for it. But since we have it in the CSV already, we could in addition add the whole sensor history as a separate field.
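Just to illustrate the idea, a station's metadata could be derived from the CSV rows roughly as in the Python sketch below. The field names `sensor_id`, `sensor_start`, `active` and `sensor_history` are placeholders, not an agreed schema, and rows are assumed to be already parsed into `(station_id, sensor_id_or_None, start_date)` tuples. Rows dated after `today` are ignored, so changes entered in advance only take effect once their start date is reached.

```python
from datetime import date


def build_metadata(rows, station_id, today):
    """Build a metadata dict for one station from (station, sensor, start) rows."""
    # Only rows of this station that have already started, oldest first.
    own = sorted(
        (r for r in rows if r[0] == station_id and r[2] <= today),
        key=lambda r: r[2],
    )

    history = []
    for i, (_, sensor, start) in enumerate(own):
        # A period ends when the next row of the same station begins.
        end = own[i + 1][2] if i + 1 < len(own) else None
        if sensor is not None:  # "empty" rows only close the previous period
            history.append({
                "sensor_id": sensor,
                "start": start.isoformat(),
                "end": end.isoformat() if end else None,
            })

    current_sensor = own[-1][1] if own else None
    return {
        # Single field so that users can filter on the attached sensor.
        "sensor_id": current_sensor,
        "sensor_start": own[-1][2].isoformat() if current_sensor else None,
        # Assumption: a station without a sensor is flagged inactive.
        "active": current_sensor is not None,
        "sensor_history": history,
    }


# Example rows; the final "empty sensor" row is an assumption.
ROWS = [
    ("Stazione_KM140-605", "AIRQ10", date(2024, 1, 1)),
    ("Stazione_KM140-605", "AIRQ15", date(2024, 5, 1)),
    ("calibration_1", "AIRQ10", date(2024, 5, 1)),
    ("calibration_2", "AIRQ15", date(2023, 3, 5)),
    ("calibration_2", None, date(2024, 5, 1)),
]
```

Keeping `sensor_id` as a flat field and the history as a separate list means the common filter stays simple while the full timeline is still available to whoever needs it.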

> What about historical data? I would suggest to also put in the new CSV file the information of the "old movements", if possible

One issue will be that the current station codes are in fact the codes of the sensors (AIRQ10 etc.), so that probably has to change if we disassociate sensors from physical stations. Do we make a new set of stations and migrate the data over? We can also maintain the old codes, but it could be confusing to users when the AIRQ10 station has the AIRQ15 sensor attached and the AIRQ10 sensor is somewhere else. If we migrate, then I agree on also recording the old movements

rcavaliere commented 5 days ago

@clezag OK, let me think about your proposal a bit more over the weekend...

rcavaliere commented 1 day ago

@clezag additional feedback from my side. The proposal is in general absolutely OK for me, so let's go in this direction. For the historical data: what is relevant at present is the reference to the field stationcode, which carries the information about the exact sensor. What I could provide is the information about which sensor was installed where over time. We can then convert this information into the new CSV, as you proposed. We then have to assign the historical data to the new stations using the stationcode as key; I think this should work (this would probably be a manual task, which we will do once). What do you think?

noi-techpark / bdp-core

As BrennerLEC technical expert I would like that the Open Data Hub can manage "physical" and "virtual" station locations associated to the low-cost air quality sensors used #287