openaq / openaq-fetch

A tool to collect data for OpenAQ platform.
MIT License

Italy Data - ARPALAZIO #282

Closed FabioMD1972 closed 7 years ago

FabioMD1972 commented 7 years ago

http://www.arpalazio.net/main/aria/sci/annoincorso/chimici.php

This site is available only in Italian; I can help translate it.

RocketD0g commented 7 years ago

Thanks, @FabioMD1972! 👍

jobonaf commented 7 years ago

the URL for the hourly data for the whole Lazio region (not only Rome) can be built as follows: http://www.arpalazio.net/main/aria/sci/annoincorso/chimici/<province>/DatiOrari/<province>_<pollutant>_<year>.txt with provinces: CC, FR, LT, RI, RM, VT and pollutants: BENZENE, CO, NO2, NOX, NO, O3, PM10, PM2.5, SO2

the TXT file is a table with columns: julian day, hour and one column for each station

stations' metadata: https://github.com/jobonaf/calicantus/blob/master/data/sites-info/metadata.ARPA-Lazio.csv
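The URL pattern and table layout described above can be sketched in Python roughly as follows. This is only an illustration: the function names `hourly_url` and `row_timestamp` are hypothetical, and the conversion from Julian day and hour to a timestamp assumes that day 1 is January 1 and that hours count from 0, which should be checked against the agency's file-format documentation.

```python
from datetime import datetime, timedelta

BASE_URL = "http://www.arpalazio.net/main/aria/sci/annoincorso/chimici"
PROVINCES = {"CC", "FR", "LT", "RI", "RM", "VT"}
POLLUTANTS = {"BENZENE", "CO", "NO2", "NOX", "NO", "O3", "PM10", "PM2.5", "SO2"}


def hourly_url(province: str, pollutant: str, year: int) -> str:
    """Build the hourly-data URL (hypothetical helper).

    The province code appears twice: once in the path and once in the file name.
    """
    if province not in PROVINCES:
        raise ValueError(f"unknown province: {province}")
    if pollutant not in POLLUTANTS:
        raise ValueError(f"unknown pollutant: {pollutant}")
    return f"{BASE_URL}/{province}/DatiOrari/{province}_{pollutant}_{year}.txt"


def row_timestamp(year: int, julian_day: int, hour: int) -> datetime:
    """Convert the first two table columns to a naive datetime.

    Assumption (not confirmed in the thread): day 1 is January 1, hours start at 0.
    """
    return datetime(year, 1, 1) + timedelta(days=julian_day - 1, hours=hour)


print(hourly_url("RM", "SO2", 2017))
```

Validating the province and pollutant codes up front turns a mistyped code into an immediate error rather than a failed download later.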

RocketD0g commented 7 years ago

I'm having issues accessing the hourly data via a link like:

http://www.arpalazio.net/main/aria/sci/annoincorso/chimici/%3Cprovince%3E/DatiOrari/RM_SO2_2017.txt

Thanks for any thoughts on where I am going wrong, @jobonaf!

jobonaf commented 7 years ago

You should substitute <province> with RM not only in the file name, but also in the path, like this: http://www.arpalazio.net/main/aria/sci/annoincorso/chimici/RM/DatiOrari/RM_SO2_2017.txt

RocketD0g commented 7 years ago

Ah yup, oops - Thanks! Now labelling ready for dev.

dolugen commented 7 years ago

Thank you, @FabioMD1972 and @jobonaf!

Also, the PDF linked from the page has detailed information on the file format. Link to Google Translated version.

dolugen commented 7 years ago

stations' metadata: https://github.com/jobonaf/calicantus/blob/master/data/sites-info/metadata.ARPA-Lazio.csv

@jobonaf This is very helpful. Can I ask how you compiled this data?

dolugen commented 7 years ago

@RocketD0g Some concerns about the data:

RocketD0g commented 7 years ago

Really good observations, @dolugen.

If a station has not reported any data (e.g., only -999) for a particular pollutant over the easily discernible past, my thought is that we should skip adding that station. If it is a mix of some reported data and lots of -999s for a particular station+pollutant, then I think it is still a fine station to add. Do you think that is a sensible plan, @dolugen?

@jobonaf another question - Is there a place online where we can cite that these data are said to be hourly - or perhaps daily in the case of @dolugen's observation? It definitely seems in his example that those are daily values being reported versus hourly, but we would want to state the time-averaging interval that the source agency says the data are in. (And if the time interval looks to be mismatched with what the source agency says, we'd want to interact with the source agency for clarification on the apparent discrepancy.)

dolugen commented 7 years ago

If a station has not reported any data (e.g., only -999) for a particular pollutant over the easily discernible past, my thought is that we should skip adding that station. If it is a mix of some reported data and lots of -999s for a particular station+pollutant, then I think it is still a fine station to add.

I think that's reasonable, I'll go with that.
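The rule agreed on above could be sketched like this. It is a minimal illustration, not the adapter's actual code: the names `stations_to_keep` and `valid_measurements`, the station IDs "A" and "B", and the idea of holding each station's column as a list of floats are all assumptions.

```python
from typing import Dict, List

MISSING = -999.0  # sentinel for "no data", as mentioned in the thread


def stations_to_keep(columns: Dict[str, List[float]]) -> List[str]:
    """Return the station IDs that reported at least one real value.

    A station whose column contains nothing but the sentinel is skipped.
    """
    return [
        station_id
        for station_id, values in columns.items()
        if any(value != MISSING for value in values)
    ]


def valid_measurements(values: List[float]) -> List[float]:
    """Drop the sentinel entries from a station that is kept."""
    return [value for value in values if value != MISSING]


# Hypothetical columns for one pollutant: "A" has some data, "B" has none.
columns = {"A": [-999.0, 12.5, -999.0], "B": [-999.0, -999.0, -999.0]}
kept = stations_to_keep(columns)
print(kept)
print(valid_measurements(columns["A"]))
```

A station with mixed data is kept, and only its individual -999 entries are dropped.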

jobonaf commented 7 years ago

@dolugen @RocketD0g I collected the metadata for a project involving some Italian regional environmental agencies (https://sdati.arpae.it/calicantus-intro/). Many (but not all!) stations in Italy measure PM10 and PM2.5 on a daily basis. If you need more info about the data collected by ArpaLazio, you could contact the regional center for air quality (e-mail here: http://www.arpalazio.net/main/aria/sci/basedati/bollettini/2017/BA282017.pdf).

RocketD0g commented 7 years ago

Thanks a bunch, @jobonaf - just shot them an email.

dolugen commented 7 years ago

@jobonaf I've found that the metadata.ARPA-Lazio.csv is missing data for several stations. Is it possible to update it? Here are the station IDs that are missing:

86
87
101
102
103
104
105
106
107
108
110
111

IDs 86, 87 are from Rome, and IDs > 100 are from Civitavecchia, as described in the PDF.

jobonaf commented 7 years ago

Thank you @dolugen , I will update the file. Consider also this page: http://www.arpalazio.net/main/aria/doc/RQA/locRQA.php

jobonaf commented 7 years ago

metadata.ARPA-Lazio.csv updated

dolugen commented 7 years ago

Consider also this page: http://www.arpalazio.net/main/aria/doc/RQA/locRQA.php

metadata.ARPA-Lazio.csv updated

Thank you!

dolugen commented 7 years ago

@RocketD0g So, I'm basically done with the adapter. Do I wait for the email reply about the hourly values of PMs? If they confirm it's daily averages, I'll change the adapter to save just the first daily value for PMs.

jobonaf commented 7 years ago

@dolugen you could also save the daily averages directly from here: http://www.arpalazio.net/main/aria/sci/annoincorso/chimici/RM/MedieGiornaliere/RM_PM10_2017_gg.txt

More generally: http://www.arpalazio.net/main/aria/sci/annoincorso/chimici/<province>/MedieGiornaliere/<province>_<pollutant>_2017_gg.txt with provinces: CC, FR, LT, RI, RM, VT and pollutants: PM10, PM2.5
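As a rough sketch, the daily-average URL could be built the same way as the hourly one. The helper name `daily_url` is hypothetical. The thread only shows the pattern for 2017, so making the year a parameter is an assumption made by analogy with the hourly files.

```python
BASE_URL = "http://www.arpalazio.net/main/aria/sci/annoincorso/chimici"
PROVINCES = {"CC", "FR", "LT", "RI", "RM", "VT"}
DAILY_POLLUTANTS = {"PM10", "PM2.5"}  # only these are listed for the daily files


def daily_url(province: str, pollutant: str, year: int = 2017) -> str:
    """Build the daily-average URL (hypothetical helper).

    Assumption: years other than 2017 follow the same naming scheme.
    """
    if province not in PROVINCES:
        raise ValueError(f"unknown province: {province}")
    if pollutant not in DAILY_POLLUTANTS:
        raise ValueError(f"no daily file listed for pollutant: {pollutant}")
    return (
        f"{BASE_URL}/{province}/MedieGiornaliere/"
        f"{province}_{pollutant}_{year}_gg.txt"
    )


print(daily_url("RM", "PM10"))
```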

RocketD0g commented 7 years ago

@dolugen - Our track record for receiving back messages to questions like that tends to be somewhat poor, so perhaps go with the daily values @jobonaf points to for the PM data?

dolugen commented 7 years ago

@jobonaf ARPALAZIO data is now on OpenAQ! You've been a great help, thank you! And @FabioMD1972 too, thanks for suggesting the source!