skgrange / saqgetr

Import Air Quality Monitoring Data in a Fast and Easy Way
GNU General Public License v3.0

Update of validated (E1a) data uncovered a lack of data delivery for the UK #10

Open skgrange opened 1 year ago

skgrange commented 1 year ago

To the users of saqgetr: this month, the observations since 2019 were updated with validated data (the E1a data flow in the AQER nomenclature). However, validated data are missing for some countries, notably the UK. I have reinserted the near-real-time observations (the E2a data flow) for the UK, which I think has resolved the missing-data issue, but other countries have not been tested in depth. If you encounter systematically missing data for a year across a number of monitoring sites in a country, please let me know and I will see what I can do. Many thanks!

SverreSolberg commented 10 months ago

I have been looking into the EEA data using the very convenient saqgetr package. My focus has been on o3, no2 and pm2.5, comparing with the data we have in EBAS (at NILU).

For the UK o3 data, it seems that all data from 2020 onwards are classified as 'aqer:e2a', meaning non-validated data (possibly linked to Brexit?). Does this mean that the UK has only delivered the first, unvalidated data to the EEA, or that the EEA has not done any QA on the data? I didn't think the EEA had any routines for QA anyway, so I guess the first option is the right one.

I mostly find fairly good agreement with the data in EBAS (although not identical), with some exceptions depending on the site. gb0048r is a site that differs substantially for o3.

I also see that, when looking at annual data sets, most hours of the last day (31-12) are missing. I know that data for the very last hour(s) of the year could differ because of different time zone conventions, but that is not the case here: it affects all hours of 31-12 except the first and last. This seems to be the case for all gb sites. Some differences between the EEA and EBAS data could be explained by additional QA procedures at NILU, but this can't explain all the differences, and not the missing data for the last day.

Below is what I find for o3 at gb0039r at the end of 2021 and 2022. Happy for any views on this :-)

row | date | date_end | site | variable | process | summary | validity | unit | value
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
17502 | 2021-12-30 23:00:00 | 2021-12-31 00:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 48.09637
17503 | 2021-12-31 00:00:00 | 2021-12-31 01:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 46.89895
17504 | 2021-12-31 01:00:00 | 2021-12-31 02:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 48.29594
17505 | 2021-12-31 23:00:00 | 2022-01-01 00:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 43.90540
17506 | 2022-01-01 00:00:00 | 2022-01-01 01:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 44.90325
17507 | 2022-01-01 01:00:00 | 2022-01-01 02:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 47.89680

row | date | date_end | site | variable | process | summary | validity | unit | value
-- | -- | -- | -- | -- | -- | -- | -- | -- | --
17458 | 2022-12-30 18:00:00 | 2022-12-30 19:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 58.07487
17459 | 2022-12-30 19:00:00 | 2022-12-30 20:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 53.68433
17460 | 2022-12-31 23:00:00 | 2023-01-01 00:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 72.84305
17461 | 2023-01-01 00:00:00 | 2023-01-01 01:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 72.84305
17462 | 2023-01-01 01:00:00 | 2023-01-01 02:00:00 | gb0039r | o3 | 311747 | 1 | 1 | ug.m-3 | 68.05337

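
For anyone wanting to check their own extracts for the same pattern, here is a minimal Python sketch of a gap check on an hourly series. It is only an illustration: the timestamps are the start times of the end-of-2021 rows shown above, and `find_gaps` is a hypothetical helper written for this example, not part of saqgetr.

```python
from datetime import datetime, timedelta

# Start times of the hourly o3 records shown above for gb0039r (end of 2021).
starts = [
    "2021-12-30 23:00:00",
    "2021-12-31 00:00:00",
    "2021-12-31 01:00:00",
    "2021-12-31 23:00:00",
    "2022-01-01 00:00:00",
    "2022-01-01 01:00:00",
]


def find_gaps(timestamps, step=timedelta(hours=1)):
    """Return (previous, next, missing_count) for every jump larger than `step`.

    Hypothetical helper for illustration; assumes a regular hourly series.
    """
    parsed = sorted(datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps)
    gaps = []
    for previous, current in zip(parsed, parsed[1:]):
        delta = current - previous
        if delta > step:
            # Number of expected records that are absent between the two stamps.
            missing = int(delta / step) - 1
            gaps.append((previous, current, missing))
    return gaps


gaps = find_gaps(starts)
for previous, current, missing in gaps:
    print(f"{missing} hourly records missing between {previous} and {current}")
```

Running the same check per site and per year should make it quick to see whether the gap on 31-12 is systematic across sites.
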
skgrange commented 10 months ago

Hello Sverre, it looks like this can be broken into two pieces. The first is the issue with the United Kingdom's data submissions, and the second is the question about the quality of the unvalidated E2a observations.

You are correct regarding the first point: the UK is no longer submitting validated E1a data to the European Commission. This is likely because the country's relationship with the European Union has changed and it may no longer be obliged to submit such data. It is interesting that the near-real-time observations (E2a) are still being delivered, however.

The E2a observations do not require updating after the first transmission, and therefore, these data have not gone through any validation processes and will diverge from observations that have undergone such processes. This probably explains the differences you have seen between the two data sources.

I have had a look at the issue where observations for the last day of the year are missing in the British data. This looks like an oversight by the data submitter, where the observations between, say, 2021-01-01 and 2021-12-31 are queried from their database and exported. Almost certainly, the data submitter has failed to round the end date up to the final instant of the year (2021-12-31 23:59:59), so the bare date is read as midnight at the start of 31 December and observations for the final day of the year are dropped. This may not be the exact reason, but something like this is happening.
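
To make the suspected mechanism concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the synthetic hourly timestamps, the function names and the filtering logic are hypothetical and say nothing about how the UK submitter's export actually works. It also does not reproduce the exact pattern in the table (where a couple of hours on 31-12 are present), which fits with this being only roughly what is going on.

```python
from datetime import datetime, timedelta


def hourly_range(start, end):
    """Yield hourly timestamps from start (inclusive) to end (exclusive)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)


# Synthetic stand-in for a database table: one record start time per hour of 2021.
observations = list(hourly_range(datetime(2021, 1, 1), datetime(2022, 1, 1)))


def export_naive(records, start_day, end_day):
    """Suspected oversight: the end date is taken as midnight at the START of that day."""
    return [t for t in records if start_day <= t <= end_day]


def export_whole_last_day(records, start_day, end_day):
    """Include the whole last day by comparing against the start of the following day."""
    upper = end_day + timedelta(days=1)
    return [t for t in records if start_day <= t < upper]


start_day = datetime(2021, 1, 1)
end_day = datetime(2021, 12, 31)  # a bare date parses to 2021-12-31 00:00:00

naive = export_naive(observations, start_day, end_day)
complete = export_whole_last_day(observations, start_day, end_day)
lost = len(complete) - len(naive)

print(f"naive export ends at {naive[-1]}, losing {lost} hourly records")
```

Using a half-open interval (`<` the start of the next day) avoids having to pick a "final instant" such as 23:59:59 at all, which is one common way to sidestep this kind of off-by-one-day error.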

I am also thinking that I will discontinue this service. I am getting several messages a month with users questioning/inquiring/complaining about the data. My objective has always been to make European air quality observations accessible but I no longer use this database myself much and my capacity is limited. I will have a chat with some colleagues about a possible migration, but it might be worth evaluating the workload that will be required to query the data portals directly and do your own cleaning. I will not switch things off without appropriate notification, but it might be something to plan for. Have a great Friday, Stuart.