openaq / openaq-fetch

A tool to collect data for OpenAQ platform.

Spain and Germany concentrations = 0 #630

Closed ReaRuiRay closed 4 years ago

ReaRuiRay commented 4 years ago

Occasionally, all observations in Spain and Germany have a concentration of zero. It seems that the fetch is failing in these cases, and the values should be treated as NaN instead.

[Image: Screen Shot 2019-10-30 at 1 07 48 PM]

RocketD0g commented 4 years ago

Thanks, @ReaRuiRay! Do you see the underlying source (I believe EEA) reporting 'NaN' and it turning up as zeroes on our end?

Tagging @jflasher to make sure we aren't accidentally turning "NaN"s or unsuccessful fetches into 0's or something like that.

If this is not the case and Spain and Germany are actually reporting those zeros, our policy is not to change or otherwise interpret a reporting station's values, even if we think it is highly likely that they are physically unrealistic. As aggregators, and not the ones measuring the data directly, the best we can do is to a) suggest that folks report the data issue to the originating source and b) use this open-source tool to flag data that seems physically unlikely according to their specifications, or remove it in a cleaned file.

ReaRuiRay commented 4 years ago

Hey @RocketD0g, I checked the EEA website and it is reporting values > 0. Both the OpenAQ online map and the openaq-fetch output in the S3 bucket show zeros in Spain, Germany and Belgium.

I think the API used in the current scraper is probably outdated. I did a bit of research and found that we can download data for each country individually as *.csv from http://discomap.eea.europa.eu/map/fme/AirQualityUTDExport.htm

These CSVs contain data from the past 48 hours. I checked them and they look good. I haven't used JavaScript much these days, but it would be fairly easy for me to write a scraper in Python. Is there any way I can help here?

jflasher commented 4 years ago

Hmm, this is odd. We do get the data from the CSVs and if nothing changed in the format, it’d be odd that JS is now parsing valid values as 0. We need to keep the adapters in JS so unfortunately Python wouldn’t help here. I’ll try to look at this tomorrow morning.

jflasher commented 4 years ago

Was able to dig into this a bit and there is a problem with the data coming from EEA, though our adapter should catch it. When accessing http://discomap.eea.europa.eu/map/fme/latest/ES_PM2.5.csv as an example, there are records like

ES,NET_ES205A,CCAA Islas Canarias,ES.BDCA.AQD,http://dd.eionet.europa.eu/vocabulary/aq/timezone/UTC,PM2.5,SP_35019001_9_47,ES.BDCA.AQD,-15.541880430000003,27.772597359999992,EPSG:4979,ES1742A,STA_ES1742A,SAN AGUSTÍN,ES.BDCA.AQD,2019-10-31 09:00:00+01:00,2019-10-31 10:00:00+01:00,2019-10-31 10:16:38+01:00,+01:00,,-1,3,15,ug/m3

which have an empty string for the value_numeric field. That will evaluate to 0 in JS (yay!) and our adapter doesn't catch that properly, so we need to fix that to catch empty strings.
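To make the coercion concrete, here is a minimal sketch of the behavior and of one way to guard against it. The `parseValue` helper is hypothetical and is not taken from the eea-direct adapter; it only illustrates that an empty field has to be checked before numeric conversion.

```javascript
// Hypothetical helper, not the adapter's actual code.
// Number('') and Number('   ') both evaluate to 0, so an empty CSV field
// silently becomes a zero measurement unless it is checked first.
function parseValue (raw) {
  if (typeof raw !== 'string' || raw.trim() === '') {
    return null; // missing value: the caller should skip this record
  }
  const value = Number(raw);
  return Number.isFinite(value) ? value : null;
}

console.log(Number(''));         // 0
console.log(parseValue(''));     // null
console.log(parseValue('12.5')); // 12.5
```

Returning null rather than NaN is just one choice; the point is that the empty string never reaches the numeric conversion.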

However, there are more 0's showing up in the system for EEA than what I can account for based on just looking over this one file as an example. I am wondering if there are more entries that start out with '' for the value, our system inserts them, then they get updated with a real value, but our system already has a measurement and doesn't overwrite? Regardless, catching the '' for the value will prevent the false 0's and then we can see what the behavior looks like.
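The hypothesis above can be illustrated with a toy model. This is only a sketch of the suspected behavior: the real OpenAQ storage layer is not shown in this thread, and `store`, `insertIfAbsent`, the key format and the value 8.1 are all made up for illustration.

```javascript
// Toy model of the hypothesis only, not the real storage code.
const store = new Map();

// Insert a measurement unless one already exists for the same key.
function insertIfAbsent (key, value) {
  if (!store.has(key)) {
    store.set(key, value);
  }
}

const key = 'ES1742A|PM2.5|2019-10-31 09:00:00+01:00';
insertIfAbsent(key, Number('')); // first fetch: empty field coerced to 0
insertIfAbsent(key, 8.1);        // later fetch: made-up real value, ignored
console.log(store.get(key));     // 0
```

If the system behaves like this, a false 0 written on the first fetch would persist even after the source publishes a real value.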

I can't update the adapter now, but can likely do so in the next day or so unless someone gets to it first.

We should also make a ticket to remove the incorrect 0's from the archive.

ReaRuiRay commented 4 years ago

@jflasher Thanks for your prompt and detailed input! I checked it as well and found that there is also a value_validity flag next to the value_numeric field. Whenever value_numeric is empty, the flag flips from 1 to -1. We probably want to filter out entries that either have no value_numeric or have value_validity set to -1.
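A minimal sketch of that filter, assuming each CSV row has already been parsed into an object keyed by column name with every field still a string. The `isUsable` function and the sample rows are hypothetical; only the field names value_numeric and value_validity come from the data discussed here.

```javascript
// Sketch only: rows are assumed to be objects with string fields.
function isUsable (record) {
  const hasValue = typeof record.value_numeric === 'string' &&
    record.value_numeric.trim() !== '';
  const isValid = Number(record.value_validity) !== -1;
  return hasValue && isValid;
}

const rows = [
  { value_numeric: '', value_validity: '-1' },   // like the ES record quoted earlier
  { value_numeric: '7.3', value_validity: '1' }  // made-up value for illustration
];

const usable = rows.filter(isUsable);
console.log(usable.length); // 1
```

Checking both conditions is deliberately redundant, so a row is dropped even if only one of the two signals is present.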

[Image: Screen Shot 2019-10-31 at 9 45 06 AM]

I checked http://discomap.eea.europa.eu/map/fme/latest/DE_PM2.5.csv as well. The German data show few missing values, which is contradictory to what we see on OpenAQ's map. Probably this is a separate issue.

I will try to run the adapter on its own to see if I can fix it or find out what exactly happened. (This one, right? https://github.com/openaq/openaq-fetch/blob/develop/adapters/eea-direct.js. I was looking at the old eea.js, which is why I thought an API was used.)

jflasher commented 4 years ago

Yep! You can run it like node . -dvs "EEA Spain"

And good catch on that flag. I think filtering on it and catching the empty string should resolve this.

ReaRuiRay commented 4 years ago

Cool, thanks @jflasher, I will give it a try this week.