openaq / openaq-fetch

A tool to collect data for OpenAQ platform.
MIT License
83 stars 39 forks

China - Sources (not useable yet) #39

Closed RocketD0g closed 6 years ago

RocketD0g commented 8 years ago

There are a ton of China air quality data sources. Here are some:

aepb.gov.cn bjmemc.com.cn cdemc.cn cfhb.gov.cn cepb.gov.cn dl.gov.cn dyhb.gov.cn nbemc.gov.cn sdein.gov.cn fjepb.gov.cn gsep.gansu.gov.cn qhepb.gov.cn gdep.gov.cn gxepb.gov.cn ghb.gov.cn dloer.gov.cn hebei.gov.cn hljdep.gov.cn hnep.gov.cn hbepb.gov.cn hbt.hunan.gov.cn nmgepb.gov.cn jshb.gov.cn jxepb.gov.cn shbj.klmy.gov.cn lnemc.cn lzhb.gov.cn nnems.gov.cn nxep.gov.cn ordoshb.gov.cn semc.gov.cn sxhjjcz.com.cn szhec.gov.cn tjemc.org.cn xzep.gov.cn wlmqhb.gov.cn whepb.gov.cn xianemc.gov.cn xnepb.gov.cn xjepb.gov.cn ynepb.gov.cn zjepb.gov.cn

Sources from: aqicn.org/sources

There is also a site that provides an API to Chinese AQ data: http://pm25.in/

However, the issue is that, from what I can tell, these data are not shared in their raw format, only as AQI (I have not clicked on each and every one, however).

Does anyone know differently, or has anyone found sources above that are shared in raw format?

eddowh commented 7 years ago

We use http://pm25.in/ for our company scraping tool. They provide raw measurements in concentration levels (pm2.5, pm10, co2, so2, o3 in past hour, o3 in past eight hours, etc.). Right now we're scraping the site since it's relatively easy to do so but we're planning to grab an API token soon, which requires sending an e-mail to the support team in Mandarin Chinese.

jflasher commented 7 years ago

Hey @eddowh thanks for the note. We contacted pm25.in about API access, but never got a response. If you have better luck, let us know. It'd be great to get access to this data.

Also, it sounds like you're scraping the data for your own use. If the data were in OpenAQ already, would you be able to use our platform instead?

xgao32 commented 7 years ago

Howdy, @RocketD0g @jflasher, any new updates regarding adding more data sources for China? I can't help but notice other folks have been able to access pm25.in's API here and here. The API's doc is also available, but only in Chinese as of now. Let me know if y'all need help accessing the raw data. Thinkpage also provides such data and claims to have an English API, though the site is also in Chinese.

FYI, the Tianjin site works and even has a map showing detailed stations and latest measurements.

jflasher commented 7 years ago

Hi @gaomrx thanks for the question! We've looked at pm25.in before, but were unable to receive an API token. I think @nickolasclarke has offered to help potentially do some of this work.

xgao32 commented 7 years ago

@jflasher There is a public App Key for pm25.in (5j1znBVAsnSf5xQyNQyq) and the site says it has the same limitations as a private key. I haven't tried it yet, but there is also Think Page, which provides access to air quality data from over 300 cities in China. The data, regardless of API, come from some official government website, but it is not apparent where on those sites the data are hidden.

nickolasclarke commented 7 years ago

pm25.in is an option, as noted. Other options for getting Chinese data include http://caiyunapp.com/, which is a paid service. You can find documentation about their API here and here. Unfortunately, they don't let you get data directly from the monitors, only data for GPS coordinates. If you don't have a token and try to scrape the site, supposedly they only return a city average, not specific data.

We also leverage scrapers against lots of municipal EPA websites, and similar things can be done to central govt sites like http://www.cnemc.cn/ and provincial sites.

RocketD0g commented 7 years ago

@gaomrx - Thanks for all of this! And for mentioning the update on the public key for PM25.in (we haven't tried it yet either). In addition to basic access, there are also issues accessing data from pm25.in in the form that we would need it for our data format (e.g. coordinates at the station level) and in a way that lets us (and more importantly, our users) see directly how the data are scraped or otherwise accessed from the original site. Let me know if I'm wrong on any of that.

I had not heard of the Think Page API - thanks for sharing that. From their documentation, it looks like they provide physical AQ data at a station level for Chinese stations, which is awesome! One issue is that we can't see transparently how they grab the data from the originating government site, which is a bit of a problem for us. It's also not a free service for real-time air quality data. My guess is that it would be against their terms of service for us to pay for grabbing their data and then make it available for free. Additionally, it isn't really in the cards for us right now to pay for data as an organization. If I'm wrong that real-time AQ data must be paid for to access, let me know. I'm using Google Translate to assess the site.

This brings us back to: Do we attempt the arduous process of scraping the data from the sites ourselves? Can we work with a group like pm25.in to get the missing information we need? As I think @jflasher mentioned, we have had trouble initiating communication with pm25.in, but this is likely due to our inability to speak Mandarin. :) My personal sense is that in the end, we'll need to build the scrapers ourselves. Comments, anyone (including @jflasher)?

nickolasclarke commented 7 years ago

I also recently found this API: https://www.juhe.cn/docs/api/id/33. I've not dug into it much yet, but it seems to report concentration as well as AQI. It is paid, however.

nickolasclarke commented 7 years ago

http://apistore.baidu.com/astore/servicesearch?word=%E7%A9%BA%E6%B0%94&searchType=null Here is a list of possible air quality APIs from Baidu's API store. All paid, though some are very cheap.

jflasher commented 7 years ago

My guess is that anything paid would not be a viable option. I can't imagine a paid service would be open to us making all the data freely available.

xgao32 commented 7 years ago

@RocketD0g After reading the terms and conditions for Think Page, you are right that they prohibit freely sharing the data. The pm25.in API doesn't provide lat/long, but the stations are coded and I believe it is possible to look them up.
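A minimal sketch of that lookup idea, assuming each measurement carries a station code and that a separate code-to-coordinates table has been compiled by hand. The field name `station_code`, the function name, and every value below are hypothetical placeholders, not real station metadata or the actual API response format.

```python
# Hypothetical lookup table keyed by station code.
# The code and coordinates are placeholders, not real station metadata.
STATION_COORDINATES = {
    "STATION_A": {"latitude": 0.0, "longitude": 0.0},
}


def attach_coordinates(measurements, coordinates):
    """Split measurements into (located, unmatched).

    Located measurements get the latitude/longitude of their station merged
    in; measurements whose station code is not in the table are returned
    separately so they can be reviewed instead of silently dropped.
    """
    located, unmatched = [], []
    for measurement in measurements:
        coords = coordinates.get(measurement["station_code"])
        if coords is None:
            unmatched.append(measurement)
        else:
            located.append({**measurement, **coords})
    return located, unmatched


# Placeholder measurements, only to show the shape of the data.
sample = [
    {"station_code": "STATION_A", "parameter": "pm25", "value": 10.0},
    {"station_code": "STATION_B", "parameter": "pm25", "value": 20.0},
]
located, unmatched = attach_coordinates(sample, STATION_COORDINATES)
```

Keeping the unmatched list makes it visible which station codes still need coordinates before their data could be ingested.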

I am very confused as to the mission of OpenAQ now. If providing a free API to access air quality data is the goal, isn't that duplicating what aqicn.org is doing?

jflasher commented 7 years ago

Hi @gaomrx, I can't speak to what aqicn's goals are, but our goals are to provide open/transparent programmatic access to historic data, to build up a community that creates tools and awareness around that data, and to make change locally. This means every data point we capture is available as a physical value (not an AQI, which is less useful in our opinion). This also means we have this repo where everyone can see what code is used to ingest/serve the data and can contribute their own data. And it also means we have things like our Slack channel and workshops where discussions can happen about how best to use the data and enact change around the world.
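To illustrate why a physical value carries more information than an AQI, here is a minimal sketch of how an AQI sub-index is commonly derived from a concentration by piecewise-linear interpolation between breakpoints. The breakpoint table is an assumption chosen for illustration (the US EPA 24-hour PM2.5 table as it stood before the 2024 revision; China's own standard uses different breakpoints), and `pm25_to_aqi` is a hypothetical helper, not part of openaq-fetch.

```python
# Illustrative PM2.5 breakpoints: (conc_low, conc_high, aqi_low, aqi_high).
# Assumption: pre-2024 US EPA 24-hour PM2.5 table, concentrations in ug/m3.
PM25_BREAKPOINTS = [
    (0.0, 12.0, 0, 50),
    (12.1, 35.4, 51, 100),
    (35.5, 55.4, 101, 150),
    (55.5, 150.4, 151, 200),
    (150.5, 250.4, 201, 300),
    (250.5, 350.4, 301, 400),
    (350.5, 500.4, 401, 500),
]


def pm25_to_aqi(concentration):
    """Map a PM2.5 concentration to an AQI sub-index by linear interpolation."""
    # Round to the 0.1 resolution of the breakpoint table so that every
    # in-range value falls inside exactly one segment.
    c = round(concentration, 1)
    for c_lo, c_hi, i_lo, i_hi in PM25_BREAKPOINTS:
        if c_lo <= c <= c_hi:
            return round((i_hi - i_lo) / (c_hi - c_lo) * (c - c_lo) + i_lo)
    raise ValueError("concentration outside the breakpoint table")


# Two different concentrations collapse to the same index value, so the
# original measurement cannot be recovered from the AQI alone.
same_index = pm25_to_aqi(150.5) == pm25_to_aqi(151.0)
```

Because the mapping depends on which breakpoint table is used and rounds to whole numbers, an AQI cannot be converted back to the measured concentration with certainty, which is one reason to store the physical value.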

nickolasclarke commented 7 years ago

https://m.zq12369.com/cityaqi.php?city=%E5%8C%97%E4%BA%AC

This also seems to be an easy source that could be scraped, or we could even use their API. I was able to pull data easily by extracting the POST requests and using curl to pull down data in JSON.

I also spoke to @jflasher about reaching out to pm25.in for an API key in Chinese to see if we have any better luck.
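The same replay approach can be sketched in Python with only the standard library. The endpoint URL and the `city` form field below are hypothetical stand-ins: the real request URL, fields, and any required headers would have to be copied from the browser's network inspector, and nothing here is taken from the actual site.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint; replace with the URL seen in the network inspector.
ENDPOINT = "https://example.invalid/api/cityaqi"


def build_post_request(url, form_fields):
    """Build (but do not send) a form-encoded POST request."""
    body = urllib.parse.urlencode(form_fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def fetch_json(request, timeout=30):
    """Send the request and decode the JSON body of the response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; call fetch_json(request)
# only once the endpoint and fields match what the site really expects.
request = build_post_request(ENDPOINT, {"city": "北京"})
```

Separating request construction from sending makes it easy to compare the encoded body against the captured curl command before making any calls.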

ellieLitwack commented 6 years ago

Here's a link to my translation of the http://pm25.in API docs: https://gist.github.com/eliLitwack/5ea7f26d23a991fdffe5bf5c5cb318b7

nickolasclarke commented 6 years ago

Pasting this here for archiving purposes. This is a list of sources our team compiled a while back. Most, if not all, of this should already be sourced in #453, but it's here just in case direct drivers ever need to be written! There is some overlap with the links posted above.

Already supported

No data

Waiting to be sorted

http://222.177.117.35:8021/HistoryDay/SitesDataStationMap.aspx

http://121.28.49.85:8080/datas/hour/130000.xml?radn=random()

view-source:http://111.40.0.99:8081/

jflasher commented 6 years ago

Closing this in favor of #453. Feel free to reopen if folks feel it's necessary.