openaq / openaq-fetch

A tool to collect data for OpenAQ platform.
MIT License

Obtaining historical data from CAAQM #593

Open QEDK opened 5 years ago

QEDK commented 5 years ago

Hey, I thought I'd drop a note: https://app.cpcbccr.com/caaqms/download?filename=site_(uniqueID/pattern).xlsx is useful for obtaining absolute station metrics (you can fetch all the old data in one CSV and then use the data.gov.in real-time API to add to it). The uniqueID can be obtained from your browser's dev console by making an AJAX call with a Base64-encoded JSON request payload. I don't know how ethical it is, although it is public data anyway, but a one-time CSV is certainly less resource-intensive. It is also useful for filling in the gaps in data per #585.

RocketD0g commented 5 years ago

Thanks a bunch, @QEDK! Tagging @jflasher for his awareness too.

urbanemissions commented 5 years ago

@QEDK good morning.

Can you post an example, please?

urbanemissions commented 5 years ago

Just FYI: the data.gov.in real-time API only provides the air quality index.

What OpenAQ and people like us are interested in is absolute air quality values, which are posted on the CAAQM website.

If you can post an example for https://app.cpcbccr.com/caaqms/download?filename=site_(uniqueID/pattern).xlsx that will be useful.

Main question: what is "pattern"?

urbanemissions commented 5 years ago

Tried a few combinations..

https://app.cpcbccr.com/caaqms/download?filename=site_(1425/pattern).xlsx https://app.cpcbccr.com/caaqms/download?filename=site_1425.xlsx

QEDK commented 5 years ago

@urbanemissions Okay, I'll go step by step. The best place to get tabular data is https://app.cpcbccr.com/ccr/#/caaqm-dashboard-all/caaqm-landing/data. Select your parameters and submit. The first thing you'll notice is that you only get 24 hours' worth of data, which is not useful since we can already get that by continuously polling the API and elsewhere. This is a limitation built into the form: the client's POST request only asks for the last 24 hours. This is where you need a modified AJAX call.

When you download a file from the tabular page, the client makes a POST to fetch a payload containing the file URL. So you simply have to make the same AJAX call with your own parameters: edit and resend the POST with the parameters of your liking and it should go through (large CSVs are slow and might cause 405 errors). The request payload is Base64 encoded, so you will need some additional work to decode and re-encode it. Here are some images that show what to do: https://ibb.co/7zSnrNP https://ibb.co/6Xyhxdh

The unique ID is always in the format site_10620190213203417, where the number always has the same length. There's probably a pattern (106 is probably the station ID, 2019 may be the last year fetched), but I don't know exactly how it works. You can access the same file at https://app.cpcbccr.com/caaqms/download?filename=site_10620190213203417.xlsx to see that it does work.
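For anyone following along, here is a minimal sketch of the decode, edit and re-encode step for a Base64-encoded JSON payload. The key names (`station`, `from_date`, `to_date`) and their values are hypothetical placeholders, not the real CPCB field names; those have to be read from the request shown in your browser's dev tools. The sketch makes no network request.

```python
import base64
import json


def decode_payload(encoded: str) -> dict:
    """Turn a Base64 string back into the JSON object it carries."""
    return json.loads(base64.b64decode(encoded).decode("utf-8"))


def encode_payload(payload: dict) -> str:
    """Serialize a dict to JSON and Base64-encode it for resending."""
    return base64.b64encode(json.dumps(payload).encode("utf-8")).decode("ascii")


# Hypothetical stand-in for a payload copied from the dev console.
# The real keys and date format are assumptions and will differ.
captured = encode_payload(
    {"station": "site_106", "from_date": "2019-02-12", "to_date": "2019-02-13"}
)

params = decode_payload(captured)
params["from_date"] = "2018-01-01"  # widen the window beyond the last 24 hours
resend_body = encode_payload(params)  # this string would replace the POST body
```

Working on the decoded dict rather than editing the Base64 text by hand keeps the JSON valid when it is re-encoded.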

urbanemissions commented 5 years ago

This looks like a one-time download link to a file generated at the time of the request. The name follows site_IDYYYYMMDDHHMMSS.xlsx -- they are likely keeping the file for some time, which is why you can still access it. Your request was made on 2019-02-13 at 20:34:17.
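A small sketch of that reading of the filename, assuming the last 14 digits are always the timestamp and whatever digits come before them are the station ID. The function name `parse_export_name` is made up for illustration, and the split is a guess based on the single example in this thread.

```python
import re
from datetime import datetime

# Assumed layout: site_<station id><YYYYMMDDHHMMSS>.xlsx
FILENAME_RE = re.compile(r"^site_(\d+)(\d{14})\.xlsx$")


def parse_export_name(filename: str) -> tuple[str, datetime]:
    """Split an export filename into (station_id, request_time)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unexpected filename: {filename!r}")
    station_id, stamp = match.groups()
    # strptime also rejects impossible dates such as month 13.
    return station_id, datetime.strptime(stamp, "%Y%m%d%H%M%S")


station_id, requested_at = parse_export_name("site_10620190213203417.xlsx")
print(station_id, requested_at.isoformat())
```

For the example file this gives station `106` and a request time of 2019-02-13T20:34:17, which matches the date and time read off the name above.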

If you change the ID number, you get an empty file.

QEDK commented 5 years ago

@urbanemissions That's probably it. Generating the download link is pretty plausible tho.

majesticio commented 1 year ago

Updated the download URL. The site now uses a CAPTCHA, so adding historical data would have to be done somewhat manually. If the API from #283 can be used for historical data, that would be preferred.

urbanemissions commented 1 year ago

API from https://github.com/openaq/openaq-fetch/issues/283 is for data.gov.in -- which is AQI only. This is used for a couple of air quality apps in India.

majesticio commented 1 year ago

@urbanemissions do you know of an API for raw air quality data, rather than for AQI?

urbanemissions commented 1 year ago

If CPCB database access stopped because of a technical snag, it may be worthwhile talking to one of these groups:

https://ncaptracker.in/ https://www.airveda.com/ https://blueskyhq.io/products/bam-aq

There are groups outside India also doing the same (like iqair, etc.), and none of these share openly what they are scraping. I understand that blueskyhq has a commercial API.

-- Dr. Sarath Guttikunda

http://www.urbanemissions.info
