ropensci-archive / bomrang

:warning: ARCHIVED :warning: Australian government Bureau of Meteorology (BOM) data client for R

Monthly products and/or ACORN-SAT for get_historical()? #99

Closed jimjam-slam closed 1 year ago

jimjam-slam commented 5 years ago

I'm writing an app at the moment that involves pulling a combination of:

  1. raw daily temp and precip station data (products IDCJAC00[10, 11]),
  2. raw daily precip data (IDCJAC0009),
  3. raw monthly temp data (IDCJAC000[2, 4, 5–8]), and
  4. ACORN-SAT temperature data.

get_historical() covers cases (1) and (2) but not (3) or (4). My understanding is that implementing case (3) would be similar to (1) and (2) but would use a different ancillary file to determine the c parameter in the URL (not just substituting the NCC code for each product).

Case (4), on the other hand, would be fairly cruisy: the URL is just http://www.bom.gov.au/climate/change/hqsites/data/temp/[tmax/tmin].[station_id].daily.csv. But it might be more appropriate to offer it as an entirely separate function to get_historical().
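As a rough sketch of the URL pattern above (in Python for illustration; bomrang itself is R): the helper name `acorn_sat_url` and its arguments are hypothetical, and the code only builds the string without checking that BOM still serves that path.

```python
ACORN_SAT_BASE = "http://www.bom.gov.au/climate/change/hqsites/data/temp"


def acorn_sat_url(station_id: str, variable: str) -> str:
    """Build the ACORN-SAT daily CSV URL for one station and variable."""
    if variable not in ("tmax", "tmin"):
        raise ValueError("variable must be 'tmax' or 'tmin'")
    # Station IDs in this thread are six-digit strings with leading zeros
    # (e.g. "086338"), so treat them as text and pad if needed.
    station = str(station_id).zfill(6)
    return f"{ACORN_SAT_BASE}/{variable}.{station}.daily.csv"


print(acorn_sat_url("040764", "tmax"))
```

Keeping this separate from the `p_c` machinery reflects the point that ACORN-SAT needs no ancillary lookup.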

I'm not sure whether you're interested in implementing either of these cases in bomrang, but since I need to work on this anyway I thought I'd mention it! Happy to plug away at it myself and either do a PR or just document it here for future consideration 😄 If you can offer any insight into how the ancillary file helps fill the additional c parameter in, I'd also appreciate it a lot!

deanmarchiori commented 5 years ago

Hey James,

Good to hear from you. Nice suggestions.

I can see case (3) being a nice addition to get_historical() say with an extra parameter period = c("daily", "monthly") or similar.

Agree case (4) sounds like its own thing.

I'm not 100% sure, but I think the determination for that URL parameter is covered in the source at: https://github.com/ropensci/bomrang/blob/5c32b89e9939a550c25aca4dda9b89a41e1c3f12/R/get_historical.R#L293

#' BOM data is available via URL endpoints but the arguments are not (well)
#' documented. This function first obtains an auxiliary data file for the given
#' station/measurement type which contains the remaining value p_c. It then
#' constructs the appropriate resource URL.

adamhsparks commented 5 years ago

I'm with @deanmarchiori on both items, @jonocarroll, what say ye about the modification to get_historical()?

It all sounds like a nice addition. We're happy to have PRs here. That's basically how this package came into being.

jonocarroll commented 5 years ago

Sounds good to me. I don't quite follow why the monthly data needs to be scraped vs summarised from the daily data, but if there's a URL for it then I'm happy to have it processed consistently. Is the monthly data available for all the codes (rain, solar, etc...)?

The p_c sleuthing is undoing whatever intentional/accidental obfuscation the BOM hides their URLs behind, but it requires the secondary file.

jimjam-slam commented 5 years ago

Thanks, everyone!

Yeah, I'm pretty comfortable manually aggregating daily data to monthly myself—in fact, where we use ACORN-SAT here we have to do that anyway—but since the products are available and our team prefers to use existing BOM products where possible, that's what I'm doing 😅

One obstacle could be that .get_nnc() currently retrieves the list of stations available for each known NCC code at: http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_[NCC].txt.

https://github.com/ropensci/bomrang/blob/5c32b89e9939a550c25aca4dda9b89a41e1c3f12/R/get_historical.R#L209

Those lists are accessible at http://www.bom.gov.au/climate/cdo/about/sitedata.shtml, but the monthly options don't appear to be there on that page or at the expected URLs.

Further, the ancillary file containing the value of p_c for the existing products (retrieved in .get_zip_url()) doesn't appear to work for the monthly ones. For example, for Olympic Park (086338), here's daily tmin (NCC: 123):

http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_stn_num=086338&p_display_type=availableYears&p_nccObsCode=123

086338||,2013:-1490879938,2014:-1490879938,2015:-1490879938,2016:-1490879938,2017:-1490879938,2018:-1490879938,2019:-1490879938

But here's highest monthly tmin (NCC 42):

http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_stn_num=086338&p_display_type=availableYears&p_nccObsCode=42

086338||
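To make the difference between the two responses concrete, here is a minimal Python sketch that builds the ancillary URL and parses the response body. The function names are hypothetical, the response format is inferred only from the two samples above, and nothing here contacts BOM.

```python
from urllib.parse import urlencode

AV_ENDPOINT = "http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av"


def ancillary_url(station_id: str, ncc_code: int) -> str:
    """URL of the 'availableYears' ancillary file for a station and NCC code."""
    query = urlencode({
        "p_stn_num": station_id,
        "p_display_type": "availableYears",
        "p_nccObsCode": ncc_code,
    })
    return f"{AV_ENDPOINT}?{query}"


def parse_available_years(body: str) -> dict[int, int]:
    """Map year -> p_c from a body like '086338||,2013:-149...,2014:...'.

    Assumed format: station prefix, '||', then comma-separated 'year:p_c'
    pairs. Returns an empty dict when no pairs follow the prefix.
    """
    _, _, tail = body.strip().partition("||")
    years = {}
    for entry in tail.split(","):
        if not entry:
            continue  # skip the empty field directly after '||'
        year, _, p_c = entry.partition(":")
        years[int(year)] = int(p_c)
    return years


daily = parse_available_years("086338||,2013:-1490879938,2014:-1490879938")
monthly = parse_available_years("086338||")
print(daily)    # {2013: -1490879938, 2014: -1490879938}
print(monthly)  # {}
```

An empty result is how the monthly case shows up here: the endpoint answers, but there is no p_c to read.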

The URLs for the monthly data clearly operate using the same parameter, though, so it might just be a matter of figuring out how the ancillary file URLs need to differ:

http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?
  p_display_type=monthlyZippedDataFile&
  p_stn_num=086338&
  p_c=-1490870144&
  p_nccObsCode=40&
  p_startYear=
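Assembling that query string can be sketched in Python as below. The helper `zip_url` and its `period` argument are hypothetical, and the `dailyZippedDataFile` value is taken from the daily download links quoted elsewhere in this thread. The value of p_c still has to come from somewhere else; this only shows how the pieces fit together.

```python
from urllib.parse import urlencode

AV_ENDPOINT = "http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av"

# p_display_type values as they appear in download links in this thread.
ZIP_DISPLAY_TYPES = {
    "daily": "dailyZippedDataFile",
    "monthly": "monthlyZippedDataFile",
}


def zip_url(station_id, ncc_code, p_c, period="monthly", start_year=""):
    """Build a zipped-data download URL from its query parameters."""
    if period not in ZIP_DISPLAY_TYPES:
        raise ValueError("period must be 'daily' or 'monthly'")
    query = urlencode({
        "p_display_type": ZIP_DISPLAY_TYPES[period],
        "p_stn_num": station_id,   # keep as a string to preserve leading zeros
        "p_c": p_c,
        "p_nccObsCode": ncc_code,
        "p_startYear": start_year, # left empty in the monthly links
    })
    return f"{AV_ENDPOINT}?{query}"


print(zip_url("086338", 40, -1490870144))
```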

jimjam-slam commented 5 years ago

I should add that I haven't tried to use monthly rainfall or solar, but it looks like the NCC codes are:

(Can't find any mention of monthly extremes for these two, though.)

jimjam-slam commented 5 years ago

Unfortunately, the JavaScript used on the monthly HTML pages also appears to scrape p_c from the HTML. And it isn't the p_c for the monthly product, it's the p_c for the corresponding daily one.

The only thing I can think of from here is if av? can take a value for p_display_type other than availableYears that can provide p_c 😕

EDIT: polling a few stations (040764, 023090, 086338), it seems like different products have values of p_c that vary by a fixed, or nearly fixed, amount:

This is probably a rabbit hole for me to go down, but it seems doubtful that p_c is a hash of some kind.

(Please let me know if my thinking out loud isn't welcome in this thread!)

jonocarroll commented 5 years ago

I'll also have a play if I get the chance. Could you please link a few daily and monthly data pages for me to test?

softloud commented 5 years ago

Product here is not the same as product here. That has been my contribution to this so far :joy_cat: :croissant:

ghost commented 5 years ago

Hi - sorry lurking on this for a while. 

Last year I did some scraping on the ACORN-SAT data while I was learning Shiny and arguing with a climate change denier. I've got a notebook here for it 

http://rpubs.com/benmoretti/434904

and a Shiny dashboard for the aggregated data

https://benmoretti.shinyapps.io/ACORN_SAT_stations_data/

There might be some useful data parsing code in there 

Cheers

Ben

jimjam-slam commented 5 years ago

Thanks @jonocarroll! Here're some pages:

| Station | Product | Product NCC code | Page link | Download link |
|---|---|---|---|---|
| 040764 | Daily maximum temperature | 122 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=122&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=040764 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=040764&p_c=-332371462&p_nccObsCode=122&p_startYear=2019 |
| 040764 | Daily minimum temperature | 123 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=123&p_display_type=dailyDataFile&p_startYear=&p_c=&p_stn_num=040764 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=040764&p_c=-332371658&p_nccObsCode=123&p_startYear=2019 |
| 040764 | Monthly mean maximum temperature | 36 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=36&p_display_type=dataFile&p_startYear=&p_c=&p_stn_num=040764 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=monthlyZippedDataFile&p_stn_num=040764&p_c=-332360591&p_nccObsCode=36&p_startYear= |
| 040764 | Monthly lowest temperature (lowest tmin) | 43 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_nccObsCode=43&p_display_type=dataFile&p_startYear=&p_c=&p_stn_num=040764 | http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=monthlyZippedDataFile&p_stn_num=040764&p_c=-332361034&p_nccObsCode=43&p_startYear= |
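
For anyone testing the idea that p_c differs between products by a roughly fixed amount, the values in the table above can be compared directly. This is only a sketch in Python: the helper name `offsets_from` is hypothetical, the numbers are copied from the download links for station 040764, and the differences say nothing on their own about how BOM derives p_c.

```python
# (station, NCC code) -> p_c, copied from the download links in the table.
P_C = {
    ("040764", 122): -332371462,  # daily maximum temperature
    ("040764", 123): -332371658,  # daily minimum temperature
    ("040764", 36): -332360591,   # monthly mean maximum temperature
    ("040764", 43): -332361034,   # monthly lowest temperature
}


def offsets_from(reference_code, station, table=P_C):
    """p_c of each product minus p_c of the reference product, one station."""
    base = table[(station, reference_code)]
    return {
        code: p_c - base
        for (stn, code), p_c in table.items()
        if stn == station
    }


print(offsets_from(122, "040764"))
# {122: 0, 123: -196, 36: 10871, 43: 10428}
```

Running the same comparison for other stations would show whether the offsets hold across them.
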

jimjam-slam commented 5 years ago

Thanks very much, @benmoretti! As a repeat ACORN-SAT user, I'm very grateful that those URLs are a lot less ambiguous 😁

maelle commented 1 year ago

From the README

This package has been archived due to BOM's ongoing unwillingness to allow programmatic access to their data and actively blocking any attempts made using this package or other similar efforts.