ropensci-archive / bomrang

:warning: ARCHIVED :warning: Australian government Bureau of Meteorology (BOM) data client for R

get_historical_weather does not respond #130

Closed adamhsparks closed 3 years ago

adamhsparks commented 3 years ago

For several weeks, possibly up to a month or two, the CI tests have been failing on every OS tested.

So far I've updated the internal databases to reflect the latest metadata on stations and locations BOM has.

Here is an example URL, generated by the second example, that is not responding.

get_historical_weather(latlon = c(-35.2809, 149.1300), type = "min") ## 3,500+ daily records

http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av?p_display_type=dailyZippedDataFile&p_stn_num=070351&p_c=-989854804&p_nccObsCode=123

When attempting to fetch the zip file, the URL does not respond in R or in the browser, although I am able to browse to the data and display it as a table in my browser window. Going through the BOM website's own interface, http://www.bom.gov.au/climate/data/stations/, requesting the same station and trying to download any data also fails. Presumably there's an issue with the BOM server.

adamhsparks commented 3 years ago

To further add to the confusion, the first example does appear to fetch the data.

get_historical_weather(stationid = "023000", type = "max")

adamhsparks commented 3 years ago

Checking the first URL for a response indicates that it's OK. So we can't just check first and then stop.

> curl -Is http://www.bom.gov.au/jsp/ncc/cdio/weatherData/av\?p_display_type\=dailyZippedDataFile\&p_stn_num\=070351\&p_c\=-989854804\&p_nccObsCode\=123 | head -1
> HTTP/1.1 200 OK

adamhsparks commented 3 years ago

Now the first example is failing too. BOM servers seem to have issues right now.

paulr-bv commented 3 years ago

Further access issues today. Call is:

get_historical(stationid = "023000", type = "min")

Returns:

Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'

In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

Accessing via curl ok:

> curl -Is http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt
> HTTP/1.1 200 OK

Accessing from web browser loads the file.

jonocarroll commented 3 years ago

Sorry for the silence, I have been lurking.

I get no issues from

readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")

and expected data returned from get_historical()

> bomrang::get_historical(stationid = "023000", type = "min")
Data saved as /var/folders/7s/d2nrjvg10x76v0hyg1jx31_40000gn/T//Rtmpdp6EUx/IDCJAC0011_023000_1800_Data.csv
  --- Australian Bureau of Meteorology (BOM) Data Resource ---
  (Original Request Parameters)
  Station:      ADELAIDE (WEST TERRACE / NGAYIRDAPIRA) [023000]
  Location:     lat: -34.9257, lon: 138.5832
  Measurement / Origin: Min / Historical
  Timespan:     1887-01-01 -- 2021-03-01 [96 years]
  ---------------------------------------------------------------
       product_code station_number year month day min_temperature
    1:   IDCJAC0011          23000 1887     1   1              NA
    2:   IDCJAC0011          23000 1887     1   2              NA
    3:   IDCJAC0011          23000 1887     1   3              NA
    4:   IDCJAC0011          23000 1887     1   4              NA
    5:   IDCJAC0011          23000 1887     1   5              NA
   ---
49015:   IDCJAC0011          23000 2021     3  13            17.9
49016:   IDCJAC0011          23000 2021     3  14             9.7
49017:   IDCJAC0011          23000 2021     3  15             9.4
49018:   IDCJAC0011          23000 2021     3  16            14.4
49019:   IDCJAC0011          23000 2021     3  17            13.7
       accum_days_min quality
    1:             NA
    2:             NA
    3:             NA
    4:             NA
    5:             NA
   ---
49015:              1       N
49016:              1       N
49017:              1       Y
49018:              1       N
49019:              1       N

so maybe this is an intermittent issue (fun).

The only thing I can think of is that while alphaAUS_136.txt returns 200 (OK), and alphaAUS_000.txt returns 404 (Not Found), the parent directory returns 403 (Forbidden)... could it perhaps be that the permissions are variable or dependent on the user agent or protocol or something?

paulr-bv commented 3 years ago

Hmm, it might be a client system thing.
I'm on a Mac and still running R 3.6.1 (sessionInfo at the bottom) as I haven't had the time to upgrade amidst some project work. Don't know if that is part of it (but I thought that my original script was working a few days ago).

I tried some readLines() calls with mixed results. I had Restarted R so no explicit libraries loaded.

The call to the BOM server failed again for me:

> readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")
Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'
In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

I then called w3.org (just another text file out there) and it worked:

> readLines("https://www.w3.org/TR/PNG/iso_8859-1.txt")
  [1] "The following are the graphical (non-control) characters defined by"            "ISO 8859-1 (1987).  Descriptions in words aren't all that helpful,"            
  [3] "but they're the best we can do in text.  A graphics file illustrating"          "the character set should be available from the same archive as this"           
  [5] "file."                                                                          ""                                                                              
  [7] "Hex Description                 Hex Description"                                ""                                                                              
  [9] "20  SPACE"                                                                      "21  EXCLAMATION MARK            A1  INVERTED EXCLAMATION MARK"                 
 [11] "22  QUOTATION MARK              A2  CENT SIGN"                                  "23  NUMBER SIGN                 A3  POUND SIGN" 

My sessionInfo():

> sessionInfo()
R version 3.6.1 (2019-07-05)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_3.6.1 tools_3.6.1   

Note that I had upgraded bomrang to the latest version:

> packageVersion("BOMRang")
[1] ‘0.7.3’

I found a workaround for my script, but like most of these things, it would be nice to work out what is going on at some stage!

mattecologist commented 3 years ago

I'm also having the same issues, R 4.0.2 in RStudio 1.4.1103:

> readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")
Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'
In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

But it's working fine directly through the R console...

Both are without the bomrang package loaded, just checking access to BoM.

RStudio session info:

> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.7

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_AU.UTF-8/en_AU.UTF-8/en_AU.UTF-8/C/en_AU.UTF-8/en_AU.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.0.2    assertthat_0.2.1  cli_2.3.1         tools_4.0.2       withr_2.4.1       glue_1.4.2        sessioninfo_1.1.1

jonocarroll commented 3 years ago

I should have mentioned that I tested in a terminal. I can reproduce the error from RStudio 1.4.1623 on Mac. I don't see any prominent discussions about it, so maybe reach out to Twitter or RStudio directly?

jonocarroll commented 3 years ago

For reference, this can be traced slightly upstream to file(con, "r") failing.

jonocarroll commented 3 years ago

Okay, it looks like the RStudio HTTPUserAgent is being rejected... based on https://r.789695.n4.nabble.com/File-Downloading-Problem-td3022137.html I tried

op <- options()
getOption("HTTPUserAgent")
#> [1] "RStudio Desktop (1.4.1623); R (3.6.2 x86_64-apple-darwin15.6.0 x86_64 darwin15.6.0)"
options(HTTPUserAgent = "")
readLines("http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt")
options(op)

with success.

My suggestion would be to add the following to bomrang fetching functions:

op <- options()
on.exit(options(op))
options(HTTPUserAgent = "{bomrang} R package (0.7.4) https://github.com/ropensci/bomrang")
readLines(<file>)

Maybe someone from BOM will file an issue... maybe they're deliberately blocking RStudio as a way to limit what bomrang does.
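The suggestion above could be wrapped in a small helper along these lines. This is only a sketch: `read_lines_with_agent()` and its arguments are hypothetical names, not part of bomrang, and the demonstration reads a temporary local file instead of the BOM URL so that it runs offline.

```r
# Hypothetical helper, not part of bomrang: read a text resource while a
# custom HTTPUserAgent is in effect, then restore the caller's setting.
read_lines_with_agent <- function(path, user_agent) {
  # options() returns the previous values of the options it changes
  old <- options(HTTPUserAgent = user_agent)
  # Restore on exit, even if readLines() signals an error
  on.exit(options(old), add = TRUE)
  readLines(path)
}

# Offline demonstration: a temporary file stands in for the remote .txt file
tmp <- tempfile(fileext = ".txt")
writeLines(c("first line", "second line"), tmp)

agent_before <- getOption("HTTPUserAgent")
lines_read <- read_lines_with_agent(tmp, "example-agent/0.1")
agent_restored <- identical(getOption("HTTPUserAgent"), agent_before)
```

Saving only the value returned by `options(HTTPUserAgent = ...)` restores just that one option, rather than resetting every option captured by a bare `options()` call.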

mattecologist commented 3 years ago

@jonocarroll can confirm this is working for me too now. Thanks!

paulr-bv commented 3 years ago

Agree with @mattecologist, @jonocarroll, blanking the UserAgent works for me. Restoring the options then has it fail again.

In the interim, I just updated my script to put a wrapper around calls to get_historical() and the script worked.

The wrapper is:

cfg_bom_http_fix <- TRUE 

if (cfg_bom_http_fix) {
  # Back up the options, then blank the user agent for the BOM call
  op_bak <- options()
  options(HTTPUserAgent = "")
}

data_min_raw <- get_historical(station_number, type = "min")

if (cfg_bom_http_fix) {
  # Restore
  options(op_bak)
}
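
One caveat with the wrapper above is that the options are not restored if get_historical() stops with an error. A possible variant, assuming a hypothetical helper named `with_blank_agent()` that takes the fetch as a function, restores the user agent on any exit path:

```r
# Hypothetical wrapper, not part of bomrang: run `fetch()` with a blank
# HTTPUserAgent and restore the previous value afterwards.
with_blank_agent <- function(fetch) {
  old <- options(HTTPUserAgent = "")
  # on.exit() runs whether fetch() returns normally or signals an error
  on.exit(options(old), add = TRUE)
  fetch()
}

# Intended use in a script (requires bomrang and network access):
# data_min_raw <- with_blank_agent(function() get_historical(station_number, type = "min"))

# Offline demonstration of the restore behaviour
agent_before <- getOption("HTTPUserAgent")
seen_agent <- with_blank_agent(function() getOption("HTTPUserAgent"))
failed <- try(with_blank_agent(function() stop("simulated 403")), silent = TRUE)
agent_after <- getOption("HTTPUserAgent")
```

After both calls, `agent_after` matches `agent_before`, including the call that failed.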

Thanks again!

adamhsparks commented 3 years ago

Thanks for teasing this out, @jonocarroll. We just moved into new offices and our network firewalls are causing some issues here so I wasn't going to be a good test case for this.

That said, it does work.

BOM's file serving seems to have been unstable lately. As noted above, last week I was having issues in both RStudio and the base console, and the URL didn't respond in R or the browser. Last week we were in the old offices, so I didn't have the same firewall issues I've had here since Monday.

jonocarroll commented 3 years ago

I'll try to make a PR to add this new user agent. It may end up being blocked specifically, in which case I'd say opening a line of communication to BOM may be prudent.

adamhsparks commented 3 years ago

I’ve got it covered. I’m working on the package right now anyway. Thanks for the offer, @jonocarroll

jonocarroll commented 3 years ago

Oh, snap, neat timing. Check out my PR, it looks like this specific issue was the tip of the iceberg.

jonocarroll commented 3 years ago

I can confirm that on develop in a terminal, the failure point is the "embedded nul" error, which comes from fread() failing to read the .zip. This is also resolved in my PR.
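
The PR's actual fix is not shown in this thread. As an illustration only, one way to avoid handing a zip archive to a text parser is to check the file's first four bytes against the zip local file header signature (`PK\003\004`) before reading. The helper name `is_zip_file()` is hypothetical.

```r
# Hypothetical guard, not taken from the PR: detect a zip archive by its
# 4-byte local file header signature before parsing the file as text.
is_zip_file <- function(path) {
  magic <- readBin(path, what = "raw", n = 4L)
  identical(magic, as.raw(c(0x50, 0x4b, 0x03, 0x04)))
}

# Offline demonstration with two temporary files
zip_like <- tempfile()
writeBin(as.raw(c(0x50, 0x4b, 0x03, 0x04, 0x00)), zip_like)
csv_like <- tempfile()
writeLines("year,month,day", csv_like)

zip_detected <- is_zip_file(zip_like)  # TRUE
csv_detected <- is_zip_file(csv_like)  # FALSE
```

A file flagged this way would need to be extracted, for example with utils::unzip(), before the CSV inside it is read.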