ropensci-archive / bomrang

:warning: ARCHIVED :warning: Australian government Bureau of Meteorology (BOM) data client for R

Station list request throws `403: Forbidden` #137

Closed jimjam-slam closed 1 year ago

jimjam-slam commented 3 years ago
> get_historical_weather("040842", type = "max")
Error in file(con, "r") : 
  cannot open the connection to 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt'
In addition: Warning message:
In file(con, "r") :
  cannot open URL 'http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt': HTTP status was '403 Forbidden'

I'm manually visiting the station list URL in the browser and it's fine. Wondering if there've been any backend changes (eg. the curl discussion I recently saw on runapp) that're causing problems for anyone else? Can anyone else reproduce this?

Session Info

```r
> sessionInfo()
R version 4.0.2 (2020-06-22)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 20.04 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/liblapack.so.3

locale:
 [1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
 [4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
 [7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
 [1] here_0.1 bomrang_0.7.4 magrittr_1.5 lubridate_1.7.9
 [5] forcats_0.5.0 stringr_1.4.0 dplyr_1.0.2 purrr_0.3.4
 [9] readr_1.3.1 tidyr_1.1.2 tibble_3.0.3 ggplot2_3.3.2
[13] tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] tidyselect_1.1.0 terra_1.1-4 haven_2.3.1 lattice_0.20-41
 [5] colorspace_1.4-1 vctrs_0.3.4 generics_0.0.2 blob_1.2.1
 [9] rlang_0.4.8 pillar_1.4.6 glue_1.4.2 withr_2.3.0
[13] DBI_1.1.0 rappdirs_0.3.1 sp_1.4-5 dbplyr_1.4.4
[17] modelr_0.1.8 readxl_1.3.1 lifecycle_0.2.0 munsell_0.5.0
[21] gtable_0.3.0 cellranger_1.1.0 rvest_0.3.6 raster_3.4-5
[25] codetools_0.2-16 ps_1.4.0 hoardr_0.5.2 fansi_0.4.1
[29] broom_0.7.0 Rcpp_1.0.5 scales_1.1.1 backports_1.1.10
[33] jsonlite_1.7.1 fs_1.5.0 hms_0.5.3 digest_0.6.25
[37] stringi_1.5.3 rprojroot_1.3-2 grid_4.0.2 cli_2.1.0
[41] tools_4.0.2 crayon_1.3.4 pkgconfig_2.0.3 ellipsis_0.3.1
[45] data.table_1.14.0 xml2_1.3.2 reprex_0.3.0 assertthat_0.2.1
[49] httr_1.4.2 rstudioapi_0.11 R6_2.4.1 compiler_4.0.2
```

jimjam-slam commented 3 years ago

Never mind! Just saw the duplicates (primarily #133)

jimjam-slam commented 3 years ago

Re-opening because this is still occurring for me with the latest release (see https://github.com/ropensci/bomrang/issues/136#issuecomment-817448729).

adamhsparks commented 3 years ago

Well, it was passing all CI tests yesterday. Wonder what's changed (again).

jimjam-slam commented 3 years ago

It might be something particular to my setup! I'll try to investigate this arvo.

adamhsparks commented 3 years ago

Not just you, I've confirmed it locally as well.

jimjam-slam commented 3 years ago

Our staff are also reporting problems from inside our codebase, which uses Docker images that were last built a month ago. That does suggest a change on BOM's end.

adamhsparks commented 3 years ago

Just to clarify, are you using bomrang in your codebase?

jimjam-slam commented 3 years ago

Yup (although we're looking to switch obs providers at some point later in the year, which'll likely mean that bomrang comes out).

On Mon, 12 Apr 2021, 19:07 Adam H. Sparks, @.***> wrote:

Just to clarify, are you using bomrang in your codebase?


adamhsparks commented 3 years ago

I just tested.

using the current version of bomrang's user agent string:

  options(HTTPUserAgent = paste0("{bomrang} R package (",
                                 utils::packageVersion("bomrang"),
                                 ") https://github.com/ropensci/bomrang"))

returns a "403 Forbidden" error.

using:

  options(HTTPUserAgent = "")

Returns the requested data.
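For anyone who wants to repeat the comparison above, here is a minimal sketch in base R. `build_user_agent()` and `try_station_list()` are hypothetical helper names, not part of bomrang, and the sketch assumes that `readLines()` on a URL honours the `HTTPUserAgent` option, as the test above suggests. It only reports whether a read succeeded; it doesn't change what BOM allows.

```r
# Hypothetical helper: build a user agent string in the same shape as the
# one shown above, for a given package version.
build_user_agent <- function(version) {
  paste0("{bomrang} R package (", version, ") https://github.com/ropensci/bomrang")
}

# Hypothetical helper: try to read the first line of `target` under a given
# user agent. Returns "ok" on success, otherwise the error message.
# The previous HTTPUserAgent option is restored on exit.
try_station_list <- function(target, user_agent) {
  old <- options(HTTPUserAgent = user_agent)
  on.exit(options(old), add = TRUE)
  tryCatch(
    suppressWarnings({
      readLines(target, n = 1L)
      "ok"
    }),
    error = function(e) conditionMessage(e)
  )
}

# Example (needs network access, so it is left commented out):
# target <- "http://www.bom.gov.au/climate/data/lists_by_element/alphaAUS_136.txt"
# try_station_list(target, build_user_agent(utils::packageVersion("bomrang")))
# try_station_list(target, "")
```
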

I'm not one for conspiracies, but since we explicitly said that the request was coming from bomrang in the user agent string and now it's blocked after we just implemented it because RStudio was blocked... 😕

Does anyone know anyone at BOM?

jonocarroll commented 3 years ago

It's another change, but if that's the case we can just spoof a regular browser HTTPUserAgent 🕵️‍♂️

adamhsparks commented 3 years ago

@rensa and I are discussing some options. That's certainly one. We also discussed caching the station lists as the other functions do as well.
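A rough sketch of what the caching option could look like, using only base R. `cached_fetch()` is a hypothetical helper and not how bomrang's other functions actually cache; the cache location and the seven-day maximum age are assumptions picked for illustration.

```r
# Hypothetical helper: return a cached object if it is fresh enough,
# otherwise call `fetch()` (any zero-argument function that returns the
# data) and store the result as an .rds file for next time.
cached_fetch <- function(name, fetch, cache_dir = tempdir(), max_age_days = 7) {
  path <- file.path(cache_dir, paste0(name, ".rds"))
  if (file.exists(path)) {
    age_days <- as.numeric(difftime(Sys.time(), file.mtime(path), units = "days"))
    if (age_days <= max_age_days) {
      return(readRDS(path))
    }
  }
  value <- fetch()
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  saveRDS(value, path)
  value
}
```

With something like this, a station list would only be requested again once the cached copy is older than `max_age_days`, which cuts down the number of requests but doesn't help when the first request is refused.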

jimjam-slam commented 3 years ago

I'll have an ask around and see if the station lists are on FTP too!

adamhsparks commented 3 years ago

The historical resource URLs are all HTTP requests as well, now that I look further.

adamhsparks commented 3 years ago

Since it's just HTTP requests, so far, that are being blocked in the package by BOM, I wonder: is there some statement we've missed saying that this isn't allowed under BOM guidelines?

adamhsparks commented 3 years ago

OK, there is this:

You are entitled to use material on Bureau websites in accordance with the applicable terms above, noting that material such as Water Data is generally available under generous open access terms including a right to distribute and modify material. The use of any material on the Bureau websites obtained through use of automated or manual techniques including any form of scraping or hacking is prohibited. 'scraping' includes page, content, screen or web scraping amongst others, and is the process of extracting information from websites usually by converting unstructured website content (usually HTML) into structured data.

http://www.bom.gov.au/other/copyright.shtml

The 9 & 3 bulletins would fall into this category for sure. This one seems (to me) to be murkier, but perhaps this is what BOM is classifying it as and we should respect the TOS.

adurrer151 commented 3 years ago

It looks like the BOM has recently "made changes to the web site". When using the Python requests library to access the HTML observations, BOM throws the following error:

Potential automated request detected! We are making changes to our website therefore web scraping is no longer supported. Please contact us by filling in the details at http://reg.bom.gov.au/screenscraper/screenscraper_enquiry_form/ and we will get in touch with you.

adamhsparks commented 3 years ago

Ah, OK, well, that bit of added information is good to know. We're not seeing that with the R requests, only that it's suddenly forbidden with no warning or explanation like this.

I looked at the form that this error message points to. I guess I can fill it out, but given what the copyright page says about scraping, I'm not clear on what response, if any, we'd get for such a general request as bomrang, where I'm not the end user per se.

buzacott commented 3 years ago

I'm also curious about these changes and I think it is a bad move.

If you go to the FAQ at http://www.bom.gov.au/waterdata/, they specifically mention/demonstrate an API to access water data which I have been using:

You can now use Web Services for Water Data Online to access water data.

A Sensor Observation Service standard version 2 (SOS2) has been implemented; the service returns data in WaterML2 format. The SOS2 standard was developed by the Open Geospatial Consortium (OGC) as part of the OGC Sensor Web Enablement framework. The WaterML2 standard was also developed by the OGC with support from the Bureau of Meteorology and CSIRO though the WIRADA research alliance.

Detailed information on how to use the service is available in the Guide to Sensor Observation Services (SOS2) for Water Data Online v1.0.1.pdf. An overview is provided below:

Access Point (URL)

SOS2 web services for Water Data Online can be accessed at www.bom.gov.au/waterdata/services?service=SOS&version=2.0&request=GetCapabilities

No one in their right mind would use this except via scripting. Getting around the block by changing the user agent is trivial but not a long-term solution. I might fill in the form too and ask them what is going on.
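To illustrate the scripting point, here is a small sketch that assembles the access point quoted above from its query parameters. `sos2_url()` is a hypothetical helper, and the `http://` scheme is an assumption, since the FAQ text gives the address without one.

```r
# Hypothetical helper: build the SOS2 request URL from its query parameters.
# The base URL's scheme (http) is assumed; the FAQ quotes the address
# without a scheme.
sos2_url <- function(request = "GetCapabilities",
                     base = "http://www.bom.gov.au/waterdata/services") {
  params <- c(service = "SOS", version = "2.0", request = request)
  paste0(base, "?", paste(names(params), params, sep = "=", collapse = "&"))
}

sos2_url()
```
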

maelle commented 1 year ago

From the README

This package has been archived due to BOM's ongoing unwillingness to allow programmatic access to their data and actively blocking any attempts made using this package or other similar efforts.