hadley / ggplot2-book

ggplot2: elegant graphics for data analysis
https://ggplot2-book.org/

6.5 Raster maps: bomrang package has been archived on Jan 10, 2023 #338

Open AnttiRask opened 1 year ago

AnttiRask commented 1 year ago

As I was working through section 6.5, Raster maps, I found that the {bomrang} package was removed from CRAN on 2021-04-28.

In addition, the GitHub repository "has been archived by the owner on Jan 10, 2023. It is now read-only", accompanied by this explanation: "This package has been archived due to BOM's ongoing unwillingness to allow programmatic access to their data and actively blocking any attempts made using this package or other similar efforts."

In any case, it's too bad that the chapter now relies on a package that doesn't work anymore.

djnavarro commented 1 year ago

Thanks for opening this! I'm still thinking about the best way to fix it, but we'll definitely need to find an alternative. Wondering if @adamhsparks has any thoughts on this? It'd be nice to use satellite images that cover the same geographic region as the rest of the chapter.

adamhsparks commented 1 year ago

Hi @AnttiRask and @djnavarro, that's a tough one. This particular functionality does still work, but because BOM has been so steadfast in blocking other programmatic access, I archived the whole package rather than trying to remove the parts that didn't work, since that would have greatly reduced the functionality of the package as a whole.

If you'd like, I've extracted the necessary bits into a function for this example: it lists the available imagery, which you can then download using {curl}. That adds more code to the chapter but keeps the same workflow.

get_available_imagery <- function(product_id = "all") {
  ftp_base <- "ftp://ftp.bom.gov.au/anon/gen/gms/"
  message("\nThe following files are currently available for download:\n")
  tif_list <- .ftp_images(product_id, bom_server = ftp_base)
  write(tif_list, file = file.path(tempdir(), "tif_list"))
  print(tif_list)

  .ftp_images <- function(product_id, bom_server) {
    list_files <- curl::new_handle()
    curl::handle_setopt(
      handle = list_files,
      CONNECTTIMEOUT = 60L,
      TIMEOUT = 120L,
      ftp_use_epsv = TRUE,
      dirlistonly = TRUE
    )

    # get file list from FTP server
    con <- curl::curl(url = "ftp://ftp.bom.gov.au/anon/gen/gms/",
                      "r",
                      handle = list_files)
    tif_files <- readLines(con)
    close(con)

    # filter only the GeoTIFF files
    tif_files <- tif_files[grepl("^.*\\.tif", tif_files)]

    # select the Product ID requested from list of files
    if (product_id != "all") {
      tif_files <- switch(
        product_id,
        "IDE00420" = {
          tif_files[grepl("IDE00420",
                          tif_files)]
        },
        "IDE00421" = {
          tif_files[grepl("IDE00421",
                          tif_files)]
        },
        "IDE00422" = {
          tif_files[grepl("IDE00422",
                          tif_files)]
        },
        "IDE00423" = {
          tif_files[grepl("IDE00423",
                          tif_files)]
        },
        "IDE00425" = {
          tif_files[grepl("IDE00425",
                          tif_files)]
        },
        "IDE00426" = {
          tif_files[grepl("IDE00426",
                          tif_files)]
        },
        "IDE00427" = {
          tif_files[grepl("IDE00427",
                          tif_files)]
        },
        "IDE00430" = {
          tif_files[grepl("IDE00430",
                          tif_files)]
        },
        "IDE00431" = {
          tif_files[grepl("IDE00431",
                          tif_files)]
        },
        "IDE00432" = {
          tif_files[grepl("IDE00432",
                          tif_files)]
        },
        "IDE00433" = {
          tif_files[grepl("IDE00433",
                          tif_files)]
        },
        "IDE00435" = {
          tif_files[grepl("IDE00435",
                          tif_files)]
        },
        "IDE00436" = {
          tif_files[grepl("IDE00436",
                          tif_files)]
        },
        "IDE00437" = {
          tif_files[grepl("IDE00437",
                          tif_files)]
        },
        tif_files[grepl("IDE00439",
                        tif_files)]
      )
      paste0(bom_server, tif_files)
    } else {
      tif_files
    }

    # check if the Product ID requested provides any files on the server
    if (length(tif_files) == 0 |
        tif_files[1] == "ftp://ftp.bom.gov.au/anon/gen/gms/") {
      stop(paste0("\nSorry, no files are currently available for ", product_id))
    }
    return(tif_files)
  }
}
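The long switch() above repeats the same grepl() call once per product ID. As a sketch only, the filtering step can take any ID directly. select_product_files() is a hypothetical name, the example file names in the test are made up, and returning full URLs for "all" as well is a design choice of this sketch. One behavioural difference: in the switch() version, any ID that isn't listed (including a typo) falls through to the IDE00439 default, whereas here an unknown ID matches nothing and raises an error.

```r
# Hypothetical helper: filter an FTP directory listing by product ID
select_product_files <- function(files, product_id = "all",
                                 bom_server = "ftp://ftp.bom.gov.au/anon/gen/gms/") {
  # Keep only GeoTIFF file names, like the original grepl() filter
  tif_files <- files[grepl("\\.tif", files)]

  # Match the requested ID literally; no per-ID switch() branch is needed
  if (product_id != "all") {
    tif_files <- tif_files[grepl(product_id, tif_files, fixed = TRUE)]
  }

  # Fail before building URLs, so an empty result can't turn into the bare server URL
  if (length(tif_files) == 0) {
    stop("Sorry, no files are currently available for ", product_id, call. = FALSE)
  }

  paste0(bom_server, tif_files)
}
```

Because the ID is matched with `fixed = TRUE`, it is treated as a literal string rather than a regular expression.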
AnttiRask commented 1 year ago

I don't know about the book, but @adamhsparks, I tried the code and got it to work. I had to move things around a bit, though, because the original was trying to use .ftp_images before it was created. Here's the corrected function:

get_available_imagery <- function(product_id = "all") {
  ftp_base <- "ftp://ftp.bom.gov.au/anon/gen/gms/"
  message("\nThe following files are currently available for download:\n")

  .ftp_images <- function(product_id, bom_server) {
    list_files <- curl::new_handle()
    curl::handle_setopt(
      handle = list_files,
      CONNECTTIMEOUT = 60L,
      TIMEOUT = 120L,
      ftp_use_epsv = TRUE,
      dirlistonly = TRUE
    )

    # get file list from FTP server
    con <- curl::curl(url = "ftp://ftp.bom.gov.au/anon/gen/gms/",
                      "r",
                      handle = list_files)
    tif_files <- readLines(con)
    close(con)

    # filter only the GeoTIFF files
    tif_files <- tif_files[grepl("^.*\\.tif", tif_files)]

    # select the Product ID requested from list of files
    if (product_id != "all") {
      tif_files <- switch(
        product_id,
        "IDE00420" = {
          tif_files[grepl("IDE00420",
                          tif_files)]
        },
        "IDE00421" = {
          tif_files[grepl("IDE00421",
                          tif_files)]
        },
        "IDE00422" = {
          tif_files[grepl("IDE00422",
                          tif_files)]
        },
        "IDE00423" = {
          tif_files[grepl("IDE00423",
                          tif_files)]
        },
        "IDE00425" = {
          tif_files[grepl("IDE00425",
                          tif_files)]
        },
        "IDE00426" = {
          tif_files[grepl("IDE00426",
                          tif_files)]
        },
        "IDE00427" = {
          tif_files[grepl("IDE00427",
                          tif_files)]
        },
        "IDE00430" = {
          tif_files[grepl("IDE00430",
                          tif_files)]
        },
        "IDE00431" = {
          tif_files[grepl("IDE00431",
                          tif_files)]
        },
        "IDE00432" = {
          tif_files[grepl("IDE00432",
                          tif_files)]
        },
        "IDE00433" = {
          tif_files[grepl("IDE00433",
                          tif_files)]
        },
        "IDE00435" = {
          tif_files[grepl("IDE00435",
                          tif_files)]
        },
        "IDE00436" = {
          tif_files[grepl("IDE00436",
                          tif_files)]
        },
        "IDE00437" = {
          tif_files[grepl("IDE00437",
                          tif_files)]
        },
        tif_files[grepl("IDE00439",
                        tif_files)]
      )
      paste0(bom_server, tif_files)
    } else {
      tif_files
    }

    # check if the Product ID requested provides any files on the server
    if (length(tif_files) == 0 |
        tif_files[1] == "ftp://ftp.bom.gov.au/anon/gen/gms/") {
      stop(paste0("\nSorry, no files are currently available for ", product_id))
    }
    return(tif_files)
  }

  tif_list <- .ftp_images(product_id, bom_server = ftp_base)
  write(tif_list, file = file.path(tempdir(), "tif_list"))
  print(tif_list)

}
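The check `tif_files[1] == "ftp://ftp.bom.gov.au/anon/gen/gms/"` guards against a quirk of paste0(): a zero-length argument is treated like "", so prefixing an empty file list returns the bare server URL rather than an empty vector. (As posted, the paste0() result isn't assigned back to tif_files, so the returned names stay bare.) A minimal illustration, with base_url and no_files as illustrative names:

```r
base_url <- "ftp://ftp.bom.gov.au/anon/gen/gms/"
no_files <- character(0)

# A zero-length argument is recycled as "", so only the prefix survives
paste0(base_url, no_files)
#> [1] "ftp://ftp.bom.gov.au/anon/gen/gms/"

# recycle0 = TRUE (R >= 4.0.1) returns a zero-length result instead
paste0(base_url, no_files, recycle0 = TRUE)
#> character(0)
```

With `recycle0 = TRUE`, the empty case stays empty, so a plain `length()` check would be enough.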
AnttiRask commented 1 year ago

@djnavarro, if you end up using that method, here's a fun addition to the example code for your consideration.

Since the example needs a time from 'yesterday', why not calculate it programmatically instead of typing it in manually:

library(lubridate)
library(tidyverse)

yesterday_9pm <-
  as.character(floor_date(now() - ddays(1), "day") + dhours(21)) %>% 
  str_replace_all("-", "") %>% 
  str_replace_all(":", "") %>% 
  str_replace_all(" ", "") %>%
  str_sub(1, 12)

yesterday_9pm
#> [1] "202302052100"
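The three str_replace_all() calls can be avoided by formatting the date-time directly with format(). This is a base-R sketch: yesterday_stamp() is a hypothetical helper, and the UTC default is an assumption, since the thread doesn't say which time zone the BOM file names use. With lubridate, `format(floor_date(now() - ddays(1), "day") + dhours(21), "%Y%m%d%H%M")` should give the same string as the code above, in the session's time zone.

```r
# Hypothetical helper: "yesterday at <hour>:00" as a YYYYMMDDhhmm string
yesterday_stamp <- function(hour, now = Sys.time(), tz = "UTC") {
  # Calendar date of `now` in the chosen time zone, minus one day
  yesterday <- as.Date(now, tz = tz) - 1
  # Midnight of that day in `tz`, plus the requested hours
  # (UTC has no daylight saving, so this lands exactly on the hour)
  stamp <- as.POSIXct(format(yesterday), tz = tz) + hour * 3600
  format(stamp, "%Y%m%d%H%M", tz = tz)
}

yesterday_stamp(21, now = as.POSIXct("2023-02-06 08:30:00", tz = "UTC"))
#> [1] "202302052100"
```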
adamhsparks commented 1 year ago

Thank you, @AnttiRask, I thought I’d addressed that, but must have still had .ftp_images() in my R session when I tested.

AnttiRask commented 1 year ago

@adamhsparks and @djnavarro, I hope you don't mind, but I actually spent some time with the code Adam provided and tidied it a bit (mainly because I wanted to understand better how it works).

What do you think?

# Load the packages needed
library(curl)
library(tidyverse)
# The function
get_available_imagery <- function(product_id = "all") {

  ftp_base <- "ftp://ftp.bom.gov.au/anon/gen/gms/"

  .ftp_images <- function(product_id, bom_server) {

    list_files <- new_handle()

    handle_setopt(
      handle         = list_files,
      CONNECTTIMEOUT = 60L,
      TIMEOUT        = 120L,
      ftp_use_epsv   = TRUE,
      dirlistonly    = TRUE
    )

    # get file list from FTP server
    con <- curl(
      url    = bom_server,  # the argument passed in (ftp_base)
      open   = "r",
      handle = list_files
    )

    tif_files <- readLines(con)

    close(con)

    # filter only the GeoTIFF files
    tif_files <- str_subset(tif_files, "\\.tif")

    # check if the Product ID requested provides any files on the server
    if (length(tif_files) == 0 || tif_files[1] == bom_server) {
      stop(
        str_c(
          "\nSorry, no files are currently available for ",
          product_id
        )
      )
    }
    return(tif_files)
  }

  tif_list <- .ftp_images(product_id, bom_server = ftp_base)

  write_lines(tif_list, file = file.path(tempdir(), "tif_list"))

  cat("\nThe following files are currently available for download:\n")

  print(tif_list)

}
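The later pipe, `get_available_imagery() %>% str_subset(yesterday_10pm)`, relies on the function's value being the file list. It is, because the last expression is `print(tif_list)`, and print() returns its argument invisibly. A minimal network-free sketch, where report_files() is a hypothetical stand-in:

```r
# Hypothetical, network-free stand-in for the last lines of get_available_imagery()
report_files <- function(tif_list) {
  cat("\nThe following files are currently available for download:\n")
  print(tif_list)  # prints the vector, then returns it invisibly
}

res <- report_files(c("a.tif", "b.tif"))
identical(res, c("a.tif", "b.tif"))
#> [1] TRUE
```

If the console output isn't wanted, ending the function with `invisible(tif_list)` would return the same value silently.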
# Yesterday at 10 pm as a YYYYMMDDhhmm string, for use with str_subset() below
library(lubridate)

yesterday_10pm <-
  as.character(floor_date(now() - ddays(1), "day") + dhours(22)) %>% 
  str_replace_all("-", "") %>% 
  str_replace_all(":", "") %>% 
  str_replace_all(" ", "") %>%
  str_sub(1, 12)

yesterday_10pm

And then the two blocks from the book, modified:

# List of all the filenames with yesterday, 10pm as the date and time
files <- get_available_imagery() %>% 
  str_subset(yesterday_10pm)
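# Download the files (create the folder first; mode = "wb" keeps the
# GeoTIFFs binary-safe on Windows)
dir.create("raster", showWarnings = FALSE)
walk2(
  .x = str_c("ftp://ftp.bom.gov.au/anon/gen/gms/", files),
  .y = file.path("raster", files),
  .f = ~ download.file(url = .x, destfile = .y, mode = "wb")
)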
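When the chapter is re-rendered, the files don't need to be fetched again. As a sketch, here is a base-R alternative to walk2() that skips files already on disk. download_missing() is a hypothetical name, and the default server URL and "raster" folder are taken from the code above.

```r
# Hypothetical helper: download each file once, skipping ones already on disk
download_missing <- function(files,
                             base_url = "ftp://ftp.bom.gov.au/anon/gen/gms/",
                             dest_dir = "raster") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(dest_dir, files)
  todo <- !file.exists(dest)

  for (i in which(todo)) {
    # mode = "wb" forces a binary transfer, which matters on Windows
    utils::download.file(paste0(base_url, files[i]), dest[i], mode = "wb")
  }

  # Return the paths that were downloaded on this call
  invisible(dest[todo])
}
```

Usage would be `download_missing(files)` with the `files` vector from the str_subset() step.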
hadley commented 1 year ago

Thanks for working on this, but I don't think we'd want to include that much code in the book, because it's unrelated to the core purpose of drawing a map.

AnttiRask commented 1 year ago

No worries! I totally get that.

Besides, although the files seem to download correctly, I ran into other problems further along in the code example (no matter which version of the function I used to get the files).

And in any case, it's been a good learning experience!

adamhsparks commented 1 year ago

I don’t mind at all. I hacked together code that was originally multipurpose in {bomrang} to make it work here, so I’m more than happy for someone to polish it.

(Email reply to Antti Rask's comment of 8 Feb 2023, 4:17 pm.)

adamhsparks commented 1 year ago

We (my team and I) are working on a "spiritual successor" to {bomrang} right now at work. We're getting close to putting it here on GitHub, once we finalise some more development internally.

I plan to incorporate this functionality into the new package. I will keep you updated, and I hope to have something soon.

djnavarro commented 1 year ago

For the time being, I've simplified the public-facing content so that it doesn't say anything about how to obtain the GeoTIFF files, and it links to the raw GitHub file for anyone who wants the specific file used in the book's example. This might be worth revisiting once a successor to {bomrang} is available.

adamhsparks commented 1 year ago

A quick update: the new package is now openly available at https://github.com/DPIRD-FSI/weatherOz. It's not on CRAN yet, but this functionality exists and works: https://dpird-fsi.github.io/weatherOz/reference/get_satellite_imagery.html. I’ll soon be adding support for {terra}, {stars} and raw (matrix) output, but I plan to keep the default return value the same as in {bomrang}, so it should drop straight in here.