nhs-r-community / NHSRpostcodetools

Package to work with England Postcodes in R
https://nhs-r-community.github.io/NHSRpostcodetools/

Error: in seq_len(ceiling(length(x)/batch_size)) : argument must be coercible to non-negative integer #31

Open Lextuga007 opened 3 days ago

Lextuga007 commented 3 days ago

From the Health Inequalities book:

library(tidyverse)
library(PostcodesioR)
library(NHSRpostcodetools) # installed from GitHub not CRAN
library(NHSRpopulation) # installed from GitHub not CRAN
library(janitor)
#> 
#> Attaching package: 'janitor'
#> The following objects are masked from 'package:stats':
#> 
#>     chisq.test, fisher.test

# Generate random example postcodes
# Restricted to the NG16 outcode (Nottinghamshire): unrestricted random
# postcodes are drawn from all UK nations, and those outside England do not
# currently validate within the {NHSRpopulation} package
postcodes <- purrr::map_chr(
  1:10,
  .f = ~ PostcodesioR::random_postcode("NG16") |>
    purrr::pluck(1)
)

# Create a tibble
tibble_postcodes <- dplyr::tibble(
  random_postcodes = postcodes,
)

NHSRpopulation::get_data(tibble_postcodes,
  column = "random_postcodes",
  url_type = "imd"
) |>
  dplyr::select(
    random_postcodes,
    new_postcode,
    imd_decile,
    imd_rank,
    imd_score
  ) |>
  mutate(imd_decile_local = ntile(-imd_score, n = 10)) # creating new deciles from the data provided
#> Joining with `by = join_by(random_postcodes)`
#> Error in seq_len(ceiling(length(x)/batch_size)): argument must be coercible to non-negative integer
Created on 2024-10-21 with [reprex v2.1.1](https://reprex.tidyverse.org/)
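
For context, the message itself is raised by base R's `seq_len()`, which rejects an argument that is `NA`, infinite or negative. The sketch below reproduces it using only the expression quoted in the error. It assumes nothing about the package internals: `batch_error()` is a hypothetical helper, and the `batch_size` values are illustrative guesses, not values taken from the package.

```r
# Hypothetical helper: return the error message raised by the quoted
# expression, or NA if it runs without error.
batch_error <- function(x, batch_size) {
  tryCatch(
    {
      seq_len(ceiling(length(x) / batch_size))
      NA_character_ # no error raised
    },
    error = function(e) conditionMessage(e)
  )
}

x <- letters[1:3] # placeholder input, standing in for a vector of postcodes

ok_case <- batch_error(x, batch_size = 2)    # 3 / 2 rounds up to 2 batches
na_case <- batch_error(x, batch_size = NA)   # NA batch count
zero_case <- batch_error(x, batch_size = 0)  # 3 / 0 is Inf

print(c(ok = ok_case, na = na_case, zero = zero_case))
```

If the batch size were derived from a failed API response, either failing case could plausibly apply, but that is an assumption rather than something confirmed in this thread.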

And a simplified version of the code:

NHSRpostcodetools::get_data(tibble_postcodes)
#> Error: 'get_data' is not an exported object from 'namespace:NHSRpostcodetools'

francisbarton commented 3 days ago

Hi @Lextuga007, I am pretty sure this results from the IMD data endpoint used in NHSRpopulation::api_url no longer existing (or rather, it currently returns a "Token required" error message). Let me know if you would like me to share some alternative code for getting the IMD data. My suggestion would be to store the IMD 2019 table as static internal data in the package, as it is fixed data and doesn't need to be retrieved fresh from an API each time it is needed.

Lextuga007 commented 3 days ago

Ironically that's how this package started 😂 with a static dataset! @milanwiedemann may find this point of interest!

francisbarton commented 3 days ago

It's a bit annoying because the whole point of a data API endpoint is they're not supposed to just disappear! But I am sure there are reasons why they sometimes do.

This is what I'm now using in my package instead:

imd_lookup <- function() {
  # https://www.gov.uk/government/statistics/english-indices-of-deprivation-2019
  base <- "https://assets.publishing.service.gov.uk/media/"
  fold <- "5d8b3abded915d0373d3540f/"
  file <- "File_1_-_IMD2019_Index_of_Multiple_Deprivation.xlsx"

  paste0(base, fold, file) |>
    openxlsx2::read_xlsx(sheet = "IMD2019") |>
    tibble::as_tibble() |>
    janitor::clean_names() |>
    dplyr::rename_with(\(x) c(
      "lsoa11cd",
      "lsoa11nm",
      "lad19cd",
      "lad19nm",
      "imd_rank",
      "imd_decile"
    )) |>
    # namespace across() and starts_with() so this runs without attaching dplyr
    dplyr::mutate(dplyr::across(dplyr::starts_with("imd"), as.integer)) |>
    dplyr::mutate(dplyr::across("imd_decile", as.factor))
}

imd_lookup <- imd_lookup()
usethis::use_data(imd_lookup, overwrite = TRUE)
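
Once a table like this is stored with the package, attaching IMD values becomes an offline join on the LSOA code. A minimal sketch, assuming the postcodes have already been matched to `lsoa11cd` in an earlier step: `imd_lookup_demo` and `postcode_demo` are made-up stand-ins with placeholder codes and values, not real IMD data, and only the column names come from the function above.

```r
library(dplyr)

# Stand-in for the stored lookup table (placeholder codes and values)
imd_lookup_demo <- tibble(
  lsoa11cd = c("X00000001", "X00000002", "X00000003"),
  imd_rank = c(1200L, 15000L, 30100L),
  imd_decile = factor(c(1L, 5L, 10L))
)

# Stand-in for postcodes already matched to an LSOA code
postcode_demo <- tibble(
  postcode = c("AA1 1AA", "AA1 1AB", "AA1 1AC"),
  lsoa11cd = c("X00000002", "X00000001", "X00000009") # last code has no match
)

# left_join() keeps every postcode; unmatched LSOA codes get NA IMD values
result <- left_join(postcode_demo, imd_lookup_demo, by = "lsoa11cd")
print(result)
```

Note that `usethis::use_data()` as called above writes an exported dataset under `data/`. Passing `internal = TRUE` would instead write it to `R/sysdata.rda`, which is closer to the "static internal package data" suggestion.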