ropensci / nomisr

Access UK official statistics from the Nomis database through R.
https://docs.ropensci.org/nomisr

Error with SIC 5-digit 2007 concept #33

Open stephenjarvis opened 1 year ago

stephenjarvis commented 1 year ago

Hi there,

Great package so thanks for making this.

Unfortunately I have run into an issue trying to pull the industry codes for certain datasets. My R code for this has worked previously, and still works for some datasets that involve industry-level data, but not for others. It appears to be specific to the 5-digit SIC 2007 industry category: I get this error when checking the industry concept for NM_189_1 and NM_142_1, but not for earlier industry classifications such as in NM_187_1. Other concepts from these datasets return the correct info with no problem. A copy of the error message received is below.

Many thanks,

Stephen

evanodell commented 1 year ago

Can you paste a copy of the code that generates this error? And the actual error message if possible?

stephenjarvis commented 1 year ago

Example code here:

library(tidyverse)
library(nomisr)

nomis_get_metadata(id = "NM_187_1", concept = "INDUSTRY") # <- this one works fine
nomis_get_metadata(id = "NM_189_1", concept = "INDUSTRY") # <- this one doesn't work
nomis_get_metadata(id = "NM_142_1", concept = "INDUSTRY") # <- this one also doesn't work

Error message looks like this:

<simpleError in if ((attr(regexpr("<!DOCTYPE html>", content), "match.length") == -1) && (attr(regexpr("", content), "match.length") == -1)) {
    BOM <- "<U+FEFF>"
    if (attr(regexpr(BOM, content), "match.length") != -1) {
        content <- gsub(BOM, "", content)
    }
    content <- gsub("", "", content)
    content <- gsub("“", """, content)
    content <- gsub("”", """, content)
    content <- gsub("‘", "'", content)
    content <- gsub("’", "'", content)
    xmlObj <- xmlTreeParse(content, useInternalNodes = TRUE)
    status <- 1
} else {
    stop("Invalid SDMX-ML file")
}: missing value where TRUE/FALSE needed>

# A tibble: 0 x 0
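
For context, the last part of that message, "missing value where TRUE/FALSE needed", is what R raises when an `if` condition evaluates to `NA`. The sketch below reproduces that with base R only. It assumes, without confirmation from this thread, that the `content` being checked was `NA` (for example after an empty or failed response); the actual cause inside rsdmx may be different.

```r
# Assumption: `content` stands in for the response text and is NA here;
# the thread does not confirm what the real value was.
content <- NA_character_

# regexpr() on NA input returns NA for the "match.length" attribute too
len <- attr(regexpr("<!DOCTYPE html>", content), "match.length")

# NA == -1 is NA, and NA && NA stays NA
cond <- (len == -1) && (len == -1)

# if (NA) raises an error; capture its message instead of halting the script
msg <- tryCatch(
  if (cond) "parsed" else "invalid",
  error = function(e) conditionMessage(e)
)
print(msg)
```

If this is what happened, it would suggest the response body for these datasets never arrived as text, rather than arriving as malformed XML.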

evanodell commented 1 year ago

Thanks. From the look of things there may be an issue with how Nomis is returning these XML files, or with the rsdmx package that reads them (SDMX being a standard format for this kind of data). I will keep looking for a fix in the meantime, possibly by including some experimental support for SDMX-JSON as an optional fallback if the SDMX request fails.
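
A minimal sketch of that fallback idea, in base R only. `with_fallback()`, `sdmx_request()` and `json_request()` are hypothetical names used for illustration; none of them are part of nomisr or rsdmx, and the two request functions are offline stand-ins rather than real API calls. An empty result is treated as a failure because the broken call above returned a 0 x 0 tibble.

```r
# Hypothetical helper (not part of nomisr): try `primary`; if it errors or
# returns an empty result, call `fallback` instead.
with_fallback <- function(primary, fallback) {
  result <- tryCatch(primary(), error = function(e) NULL)
  if (is.null(result) || NROW(result) == 0) {
    result <- fallback()
  }
  result
}

# Offline stand-ins for the two request types (assumptions, no network used)
sdmx_request <- function() stop("Invalid SDMX-ML file")
json_request <- function() data.frame(id = "1", label = "example")

out <- with_fallback(sdmx_request, json_request)
print(out)
```

Passing functions rather than values means the fallback request is only made when the primary one fails.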

DanOlner commented 1 year ago

Just dropping in to say I'm having the same issue still, specifically with NM_189_1 (The BRES data - same result for BRES "excluding PAYE only" NM_172_1). All other concepts return tibbles, "INDUSTRY" is broken.

Happy to pester NOMIS if there's something in the API not working at their end?

p.s. I did successfully get the full BRES data at NUTS2 (i.e. new ITL2) level; it gave me all the industry data for 5-, 3- and 2-digit SIC codes fine, >1M records (note, with API key included):

z <- nomis_get_data(id = "NM_189_1", time = "latest", geography = "TYPE438")

p.s. NOMISR has just enabled me to bulk download all BRES open data 2009-2021 with no throttling or other hitch. Insanely useful timesaver, thank you!