rOpenGov / pxweb

R tools to access PX-WEB API
http://ropengov.github.io/pxweb

Download both code and name #167

Closed christianlindell closed 5 years ago

christianlindell commented 5 years ago

In the previous version, pxweb downloaded both codes and names. If I downloaded the population of Swedish municipalities, the municipality names would be, for example, “1280 Malmö”. Now the name is “Malmö”. This behaviour breaks a lot of old code for me, because without the code part of the name it is impossible to see which county a municipality belongs to.

MansMeg commented 5 years ago

Hmm. Strange. The only difference is that I connect using JSON instead of csv. So that would mean that Statistics Sweden supplies different variable labels.

Does using "code" or "text" in as.data.frame() solve it (i.e. column.name.type = "code", variable.value.type = "code")? Otherwise, do you have a reproducible example so I can debug it?

christianlindell commented 5 years ago

It seems I have to choose between getting either codes or names, but not both as in the old version:

library(pxweb) # previous version

myDataSetName <- 
    get_pxweb_data(url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/BE/BE0101/BE0101A/BefolkningNy",
                   dims = list(Region = c('2584'),
                               Civilstand = c('*'),
                               Alder = c('1'),
                               Kon = c('1'),
                               ContentsCode = c('BE0101N1'),
                               Tid = c('2017')),
                   clean = TRUE)

myDataSetName

region       civilstånd       ålder  kön  år    tabellinnehåll  values
2584 Kiruna  ogifta           1 år   män  2017  Folkmängd       144
2584 Kiruna  gifta            1 år   män  2017  Folkmängd       0
2584 Kiruna  skilda           1 år   män  2017  Folkmängd       0
2584 Kiruna  änkor/änklingar  1 år   män  2017  Folkmängd       0

library(pxweb) # new version
pxweb_query_list <- 
    list("Region"=c("2584"),
         "Civilstand"=c("OG","G","SK","ÄNKL"),
         "Alder"=c("1"),
         "Kon"=c("1"),
         "ContentsCode"=c("BE0101N1"),
         "Tid"=c("2017"))

# Download data 
px_data <- 
    pxweb_get(url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/BE/BE0101/BE0101A/BefolkningNy",
              query = pxweb_query_list)

# Convert to data.frame 
as.data.frame(px_data, column.name.type = "text", variable.value.type = "code")

region  civilstånd  ålder  kön  år    Folkmängd
2584    OG          1      1    2017  144
2584    G           1      1    2017  0
2584    SK          1      1    2017  0
2584    ÄNKL        1      1    2017  0

# Convert to data.frame 
as.data.frame(px_data, column.name.type = "text", variable.value.type = "text")

region  civilstånd       ålder  kön  år    Folkmängd
Kiruna  ogifta           1 år   män  2017  144
Kiruna  gifta            1 år   män  2017  0
Kiruna  skilda           1 år   män  2017  0
Kiruna  änkor/änklingar  1 år   män  2017  0

christianlindell commented 5 years ago

An ugly solution, but it works...

(I just realised that it is possible to skip dimensions in the new version. That is great news! Thanks!)

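# px_data_2 is a pxweb_get() result, like px_data above
df <- as.data.frame(px_data_2, column.name.type = "text", variable.value.type = "text", stringsAsFactors = FALSE)
# Take the first column (region) from the code version and append it as kom_kod
df$kom_kod <- unlist(as.data.frame(px_data_2, column.name.type = "text", variable.value.type = "code", stringsAsFactors = FALSE)[1], use.names = FALSE)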
df

  region  civilstånd       ålder  kön  år    Folkmängd  kom_kod
1 Kiruna  ogifta           1 år   män  2017  144        2584
2 Kiruna  gifta            1 år   män  2017  0          2584
3 Kiruna  skilda           1 år   män  2017  0          2584
4 Kiruna  änkor/änklingar  1 år   män  2017  0          2584

MansMeg commented 5 years ago

Great!

christianlindell commented 5 years ago

Maybe someone will find this useful.

The new version of pxweb breaks a lot of code for me because it doesn't return both codes and names for regions when downloading data from Statistics Sweden. One way to maintain compatibility with old code is to define a new version of get_pxweb_data():

library(pxweb)

# Define a new version of get_pxweb_data, the function to get data that the
# previous version of pxweb used

get_pxweb_data <- function(url, dims, clean = TRUE) {
    # `clean` is accepted only so that old calls keep working; it is ignored
    pxweb_query_list <- dims

    # Download data
    px_data <-
        pxweb_get(url = url,
                  query = pxweb_query_list)

    # Convert to a data.frame with text labels
    df <- as.data.frame(px_data,
                        column.name.type = "text",
                        variable.value.type = "text",
                        stringsAsFactors = FALSE)

    # Add codes to names: prefix each region label with its code
    if ("region" %in% names(df)) {
        codes <- as.data.frame(px_data,
                               column.name.type = "text",
                               variable.value.type = "code",
                               stringsAsFactors = FALSE)[["region"]]
        df$region <- paste(codes, df$region)
    }

    return(df)
}

# Old code for previous version of pxweb
df <- get_pxweb_data(url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/BE/BE0101/BE0101A/BefolkningNy",
               dims = list(Region = c('2584'),
                           Civilstand = c('*'),
                           Alder = c('1'),
                           Kon = c('1'),
                           ContentsCode = c('BE0101N1'),
                           Tid = c('2017')),
               clean = TRUE)

df

  region       civilstånd       ålder  kön  år    Folkmängd
1 2584 Kiruna  ogifta           1 år   män  2017  144
2 2584 Kiruna  gifta            1 år   män  2017  0
3 2584 Kiruna  skilda           1 år   män  2017  0
4 2584 Kiruna  änkor/änklingar  1 år   män  2017  0
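
The same idea can be separated from the download step so that it works for any column and can be tried offline. This is only a sketch: add_codes_to_labels() is a hypothetical helper, not part of pxweb, and the two small data frames are hand-written stand-ins for the "text" and "code" conversions of the Kiruna example (assumed to have identical row order and column names).

```r
# Hypothetical helper (not part of pxweb): prefix text labels with their codes.
# text_df / code_df are assumed to come from as.data.frame() with
# variable.value.type = "text" and "code" respectively, in the same row order.
add_codes_to_labels <- function(text_df, code_df, columns = "region") {
    stopifnot(nrow(text_df) == nrow(code_df))
    # Only touch the requested columns that actually exist in the data
    for (col in intersect(columns, names(text_df))) {
        text_df[[col]] <- paste(code_df[[col]], text_df[[col]])
    }
    text_df
}

# Offline stand-ins for the two conversions (values taken from the output above)
text_df <- data.frame(region = c("Kiruna", "Kiruna"),
                      values = c(144, 0),
                      stringsAsFactors = FALSE)
code_df <- data.frame(region = c("2584", "2584"),
                      values = c(144, 0),
                      stringsAsFactors = FALSE)

combined <- add_codes_to_labels(text_df, code_df)
print(combined)
```

Passing more names in `columns` would prefix codes on other dimensions as well; columns that are not present are silently skipped.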

MansMeg commented 5 years ago

Thank you! I guess others may have similar problems.

MansMeg commented 5 years ago

Just a short question. Did your code actually break with the new pxweb package or did you get a warning that the old function was deprecated?

christianlindell commented 5 years ago

The old code didn't work and produced an error:

Loading required namespace: data.table
Error in [.data.frame(x, i, j) : object '.SD' not found
In addition: Warning messages:
1: 'get_pxweb_data' is deprecated. Use 'pxweb_get_data' instead. See help("Deprecated")
2: 'get_pxweb_metadata' is deprecated. Use 'pxweb_get' instead. See help("Deprecated")
3: 'api_parameters' is deprecated. Use 'pxweb_api_catalogue' instead. See help("Deprecated")
4: In dir.create(temp_api_folder_path(), recursive = TRUE) : 'C:\Users\chris\AppData\Local\Temp\RtmpwT8V5T\pxweb' already exists
5: 'get_pxweb_metadata' is deprecated. Use 'pxweb_get' instead. See help("Deprecated")
6: 'get_pxweb_dims' is deprecated. Use 'pxweb_advanced_get' instead. See help("Deprecated")
7: 'api_parameters' is deprecated. Use 'pxweb_api_catalogue' instead. See help("Deprecated")
8: 'get_pxweb_dims' is deprecated. Use 'pxweb_advanced_get' instead. See help("Deprecated")

OS: Windows 10, R 3.4.4, RStudio 1.2.1114. EDIT: I realised that I'm using Microsoft R Open 3.4.4, not standard R. Maybe that has something to do with the problem?

MansMeg commented 5 years ago

Hmmm. That should not happen. Could you supply a reproducible example? I want to fix this asap. My intention was that it should only throw a warning, not fail.
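
For reference, "warn but still work" is what base R's .Deprecated() is meant for. A minimal sketch of that pattern follows; old_get_data() and new_get_data() are hypothetical stand-ins, not pxweb's actual source.

```r
# Hypothetical replacement function (stand-in for the new API)
new_get_data <- function(x) {
    x * 2
}

# Hypothetical old function: warns about deprecation, then delegates
old_get_data <- function(x) {
    .Deprecated("new_get_data")  # signals a warning; execution continues
    new_get_data(x)
}

# Capture the warning text while still receiving the return value
warning_text <- NULL
result <- withCallingHandlers(
    old_get_data(21),
    warning = function(w) {
        warning_text <<- conditionMessage(w)
        invokeRestart("muffleWarning")
    }
)

print(result)
print(warning_text)
```

If the old function fails with an error instead, as in the report above, the problem lies in the code it delegates to rather than in the deprecation notice itself.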

christianlindell commented 5 years ago

library(pxweb) # version 0.8.3

myDataSetName <- 
    get_pxweb_data(url = "http://api.scb.se/OV0104/v1/doris/sv/ssd/BE/BE0101/BE0101A/BefolkningNy",
                   dims = list(Region = c('2584'),
                               Civilstand = c('*'),
                               Alder = c('1'),
                               Kon = c('1'),
                               ContentsCode = c('BE0101N1'),
                               Tid = c('2017')),
                   clean = TRUE)

Output:

Error in [.data.frame(x, i, j) : object '.SD' not found
In addition: Warning messages:
1: 'get_pxweb_data' is deprecated. Use 'pxweb_get_data' instead. See help("Deprecated")
2: 'get_pxweb_metadata' is deprecated. Use 'pxweb_get' instead. See help("Deprecated")
3: 'get_pxweb_metadata' is deprecated. Use 'pxweb_get' instead. See help("Deprecated")
4: 'get_pxweb_dims' is deprecated. Use 'pxweb_advanced_get' instead. See help("Deprecated")
5: 'api_parameters' is deprecated. Use 'pxweb_api_catalogue' instead. See help("Deprecated")
6: 'get_pxweb_dims' is deprecated. Use 'pxweb_advanced_get' instead. See help("Deprecated")

christianlindell commented 5 years ago

Same error on two different machines, both running Windows 10, one with R 3.4.4 and one with R 3.4.3.

MansMeg commented 5 years ago

Thanks! I'll reopen this so I can fix it for future users. Sorry for the inconvenience.

MansMeg commented 5 years ago

I think I have now fixed it so that the old code should work. It seems that something in the data.table package did not work when the package was not imported. The above example is included as a test case. You can install the latest version (0.9.1) from the master branch.
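
The symptom above (object '.SD' not found, raised from [.data.frame) is consistent with how data.table decides whether to enable its own `[` syntax: as far as I know, it does so only for packages that list data.table under Imports or Depends and import it in their NAMESPACE, and otherwise the call falls through to the plain data.frame method. As an assumption about the kind of fix involved, not a description of the actual commit, a package that uses roxygen2 would typically declare the import like this:

```r
# In an R source file of the package (assumes roxygen2 generates NAMESPACE).
# DESCRIPTION must also list data.table under Imports.
#' @import data.table
NULL
```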