rOpenGov / pxweb

R tools to access PX-WEB API
http://ropengov.github.io/pxweb

downloading Latvia data #214

Closed sakaiando closed 3 years ago

sakaiando commented 3 years ago

I wanted to download a table from the Latvian statistics office and used the following code:

library(pxweb)

url = 'https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START/VEK/IS/ISP/ISP050c'
query = '
{
  "query": [],
  "response": {
    "format": "px"
  }
}'

pxq <- pxweb_query(query)
pxd <- pxweb_get(url,pxq)

But I get the following error:

Error: Not all mandatory variables are included in the query.

Could you help me (a novice) figure out how to fix the error?

The package version is 0.9.1. Session info is below.

R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
 [1] pxweb_0.9.1     tictoc_1.0      rlist_0.4.6.1   jsonlite_1.6.1  httr_1.4.1      forcats_0.5.0   stringr_1.4.0  
 [8] dplyr_0.8.5     purrr_0.3.3     readr_1.3.1     tidyr_1.0.2     tibble_2.1.3    ggplot2_3.3.0   tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] DBI_1.1.0         withr_2.1.2       rvest_0.3.5       tidyselect_1.0.0  lattice_0.20-38   pkgconfig_2.0.3  
 [7] utf8_1.1.4        reprex_0.3.0      xml2_1.2.5        compiler_3.6.3    fs_1.3.2          readxl_1.3.1     
[13] Rcpp_1.0.3        cli_2.0.2         cellranger_1.1.0  tools_3.6.3       checkmate_2.0.0   nlme_3.1-144     
[19] broom_0.5.5       R6_2.4.1          dbplyr_1.4.2      scales_1.1.0      curl_4.3          assertthat_0.2.1 
[25] gtable_0.3.0      fansi_0.4.1       stringi_1.4.6     rstudioapi_0.11   backports_1.1.5   hms_0.5.3        
[31] munsell_0.5.0     grid_3.6.3        data.table_1.12.8 lifecycle_0.2.0   colorspace_1.4-1  glue_1.3.2       
[37] lubridate_1.7.4   rlang_0.4.5       magrittr_1.5      generics_0.0.2    vctrs_0.2.4       crayon_1.3.4     
[43] haven_2.2.0       modelr_0.1.6      pillar_1.4.3     

MansMeg commented 3 years ago

Hi!

If you are a novice, I suggest you use the interactive function to build up your query:

d <- interactive_pxweb("https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START")

Does this work for you?

sakaiando commented 3 years ago

Thanks @MansMeg,

I got the following error message.

> d <- interactive_pxweb("https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START")
Error in get_api_index(api_name, api_list) : 
  API do not exist in api catalogue.
In addition: Warning messages:
1: 'interactive_pxweb' is deprecated.
Use 'pxweb_interactive' instead.
See help("Deprecated") 
2: '.Object$initialize' is deprecated.
Use 'pxweb' instead.
See help("Deprecated") 

I followed the deprecation warning and used pxweb_interactive instead, but now I get a different error, shown below.

==================================================================================================================
 R PXWEB: Content of 'data.stat.gov.lv'
          at '/api/v1/en/OSP_PUB/START/VEK/IS/ISP/ISP050c'
   TABLE: Expenditure of gross domestic product (thsd euro) by Adjustment, Values, Indicator and Time period
VARIABLE: SESON, VAL, INDICATOR, ContentsCode, [[TIME]]
==================================================================================================================
 [   1 ] : 1995Q1
 [   2 ] : 1995Q2
 [   3 ] : 1995Q3
 [   4 ] : 1995Q4

 [ 100 ] : 2019Q4
 [ 101 ] : 2020Q1
 [ 102 ] : 2020Q2
 [ 103 ] : 2020Q3
==================================================================================================================
Enter one or more choices:
Separate multiple choices by ',' and intervals of choices by ':'
('esc' = Quit, 'b' = Back, '*' = Select all, 'a' = Show all, 'i' = Show id) 
1: *
Do you want to print code to query and download data?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y
Do you want to print query in json format (otherwise query is printed as an R list)?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y
Do you want to download the data?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y
Do you want to return a the data as a data.frame?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y
Do you want to print citation for the data?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y

Error in permutations(n = n, v = x, r = n) : bad value of n

FYI, I chose * and y for all the questions. Could you kindly advise how to avoid the error?

Session info is below.

R version 3.6.3 (2020-02-29)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 10 x64 (build 17763)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] RJSONIO_1.3-1.4 pxweb_0.9.1     forcats_0.5.0   stringr_1.4.0   dplyr_0.8.5     purrr_0.3.3     readr_1.3.1    
 [8] tidyr_1.0.2     tibble_2.1.3    ggplot2_3.3.0   tidyverse_1.3.0

loaded via a namespace (and not attached):
 [1] DBI_1.1.0        withr_2.1.2      rvest_0.3.5      tidyselect_1.0.0 lattice_0.20-38  pkgconfig_2.0.3 
 [7] reprex_0.3.0     xml2_1.2.5       compiler_3.6.3   fs_1.3.2         readxl_1.3.1     Rcpp_1.0.3      
[13] cli_2.0.2        cellranger_1.1.0 httr_1.4.1       tools_3.6.3      checkmate_2.0.0  nlme_3.1-144    
[19] broom_0.5.5      R6_2.4.1         dbplyr_1.4.2     scales_1.1.0     curl_4.3         assertthat_0.2.1
[25] gtable_0.3.0     fansi_0.4.1      stringi_1.4.6    rstudioapi_0.11  backports_1.1.5  hms_0.5.3       
[31] munsell_0.5.0    grid_3.6.3       lifecycle_0.2.0  colorspace_1.4-1 glue_1.3.2       lubridate_1.7.4 
[37] rlang_0.4.5      magrittr_1.5     generics_0.0.2   vctrs_0.2.4      crayon_1.3.4     haven_2.2.0     
[43] modelr_0.1.6     pillar_1.4.3     jsonlite_1.6.1  

MansMeg commented 3 years ago

Yes, you should use:

d <- pxweb_interactive("https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START")

This is a known error. I think updating to the latest version should fix it. It works for me:

=============================================================================
 R PXWEB: Content of 'data.stat.gov.lv'
          at '/api/v1/en/OSP_PUB/START/VEK/IS/ISP/ISP050c'
   TABLE: Expenditure of gross domestic product (thsd euro) by Adjustment, Values, Indicator and Time period
VARIABLE: SESON, VAL, INDICATOR, ContentsCode, [[TIME]]
=============================================================================
 [   1 ] : 1995Q1
 [   2 ] : 1995Q2
 [   3 ] : 1995Q3
 [   4 ] : 1995Q4

 [ 100 ] : 2019Q4
 [ 101 ] : 2020Q1
 [ 102 ] : 2020Q2
 [ 103 ] : 2020Q3
=============================================================================
Enter one or more choices:
Separate multiple choices by ',' and intervals of choices by ':'
('esc' = Quit, 'b' = Back, '*' = Select all, 'a' = Show all, 'i' = Show id) 
1: *
Do you want to print code to query and download data?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y
Do you want to print query in json format (otherwise query is printed as an R list)?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: n
Do you want to download the data?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: y
Do you want to return a the data as a data.frame?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: n
Do you want to print citation for the data?
Enter your choice:
('esc' = Quit, 'y' = Yes, 'n' = No) 
1: n
  Downloading large query (in 6 batches):
  |=======================================================================| 100%
# PXWEB query 
pxweb_query_list <- 
  list("SESON"=c("NSA","SA"),
       "VAL"=c("CP","KP_PRE","CLV2015"),
       "INDICATOR"=c("B1GQ","P31_S14_S15","P3_S13","P5","P51G","P5M","P6","P7"),
       "ContentsCode"=c("ISP050c"),
       "TIME"=c("1995Q1","1995Q2","1995Q3","1995Q4","1996Q1","1996Q2","1996Q3","1996Q4","1997Q1","1997Q2","1997Q3","1997Q4","1998Q1","1998Q2","1998Q3","1998Q4","1999Q1","1999Q2","1999Q3","1999Q4","2000Q1","2000Q2","2000Q3","2000Q4","2001Q1","2001Q2","2001Q3","2001Q4","2002Q1","2002Q2","2002Q3","2002Q4","2003Q1","2003Q2","2003Q3","2003Q4","2004Q1","2004Q2","2004Q3","2004Q4","2005Q1","2005Q2","2005Q3","2005Q4","2006Q1","2006Q2","2006Q3","2006Q4","2007Q1","2007Q2","2007Q3","2007Q4","2008Q1","2008Q2","2008Q3","2008Q4","2009Q1","2009Q2","2009Q3","2009Q4","2010Q1","2010Q2","2010Q3","2010Q4","2011Q1","2011Q2","2011Q3","2011Q4","2012Q1","2012Q2","2012Q3","2012Q4","2013Q1","2013Q2","2013Q3","2013Q4","2014Q1","2014Q2","2014Q3","2014Q4","2015Q1","2015Q2","2015Q3","2015Q4","2016Q1","2016Q2","2016Q3","2016Q4","2017Q1","2017Q2","2017Q3","2017Q4","2018Q1","2018Q2","2018Q3","2018Q4","2019Q1","2019Q2","2019Q3","2019Q4","2020Q1","2020Q2","2020Q3"))

# Download data 
px_data <- 
  pxweb_get(url = "https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START/VEK/IS/ISP/ISP050c",
            query = pxweb_query_list)

# Convert to data.frame 
px_data_frame <- as.data.frame(px_data, column.name.type = "text", variable.value.type = "text")

# Get pxweb data comments 
px_data_comments <- pxweb_data_comments(px_data)
px_data_comments_df <- as.data.frame(px_data_comments)

# Cite the data as 
pxweb_cite(px_data)

Note that you need to update to the latest version on GitHub first (this requires the remotes R package):

remotes::install_github('ropengov/pxweb')

sakaiando commented 3 years ago

Thanks @MansMeg, it worked after updating the version to 0.10.2!
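
For anyone hitting the same `permutations` error, a quick way to tell whether the installed package predates the version that worked here is to compare version strings. This is only a sketch: `needs_update` is a hypothetical helper, not part of pxweb, and the 0.10.2 default is simply the version reported as working in this thread, not a documented minimum.

```r
# Hypothetical helper (not part of pxweb): TRUE when the installed version
# is older than the required one. compareVersion() comes from the utils
# package that ships with R.
needs_update <- function(installed, required = "0.10.2") {
  compareVersion(as.character(installed), required) < 0
}

needs_update("0.9.1")   # version from the original report
needs_update("0.10.2")  # version reported as working in this thread

# With pxweb installed, the live check would be:
# needs_update(packageVersion("pxweb"))
```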

I also tried JSON query mode and query list mode, and both of them worked well :)

library(tidyverse)
library(pxweb)

##### json query mode #####
url = 'https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START/VEK/IS/ISP/ISP050c'
query = '{
  "query": [
    {
      "code": "SESON",
      "selection": {
        "filter": "item",
        "values": ["NSA", "SA"]
      }
    },
    {
      "code": "VAL",
      "selection": {
        "filter": "item",
        "values": ["CP", "KP_PRE", "CLV2015"]
      }
    },
    {
      "code": "INDICATOR",
      "selection": {
        "filter": "item",
        "values": ["B1GQ", "P31_S14_S15", "P3_S13", "P5", "P51G", "P5M", "P6", "P7"]
      }
    },
    {
      "code": "ContentsCode",
      "selection": {
        "filter": "item",
        "values": ["ISP050c"]
      }
    },
    {
      "code": "TIME",
      "selection": {
        "filter": "item",
        "values": ["1995Q1", "1995Q2", "1995Q3", "1995Q4", "1996Q1", "1996Q2", "1996Q3", "1996Q4", "1997Q1", "1997Q2", "1997Q3", "1997Q4", "1998Q1", "1998Q2", "1998Q3", "1998Q4", "1999Q1", "1999Q2", "1999Q3", "1999Q4", "2000Q1", "2000Q2", "2000Q3", "2000Q4", "2001Q1", "2001Q2", "2001Q3", "2001Q4", "2002Q1", "2002Q2", "2002Q3", "2002Q4", "2003Q1", "2003Q2", "2003Q3", "2003Q4", "2004Q1", "2004Q2", "2004Q3", "2004Q4", "2005Q1", "2005Q2", "2005Q3", "2005Q4", "2006Q1", "2006Q2", "2006Q3", "2006Q4", "2007Q1", "2007Q2", "2007Q3", "2007Q4", "2008Q1", "2008Q2", "2008Q3", "2008Q4", "2009Q1", "2009Q2", "2009Q3", "2009Q4", "2010Q1", "2010Q2", "2010Q3", "2010Q4", "2011Q1", "2011Q2", "2011Q3", "2011Q4", "2012Q1", "2012Q2", "2012Q3", "2012Q4", "2013Q1", "2013Q2", "2013Q3", "2013Q4", "2014Q1", "2014Q2", "2014Q3", "2014Q4", "2015Q1", "2015Q2", "2015Q3", "2015Q4", "2016Q1", "2016Q2", "2016Q3", "2016Q4", "2017Q1", "2017Q2", "2017Q3", "2017Q4", "2018Q1", "2018Q2", "2018Q3", "2018Q4", "2019Q1", "2019Q2", "2019Q3", "2019Q4", "2020Q1", "2020Q2", "2020Q3"]
      }
    }
  ],
  "response": {
    "format": "json"
  }
} '

## the query from the statistics office website is not sufficient
# query = '
# {
#   "query": [],
#   "response": {
#     "format": "px"
#   }
# }'

pxq <- pxweb_query(query)
pxweb_get(url,pxq) %>%
  as.data.frame(column.name.type = "text", variable.value.type = "text") ->
  df1

##### query list mode #####
pxweb_query_list <- 
  list("SESON"=c("*"),
       "VAL"=c("*"),
       "INDICATOR"=c("*"),
       "ContentsCode"=c("*"),
       "TIME"=c("*"))

pxweb_get(url = "https://data.stat.gov.lv:443/api/v1/en/OSP_PUB/START/VEK/IS/ISP/ISP050c",
          query = pxweb_query_list) %>%
  as.data.frame(column.name.type = "text", variable.value.type = "text") ->
  df2

all.equal(df1,df2)
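
The select-all query list above can also be built from a vector of variable codes rather than typed out by hand. The following is a minimal sketch, assuming the codes are the ones printed on the VARIABLE line of the interactive output; `select_all_query` is a hypothetical helper and not a pxweb function.

```r
# Hypothetical helper (not part of pxweb): build a named list that selects
# every value ("*") for each variable code.
select_all_query <- function(codes) {
  stopifnot(is.character(codes), length(codes) > 0)
  stats::setNames(rep(list("*"), length(codes)), codes)
}

# Variable codes as shown for table ISP050c earlier in this thread
codes <- c("SESON", "VAL", "INDICATOR", "ContentsCode", "TIME")
pxweb_query_list <- select_all_query(codes)
str(pxweb_query_list)
```

The resulting list has the same shape as the hand-written `pxweb_query_list`, so it should be usable as the `query` argument of `pxweb_get()` in the same way.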

MansMeg commented 3 years ago

Great!
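
As a side note on the long TIME vectors in the queries above: when an explicit list of quarters is wanted instead of "*", it can be generated rather than typed. This is a sketch that assumes the series runs without gaps from 1995Q1 to 2020Q3, which matches the 103 entries shown in the interactive listing. The names `years`, `quarters` and `time_values` are illustrative only.

```r
# Generate quarter labels in the "YYYYQn" form used by the TIME variable.
years <- 1995:2020
quarters <- paste0(rep(years, each = 4), "Q", 1:4)

# Keep everything up to and including the last quarter shown in the thread.
time_values <- quarters[seq_len(match("2020Q3", quarters))]

length(time_values)  # number of quarters generated
head(time_values, 4)
tail(time_values, 4)
```

`time_values` could then replace the hand-written vector in the `"TIME"` entry of the query list.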