rOpenGov / eurostat

R tools for Eurostat data
http://ropengov.github.io/eurostat

Issues with function label_eurostat #290

Closed Snehal-Rajwar closed 7 months ago

Snehal-Rajwar commented 7 months ago

I have been going in circles on this: the new package version keeps having issues with labelling. Multiple columns, such as partner and nrg_bal, and even geo (the location column), do not seem to find all the matches. This is breaking existing code and workflows extensively. I was wondering whether there will be a fix for this function soon. Let me know if I am missing something on my end, but I believe it is a simple function that should not cause an error.

pitkant commented 7 months ago

Thank you for opening this issue @Snehal-Rajwar

1) What dataset were you trying to label exactly? I assumed it was nrg_cb_oil and tried to replicate your error - everything seemed to work fine for me:

> nrg <- get_eurostat("nrg_cb_oil")
trying URL 'https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/nrg_cb_oil?format=TSV&compressed=true'
downloaded 3.6 MB

Table nrg_cb_oil cached at /var/folders/f4/h_r3y60n0nn0qm6qx5hnx1s00000gn/T//Rtmp3ypzad/eurostat/4ef7e150263290f09e117c721782a317.rds

> nrg_l <- label_eurostat(nrg)

All countries (geo codes) labeled correctly:

> unique(nrg_l$geo)
 [1] "Albania"                                  
 [2] "Austria"                                  
 [3] "Bosnia and Herzegovina"                   
 [4] "Belgium"                                  
 [5] "Bulgaria"                                 
 [6] "Cyprus"                                   
 [7] "Czechia"                                  
 [8] "Germany"                                  
 [9] "Denmark"                                  
[10] "Euro area – 20 countries (from 2023)"     
[11] "Estonia"                                  
[12] "Greece"                                   
[13] "Spain"                                    
[14] "European Union - 27 countries (from 2020)"
[15] "Finland"                                  
[16] "France"                                   
[17] "Georgia"                                  
[18] "Croatia"                                  
[19] "Hungary"                                  
[20] "Ireland"                                  
[21] "Iceland"                                  
[22] "Italy"                                    
[23] "Liechtenstein"                            
[24] "Lithuania"                                
[25] "Luxembourg"                               
[26] "Latvia"                                   
[27] "Moldova"                                  
[28] "Montenegro"                               
[29] "North Macedonia"                          
[30] "Malta"                                    
[31] "Netherlands"                              
[32] "Norway"                                   
[33] "Poland"                                   
[34] "Portugal"                                 
[35] "Romania"                                  
[36] "Serbia"                                   
[37] "Sweden"                                   
[38] "Slovenia"                                 
[39] "Slovakia"                                 
[40] "Türkiye"                                  
[41] "Ukraine"                                  
[42] "United Kingdom"                           
[43] "Kosovo*"    

2) Are you running version 4.0.0 of the package? Could you post your sessionInfo()?

Snehal-Rajwar commented 7 months ago

(two images attached) These are the packages. The columns with the labelling issue are partner and nrg_bal. Yes, I am running 4.0.0. Let me know what you see and whether the problem is with the package.

pitkant commented 7 months ago

I tried downloading and labelling the first 2 datasets from your example and encountered no problems, warning messages or errors:

> nrg_cb_oil <- get_eurostat("nrg_cb_oilm")
trying URL 'https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/nrg_cb_oilm?format=TSV&compressed=true'
downloaded 2.0 MB

Table nrg_cb_oilm cached at /var/folders/f4/h_r3y60n0nn0qm6qx5hnx1s00000gn/T//Rtmpp1ZfUP/eurostat/66a8a5a0de29b6c28bc55a6fa8718dc5.rds

> nrg_cb_oil_l <- label_eurostat(nrg_cb_oil)

> nrg_cb_stk <- get_eurostat("nrg_stk_oilm")
trying URL 'https://ec.europa.eu/eurostat/api/dissemination/sdmx/2.1/data/nrg_stk_oilm?format=TSV&compressed=true'
downloaded 3.5 MB

Table nrg_stk_oilm cached at /var/folders/f4/h_r3y60n0nn0qm6qx5hnx1s00000gn/T//Rtmpp1ZfUP/eurostat/fe4d35a1fe401f31c894e6af39de2f4d.rds

> nrg_cb_stk_l <- label_eurostat(nrg_cb_stk)

When comparing our sessionInfo() outputs, I notice that you are running slightly older versions of R packages and a version of R that is 2 years older. Here's my sessionInfo for reference:

> sessionInfo()
R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2.1

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] eurostat_4.0.0

loaded via a namespace (and not attached):
 [1] tidyr_1.3.0        rappdirs_0.3.3     utf8_1.2.3         generics_0.1.3    
 [5] class_7.3-22       xml2_1.3.6         KernSmooth_2.23-22 stringi_1.7.12    
 [9] hms_1.1.3          digest_0.6.33      magrittr_2.0.3     countrycode_1.5.0 
[13] timechange_0.2.0   ISOweek_0.6-2      cellranger_1.1.0   rprojroot_2.0.3   
[17] plyr_1.8.9         jsonlite_1.8.7     e1071_1.7-13       backports_1.4.1   
[21] httr_1.4.7         purrr_1.0.2        fansi_1.0.5        regions_0.1.8     
[25] bibtex_0.5.1       httr2_0.2.3        cli_3.6.2          rlang_1.1.3       
[29] tools_4.3.2        tzdb_0.4.0         dplyr_1.1.3        here_1.0.1        
[33] curl_5.2.0         assertthat_0.2.1   vctrs_0.6.4        R6_2.5.1          
[37] proxy_0.4-27       lifecycle_1.0.3    lubridate_1.9.3    classInt_0.4-10   
[41] RefManageR_1.4.0   stringr_1.5.0      pkgconfig_2.0.3    pillar_1.9.0      
[45] data.table_1.14.10 glue_1.6.2         Rcpp_1.0.11        tibble_3.2.1      
[49] tidyselect_1.2.0   rstudioapi_0.15.0  readr_2.1.4        compiler_4.3.2    
[53] readxl_1.4.3  

From this information I can't say what the point of failure is when you're attempting to label things. Generally speaking, it may help to update your packages and/or R, although I'm not saying that my own packages are as up to date as they could be.
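As a rough way to check whether an update is due, the sketch below compares an installed package version against a minimum. The helper name needs_update is hypothetical and not part of eurostat; it only wraps base R's packageVersion() and package_version(), and it treats a package that is not installed as needing an update.

```r
# Hypothetical helper (not part of eurostat): TRUE when `pkg` is missing
# or its installed version is older than `minimum`.
needs_update <- function(pkg, minimum) {
  installed <- tryCatch(
    utils::packageVersion(pkg),
    error = function(e) NULL  # packageVersion() errors if pkg is not installed
  )
  is.null(installed) || installed < package_version(minimum)
}

# Show the running R version, then check eurostat against 4.0.0,
# the version discussed in this thread.
print(R.version.string)
print(needs_update("eurostat", "4.0.0"))
```

If the check returns TRUE, install.packages("eurostat") or update.packages() would be the usual next step.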

Snehal-Rajwar commented 7 months ago

Hi, thanks. I didn't realise my computer was updating it once I downloaded it. Updating did solve the problem for the first two datasets, but I am still having labelling problems with the partner column in the others. Can you check those datasets as well? (image attached)

My current session info: (image attached). Really appreciate the help. Thanks!

pitkant commented 7 months ago

Thank you for the update. I tried debugging the code, and it seems to me that the only items the function is not able to label are "NA" items. Internally, the unlabelled codes are saved in variable x and the labelled titles are saved in variable y, and from those we get (while in debug(label_eurostat)):

# The positions of NA items in y
head(which(is.na(y)))
[1] 318979 318980 318981 318982 318983 318984

# The number of NA items (length of which(is.na(y)) )
length(which(is.na(y)))
[1] 37665

# the contents of x indexes where y is NA
head(x[which(is.na(y))])
[1] "NA" "NA" "NA" "NA" "NA" "NA"

# unique items
unique(x[which(is.na(y))])
[1] "NA"

So to me it seems that while the warning message may be a bit alarming, the function mainly works as it should.

> unique(nrg_ti_trade$partner)
  [1] "AD"          "AE"          "AFR_OTH"     "AL"         
  [5] "AM"          "AME_LAT"     "AME_OTH"     "AN"         
  [9] "AO"          "AR"          "ASI_NME"     "ASI_NME_OTH"
 [13] "ASI_OTH"     "AT"          "AU"          "AW"         
 [17] "AZ"          "BA"          "BB"          "BD"         
 [21] "BE"          "BG"          "BH"          "BJ"         
 [25] "BN"          "BO"          "BR"          "BS"         
 [29] "BY"          "BZ"          "CA"          "CD"         
 [33] "CG"          "CH"          "CI"          "CL"         
 [37] "CM"          "CN"          "CN_X_HK"     "CO"         
 [41] "CR"          "CU"          "CV"          "CW"         
 [45] "CY"          "CZ"          "DE"          "DJ"         
 [49] "DK"          "DO"          "DZ"          "EC"         
 [53] "EE"          "EG"          "EL"          "ER"         
 [57] "ES"          "ET"          "EU27_2020"   "EU28"       
 [61] "EUR_OTH"     "EX_SU_OTH"   "FI"          "FR"         
 [65] "GA"          "GE"          "GH"          "GI"         
 [69] "GQ"          "GT"          "GW"          "HK"         
 [73] "HN"          "HR"          "HU"          "ID"         
 [77] "IE"          "IL"          "IN"          "IQ"         
 [81] "IR"          "IS"          "IT"          "JM"         
 [85] "JO"          "JP"          "KE"          "KG"         
 [89] "KH"          "KP"          "KR"          "KW"         
 [93] "KZ"          "LA"          "LB"          "LI"         
 [97] "LK"          "LR"          "LT"          "LU"         
[101] "LV"          "LY"          "MA"          "MD"         
[105] "ME"          "MG"          "MH"          "MK"         
[109] "MM"          "MN"          "MR"          "MT"         
[113] "MU"          "MX"          "MY"          "MZ"         
[117] "NA"          "NC"          "NE"          "NG"         
[121] "NL"          "NO"          "NP"          "NSP"        
[125] "NZ"          "OM"          "PA"          "PE"         
[129] "PG"          "PH"          "PK"          "PL"         
[133] "PT"          "QA"          "RO"          "RS"         
[137] "RU"          "SA"          "SD"          "SE"         
[141] "SG"          "SI"          "SK"          "SL"         
[145] "SN"          "SS"          "ST"          "SY"         
[149] "TG"          "TH"          "TJ"          "TL"         
[153] "TM"          "TN"          "TOTAL"       "TR"         
[157] "TT"          "TW"          "TZ"          "UA"         
[161] "UG"          "UK"          "US"          "UY"         
[165] "UZ"          "VE"          "VG"          "VN"         
[169] "XK"          "YE"          "ZA"          "EX_YU_OTH"  
> unique(nrg_ti_trade_l$partner)
  [1] "Andorra"                                                                
  [2] "United Arab Emirates"                                                   
  [3] "Other African countries (aggregate changing according to the context)"  
  [4] "Albania"                                                                
  [5] "Armenia"                                                                
  [6] "Latin American countries"                                               
  [7] "Other American countries (aggregate changing according to the context)" 
  [8] "Netherlands Antilles"                                                   
  [9] "Angola"                                                                 
 [10] "Argentina"                                                              
 [11] "Near and Middle East Asia (aggregate changing according to the context)"
 [12] "Other Near and Middle East Asian countries"                             
 [13] "Other Asian countries (aggregate changing according to the context)"    
 [14] "Austria"                                                                
 [15] "Australia"                                                              
 [16] "Aruba"                                                                  
 [17] "Azerbaijan"                                                             
 [18] "Bosnia and Herzegovina"                                                 
 [19] "Barbados"                                                               
 [20] "Bangladesh"                                                             
 [21] "Belgium"                                                                
 [22] "Bulgaria"                                                               
 [23] "Bahrain"                                                                
 [24] "Benin"                                                                  
 [25] "Brunei Darussalam"                                                      
 [26] "Bolivia"                                                                
 [27] "Brazil"                                                                 
 [28] "Bahamas"                                                                
 [29] "Belarus"                                                                
 [30] "Belize"                                                                 
 [31] "Canada"                                                                 
 [32] "Democratic Republic of the Congo"                                       
 [33] "Congo"                                                                  
 [34] "Switzerland"                                                            
 [35] "Côte d’Ivoire"                                                          
 [36] "Chile"                                                                  
 [37] "Cameroon"                                                               
 [38] "China"                                                                  
 [39] "China except Hong Kong"                                                 
 [40] "Colombia"                                                               
 [41] "Costa Rica"                                                             
 [42] "Cuba"                                                                   
 [43] "Cabo Verde"                                                             
 [44] "Curaçao"                                                                
 [45] "Cyprus"                                                                 
 [46] "Czechia"                                                                
 [47] "Germany"                                                                
 [48] "Djibouti"                                                               
 [49] "Denmark"                                                                
 [50] "Dominican Republic"                                                     
 [51] "Algeria"                                                                
 [52] "Ecuador"                                                                
 [53] "Estonia"                                                                
 [54] "Egypt"                                                                  
 [55] "Greece"                                                                 
 [56] "Eritrea"                                                                
 [57] "Spain"                                                                  
 [58] "Ethiopia"                                                               
 [59] "European Union - 27 countries (from 2020)"                              
 [60] "European Union - 28 countries (2013-2020)"                              
 [61] "Other European countries (aggregate changing according to the context)" 
 [62] "Other countries of former Soviet Union (before 1991)"                   
 [63] "Finland"                                                                
 [64] "France"                                                                 
 [65] "Gabon"                                                                  
 [66] "Georgia"                                                                
 [67] "Ghana"                                                                  
 [68] "Gibraltar"                                                              
 [69] "Equatorial Guinea"                                                      
 [70] "Guatemala"                                                              
 [71] "Guinea-Bissau"                                                          
 [72] "Hong Kong"                                                              
 [73] "Honduras"                                                               
 [74] "Croatia"                                                                
 [75] "Hungary"                                                                
 [76] "Indonesia"                                                              
 [77] "Ireland"                                                                
 [78] "Israel"                                                                 
 [79] "India"                                                                  
 [80] "Iraq"                                                                   
 [81] "Iran"                                                                   
 [82] "Iceland"                                                                
 [83] "Italy"                                                                  
 [84] "Jamaica"                                                                
 [85] "Jordan"                                                                 
 [86] "Japan"                                                                  
 [87] "Kenya"                                                                  
 [88] "Kyrgyzstan"                                                             
 [89] "Cambodia"                                                               
 [90] "North Korea"                                                            
 [91] "South Korea"                                                            
 [92] "Kuwait"                                                                 
 [93] "Kazakhstan"                                                             
 [94] "Laos"                                                                   
 [95] "Lebanon"                                                                
 [96] "Liechtenstein"                                                          
 [97] "Sri Lanka"                                                              
 [98] "Liberia"                                                                
 [99] "Lithuania"                                                              
[100] "Luxembourg"                                                             
[101] "Latvia"                                                                 
[102] "Libya"                                                                  
[103] "Morocco"                                                                
[104] "Moldova"                                                                
[105] "Montenegro"                                                             
[106] "Madagascar"                                                             
[107] "Marshall Islands"                                                       
[108] "North Macedonia"                                                        
[109] "Myanmar/Burma"                                                          
[110] "Mongolia"                                                               
[111] "Mauritania"                                                             
[112] "Malta"                                                                  
[113] "Mauritius"                                                              
[114] "Mexico"                                                                 
[115] "Malaysia"                                                               
[116] "Mozambique"                                                             
[117] NA                                                                       
[118] "New Caledonia"                                                          
[119] "Niger"                                                                  
[120] "Nigeria"                                                                
[121] "Netherlands"                                                            
[122] "Norway"                                                                 
[123] "Nepal"                                                                  
[124] "Not specified"                                                          
[125] "New Zealand"                                                            
[126] "Oman"                                                                   
[127] "Panama"                                                                 
[128] "Peru"                                                                   
[129] "Papua New Guinea"                                                       
[130] "Philippines"                                                            
[131] "Pakistan"                                                               
[132] "Poland"                                                                 
[133] "Portugal"                                                               
[134] "Qatar"                                                                  
[135] "Romania"                                                                
[136] "Serbia"                                                                 
[137] "Russia"                                                                 
[138] "Saudi Arabia"                                                           
[139] "Sudan"                                                                  
[140] "Sweden"                                                                 
[141] "Singapore"                                                              
[142] "Slovenia"                                                               
[143] "Slovakia"                                                               
[144] "Sierra Leone"                                                           
[145] "Senegal"                                                                
[146] "South Sudan"                                                            
[147] "São Tomé and Príncipe"                                                  
[148] "Syria"                                                                  
[149] "Togo"                                                                   
[150] "Thailand"                                                               
[151] "Tajikistan"                                                             
[152] "Timor-Leste"                                                            
[153] "Turkmenistan"                                                           
[154] "Tunisia"                                                                
[155] "Total"                                                                  
[156] "Türkiye"                                                                
[157] "Trinidad and Tobago"                                                    
[158] "Taiwan"                                                                 
[159] "Tanzania"                                                               
[160] "Ukraine"                                                                
[161] "Uganda"                                                                 
[162] "United Kingdom"                                                         
[163] "United States"                                                          
[164] "Uruguay"                                                                
[165] "Uzbekistan"                                                             
[166] "Venezuela"                                                              
[167] "British Virgin Islands"                                                 
[168] "Viet Nam"                                                               
[169] "Kosovo*"                                                                
[170] "Yemen"                                                                  
[171] "South Africa"                                                           
[172] "Other countries of former Yugoslavia (before 1992)"     
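
The two listings above can also be compared programmatically, without stepping into the debugger. The following is a minimal sketch on toy vectors copied from the listings (nothing is downloaded); it assumes the labelled data keeps the same row order as the original, which the thread does not state explicitly. Note that the string "NA" is also the ISO 3166-1 alpha-2 code for Namibia, so it may be a real partner code that is being treated as missing; the thread does not confirm this.

```r
# Toy stand-ins for nrg_ti_trade$partner (codes) and
# nrg_ti_trade_l$partner (labels); values taken from the listings above.
codes  <- c("MZ", "NA", "NC", "NSP", "NA")
labels <- c("Mozambique", NA, "New Caledonia", "Not specified", NA)

# Pair each code with its label and keep only the distinct pairs
pairs <- unique(data.frame(code = codes, label = labels,
                           stringsAsFactors = FALSE))

# Codes for which no label was found
unlabelled <- pairs$code[is.na(pairs$label)]
print(unlabelled)

# Number of rows left without a label
print(sum(is.na(labels)))
```

With the real data, the equivalent one-liner would be unique(nrg_ti_trade$partner[is.na(nrg_ti_trade_l$partner)]), under the same row-order assumption.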

Snehal-Rajwar commented 7 months ago

Thank you, that worked fine. Also, thanks for the help and the additional information on the package.