r-quantities / units

Measurement units for R
https://r-quantities.github.io/units

Non-English Latin characters don't work in symbol names #307

Closed: StaffanBetner closed this issue 2 years ago

StaffanBetner commented 2 years ago

I am trying to specify a non-decimal currency system with four denominations in units, but I am having trouble with a non-English character (ö is not plain ASCII, although it is part of the Windows-1252 code page). I am on Windows 10 using R 4.1.2.

library(magrittr)
library(units)
#> udunits database from ##REDACTED##/Program/R/R-4.1.2/library/units/share/udunits/udunits2.xml

# 1 mark = 8 öre = 24 örtugar = 192 penningar
#          1 öre = 3  örtugar = 24  penningar
#                  1  örtug   = 8   penningar

install_unit("penningar")
set_units(192, "penningar")
#> 192 [penningar]

install_unit("mark", "192 penningar")
set_units(192, "penningar") %>% set_units("mark")
#> 1 [mark]

# try oe instead of ö
install_unit("oere", "24 penningar")
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere") # works
#> 8 [oere]

install_unit("öre", "24 penningar")
set_units(192, "penningar") %>% set_units("mark") %>% set_units("öre") # doesn't work
#> Error: In 'öre', 'öre' is not recognized by udunits.
#> 
#> See a table of valid unit symbols and names with valid_udunits().
#> Custom user-defined units can be added with install_unit().
#> 
#> See a table of valid unit prefixes with valid_udunits_prefixes().
#> Prefixes will automatically work with any user-defined unit.
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere") # now doesn't work either
#> Error: cannot convert mark into öre

Created on 2022-03-09 by the reprex package (v2.0.1)
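The conversion table in the comments above (1 mark = 8 öre = 24 örtugar = 192 penningar) is plain integer arithmetic. As a minimal sketch, assuming amounts are whole, non-negative numbers of penningar, a hypothetical helper `split_penningar()` (not part of units) can break an amount into the four denominations. It uses the ASCII spellings from the workaround later in the thread.

```r
# Hypothetical helper (not part of units): split a whole number of
# penningar into mark / oere / oertugar / penningar, using the ratios
# 1 mark = 8 oere, 1 oere = 3 oertugar, 1 oertug = 8 penningar.
split_penningar <- function(p) {
  stopifnot(length(p) == 1, p >= 0, p == round(p))
  # Size of each denomination, expressed in penningar.
  sizes <- c(mark = 192, oere = 24, oertugar = 8, penningar = 1)
  out <- setNames(numeric(length(sizes)), names(sizes))
  for (i in seq_along(sizes)) {
    out[i] <- p %/% sizes[i]  # how many of this denomination fit
    p <- p %% sizes[i]        # remainder carried to the next one
  }
  out
}

# 1 mark, 2 oere, 1 oertug and 2 penningar (192 + 48 + 8 + 2 = 250)
split_penningar(250)
```

Greedy division works here because every denomination is an exact multiple of the next smaller one.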

edzer commented 2 years ago

For cases like these, we need your sessionInfo(), since the behaviour seems to depend on the platform, on whether the locale is UTF-8, etc. On Ubuntu I get this:

library(magrittr)
library(units)
# udunits database from /usr/share/xml/udunits/udunits2.xml
sessionInfo()
# R version 4.1.2 (2021-11-01)
# Platform: x86_64-pc-linux-gnu (64-bit)
# Running under: Ubuntu 20.04.4 LTS

# Matrix products: default
# BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
# LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0

# locale:
#  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#  [3] LC_TIME=en_GB.UTF-8        LC_COLLATE=en_US.UTF-8    
#  [5] LC_MONETARY=en_GB.UTF-8    LC_MESSAGES=en_US.UTF-8   
#  [7] LC_PAPER=en_GB.UTF-8       LC_NAME=C                 
#  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
# [11] LC_MEASUREMENT=en_GB.UTF-8 LC_IDENTIFICATION=C       

# attached base packages:
# [1] stats     graphics  grDevices utils     datasets  methods   base     

# other attached packages:
# [1] units_0.8-0    magrittr_2.0.2

# loaded via a namespace (and not attached):
# [1] compiler_4.1.2 Rcpp_1.0.8    
install_unit("penningar")
set_units(192, "penningar")
# 192 [penningar]
install_unit("mark", "192 penningar")
set_units(192, "penningar") %>% set_units("mark")
# 1 [mark]
# try oe instead of ö
install_unit("oere", "24 penningar")
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere") # works
# 8 [oere]
try(install_unit("öre", "24 penningar"))
# Error in ud_map_symbols(symbol, ut_unit) : Unit already maps to "oere"
set_units(192, "penningar") %>% set_units("mark") %>% set_units("öre") # doesn't work
# 8 [oere]
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere") # now doesn't work either
# 8 [oere]

StaffanBetner commented 2 years ago

> sessionInfo()
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19042)

Matrix products: default

locale:
[1] LC_COLLATE=Swedish_Sweden.1252  LC_CTYPE=Swedish_Sweden.1252    LC_MONETARY=Swedish_Sweden.1252
[4] LC_NUMERIC=C                    LC_TIME=Swedish_Sweden.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.2 cli_3.2.0      tools_4.1.2    parallel_4.1.2 rlang_1.0.2 

edzer commented 2 years ago

It would be interesting to see what you get with the R-devel build for Windows (https://cran.r-project.org/bin/windows/base/rdevel.html), which uses UCRT and is supposedly UTF-8 (though I wouldn't expect it to do any better than what I see under Ubuntu!).
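Whether a given build (UCRT or not) actually runs in a UTF-8 locale can be checked without the full sessionInfo(). A short sketch using base R's `l10n_info()` and `Sys.getlocale()`:

```r
# Report whether the current R session runs in a UTF-8 locale,
# which this thread points to as the deciding factor for "öre".
info <- l10n_info()
utf8 <- isTRUE(info[["UTF-8"]])
cat("UTF-8 locale:", utf8, "\n")
cat("LC_CTYPE:", Sys.getlocale("LC_CTYPE"), "\n")
# On Windows, l10n_info() also includes the active code page
# (1252 for a Swedish_Sweden.1252 locale); elsewhere it is absent.
if (!is.null(info[["codepage"]])) cat("Code page:", info[["codepage"]], "\n")
```

With the Swedish_Sweden.1252 locale reported above, the UTF-8 flag should come out FALSE.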

Enchufa2 commented 2 years ago

On my system:

library(magrittr)
library(units)
#> udunits database from /usr/share/udunits/udunits2.xml
sessionInfo()
#> R version 4.0.5 (2021-03-31)
#> Platform: x86_64-redhat-linux-gnu (64-bit)
#> Running under: Fedora 34 (Thirty Four)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.0
#> 
#> locale:
#>  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
#>  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
#>  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] units_0.7-2    magrittr_2.0.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        knitr_1.36        R.cache_0.15.0    rlang_0.4.12     
#>  [5] fastmap_1.1.0     fansi_0.5.0       stringr_1.4.0     styler_1.6.2     
#>  [9] highr_0.9         tools_4.0.5       xfun_0.28         R.oo_1.24.0      
#> [13] utf8_1.2.2        withr_2.4.2       htmltools_0.5.2   ellipsis_0.3.2   
#> [17] yaml_2.2.1        digest_0.6.28     tibble_3.1.6      lifecycle_1.0.1  
#> [21] crayon_1.4.2      purrr_0.3.4       R.utils_2.11.0    vctrs_0.3.8      
#> [25] fs_1.5.2          glue_1.5.1        evaluate_0.14     rmarkdown_2.11   
#> [29] reprex_2.0.1      stringi_1.7.6     compiler_4.0.5    pillar_1.6.4     
#> [33] backports_1.4.1   R.methodsS3_1.8.1 pkgconfig_2.0.3

install_unit("penningar")
set_units(192, "penningar")
#> 192 [penningar]
install_unit("mark", "192 penningar")
set_units(192, "penningar") %>% set_units("mark")
#> 1 [mark]
install_unit("oere", "24 penningar")
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere")
#> 8 [oere]
try(install_unit("öre", "24 penningar"))
set_units(192, "penningar") %>% set_units("mark") %>% set_units("öre")
#> 8 [öre]
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere")
#> 8 [oere]

The platform differences are certainly annoying. But if you want to use öre, install only that symbol (not oere as well), and it should work in a UTF-8 locale.
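Following that advice means installing exactly one symbol per unit, chosen by locale. A minimal sketch, assuming the ASCII fallback should use oe/ae/aa for ö/ä/å (the thread itself only uses oe); `portable_symbol()` is a hypothetical helper, not a units function:

```r
# Hypothetical helper: keep the native symbol in a UTF-8 locale,
# otherwise return an ASCII transliteration (assumed mapping:
# ö -> oe, ä -> ae, å -> aa, plus the upper-case forms).
portable_symbol <- function(symbol, utf8 = isTRUE(l10n_info()[["UTF-8"]])) {
  if (utf8) return(symbol)
  from <- c("\u00f6", "\u00e4", "\u00e5", "\u00d6", "\u00c4", "\u00c5")
  to   <- c("oe", "ae", "aa", "Oe", "Ae", "Aa")
  ascii <- symbol
  for (i in seq_along(from)) {
    ascii <- gsub(from[i], to[i], ascii, fixed = TRUE)
  }
  ascii
}

portable_symbol("\u00f6re", utf8 = FALSE)      # "oere"
portable_symbol("\u00f6rtugar", utf8 = FALSE)  # "oertugar"
```

In a units session where penningar is already installed, the result would be passed straight to `install_unit()`, e.g. `install_unit(portable_symbol("\u00f6re"), "24 penningar")`, so that öre and oere are never both mapped to the same definition.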

Enchufa2 commented 2 years ago

Sorry, that was with the previous release (units 0.7-2). With units 0.8-0:

library(magrittr)
library(units)
#> udunits database from /usr/share/udunits/udunits2.xml
sessionInfo()
#> R version 4.0.5 (2021-03-31)
#> Platform: x86_64-redhat-linux-gnu (64-bit)
#> Running under: Fedora 34 (Thirty Four)
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib64/libflexiblas.so.3.0
#> 
#> locale:
#>  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
#>  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
#>  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] units_0.8-0    magrittr_2.0.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.7        knitr_1.36        R.cache_0.15.0    rlang_0.4.12     
#>  [5] fastmap_1.1.0     fansi_0.5.0       stringr_1.4.0     styler_1.6.2     
#>  [9] highr_0.9         tools_4.0.5       xfun_0.28         R.oo_1.24.0      
#> [13] utf8_1.2.2        withr_2.4.2       htmltools_0.5.2   ellipsis_0.3.2   
#> [17] yaml_2.2.1        digest_0.6.28     tibble_3.1.6      lifecycle_1.0.1  
#> [21] crayon_1.4.2      purrr_0.3.4       R.utils_2.11.0    vctrs_0.3.8      
#> [25] fs_1.5.2          glue_1.5.1        evaluate_0.14     rmarkdown_2.11   
#> [29] reprex_2.0.1      stringi_1.7.6     compiler_4.0.5    pillar_1.6.4     
#> [33] backports_1.4.1   R.methodsS3_1.8.1 pkgconfig_2.0.3

install_unit("penningar")
set_units(192, "penningar")
#> 192 [penningar]
install_unit("mark", "192 penningar")
set_units(192, "penningar") %>% set_units("mark")
#> 1 [mark]
install_unit("oere", "24 penningar")
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere")
#> 8 [oere]
try(install_unit("öre", "24 penningar"))
#> Error in ud_map_symbols(symbol, ut_unit) : Unit already maps to "oere"
set_units(192, "penningar") %>% set_units("mark") %>% set_units("öre")
#> 8 [oere]
set_units(192, "penningar") %>% set_units("mark") %>% set_units("oere")
#> 8 [oere]

??? Now THIS is related to #301 and #304. Very strange, though.

Enchufa2 commented 2 years ago

Anyway, I think we can close this one. A unit like öre is not expected to work in non-UTF-8 contexts. UCRT builds of R will improve this situation for Windows users.

StaffanBetner commented 2 years ago

I'll use the workaround "oere" (and "oertugar") in the meantime, then.
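For reference, the complete ASCII-only workaround, with the örtug level added as "oertugar", might look like the sketch below. It only uses `install_unit()` and `set_units()` as they appear in this thread, and assumes a fresh session in which none of these symbols has been installed yet.

```r
library(units)

# ASCII-only symbols (the workaround from this thread), assuming none of
# them is installed yet in this session.
# Ratios: 1 mark = 8 oere = 24 oertugar = 192 penningar.
install_unit("penningar")                 # new base unit
install_unit("oertugar", "8 penningar")
install_unit("oere", "24 penningar")
install_unit("mark", "192 penningar")

x <- set_units(1, "mark")
set_units(x, "oere")                          # 8 [oere]
set_units(x, "oertugar")                      # 24 [oertugar]
set_units(set_units(3, "oere"), "oertugar")   # 9 [oertugar]
```

Defining every level directly in penningar, rather than chaining oere on oertugar, keeps each definition a single exact integer factor.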

Enchufa2 commented 2 years ago

Perfect, thanks for the report!

StaffanBetner commented 2 years ago

I can confirm that it works with R 4.2 on Windows 10 (official UCRT build).