randy3k / radian

A 21 century R console
MIT License
1.96k stars 73 forks

Windows code page detection #475

Open liegepr opened 2 months ago

liegepr commented 2 months ago

Thanks for developing radian.

I am running R v. 4.3.0 on Windows 11. When using R term as the interactive terminal in VS Code, I get:

Sys.getlocale()
[1] "LC_COLLATE=French_France.utf8;LC_CTYPE=French_France.utf8;LC_MONETARY=French_France.utf8;LC_NUMERIC=C;LC_TIME=French_France.utf8"
l10n_info()$system.codepage
[1] 65001
l10n_info()$codepage
[1] 65001

Now when using radian:

sessionInfo()$locale
"LC_COLLATE=French_France.1252;LC_CTYPE=French_France.1252;LC_MONETARY=French_France.1252;LC_NUMERIC=C;LC_TIME=French_France.1252"
l10n_info()$system.codepage
[1] 1252
l10n_info()$codepage
[1] 1252

After tweaking my .Rprofile, I can force R to use UTF-8 with radian:

sessionInfo()$locale
[1] "LC_COLLATE=fr_FR.UTF-8;LC_CTYPE=fr_FR.UTF-8;LC_MONETARY=fr-FR.UTF-8;LC_NUMERIC=C;LC_TIME=fr-FR.UTF-8"

However, the R code page now conflicts with the Windows code page:

l10n_info()$system.codepage
[1] 1252
l10n_info()$codepage
[1] 65001

Starting from Windows 10 version 1803 and R v4.2, l10n_info()$system.codepage should report 65001.

The R help page for ?Sys.setlocale says: "From R 4.2, UCRT locale names should be used. The character set should match the system/ANSI codepage (l10n_info()$codepage be the same as l10n_info()$system.codepage). Setting it to any other value results in a warning and may cause encoding problems. As from R 4.2 on recent Windows the system codepage is 65001 and one should always use locale names ending with ".UTF-8" (except for "C" and ""), otherwise Windows may add a different character set."

randy3k commented 2 months ago

Unfortunately, this is due to the lack of native UTF-8 support in Python (radian requires Python, in case you didn't know). It seems there is a way to change the activeCodePage in Python's manifest to UTF-8 via mt.exe: https://github.com/python/cpython/issues/86873#issuecomment-1093895849
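
To check which code page the Python process hosting radian actually runs under, something like the following minimal sketch may help. It assumes that R's l10n_info()$system.codepage reflects the process-wide ANSI code page returned by the Win32 call GetACP, which is presumably what an activeCodePage manifest entry would change. The function name active_code_pages is made up for illustration, and outside Windows it simply returns None.

```python
import sys


def active_code_pages():
    """Return (ansi_code_page, console_output_code_page) on Windows, else None.

    Hypothetical helper for illustration; not part of radian or Python itself.
    """
    if sys.platform != "win32":
        # GetACP and GetConsoleOutputCP only exist in the Win32 API.
        return None
    import ctypes

    kernel32 = ctypes.windll.kernel32
    # GetACP: ANSI code page of the current process (assumed to be what R
    # reports as system.codepage when embedded in this process).
    # GetConsoleOutputCP: code page used by the attached console for output.
    return kernel32.GetACP(), kernel32.GetConsoleOutputCP()


if __name__ == "__main__":
    pages = active_code_pages()
    if pages is None:
        print("Not running on Windows; no code page to report.")
    else:
        acp, console_cp = pages
        print(f"ANSI code page: {acp}, console output code page: {console_cp}")
        if acp != 65001:
            print("This process is not using UTF-8 (65001) as its ANSI code page.")
```

Running this with the same python.exe that launches radian should show whether the process reports 1252 or 65001, before and after any manifest change.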

liegepr commented 2 months ago

Thanks for your answer.