Open kendonB opened 10 months ago
Which version of R do you have? I cannot reproduce it on my computer.
R version 4.3.1 (2023-06-16) -- "Beagle Scouts"
Platform: x86_64-apple-darwin20 (64-bit)
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "051aee0c8529378c027b69f4bfcfa88a"
I have digest version 0.6.31.
R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Platform: x86_64-w64-mingw32 (64-bit)
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
r$> packageVersion("digest")
[1] '0.6.33'
r$> Sys.info()
sysname release version machine
"Windows" "10 x64" "build 22621" "x86-64"
I can't reproduce it on my Linux or WSL systems.
Could you try the following?
b <- serialize(mtcars, connection = NULL, version = 3L)
digest::digest(b, serialize = FALSE, skip = 14)
It should give the same result as
digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
If they give different results, could you share the length of b and its first few bytes?
Same results
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
r$>
r$> b <- serialize(mtcars, connection = NULL, version = 3L)
digest::digest(b, serialize = FALSE, skip = 14)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
Is this a clue?
radian:
r$> length(serialize(mtcars, connection = NULL, version = 3L))
[1] 3808
Rterm:
> length(serialize(mtcars, connection = NULL, version = 3L))
[1] 3807
b <- serialize(mtcars, connection = NULL, version = 3L)
digest::digest(b, serialize = FALSE, skip = 14)
gives the same result in both Rterm and radian? That is a bit odd, since the lengths of b differ.
Could you also report the following on both Rterm and radian?
Sys.getenv("LANG")
Sys.getlocale()
l10n_info()
For some reason, I cannot reproduce it on Windows 11 running on a virtual machine. Did you try a newer version of radian?
gives the same result in both Rterm and radian? That is a bit odd, since the lengths of b differ.
No, those differ. The serialized bytes are identical only after the first 23 bytes for Rterm and the first 24 bytes for radian:
Rterm:
> skip_base <- 21
> digest::digest(serialize(mtcars, connection = NULL, version = 3L), serialize = FALSE, skip = skip_base)
[1] "c85aa57f14d5e067930bf841688b5477"
> skip_base <- 22
> digest::digest(serialize(mtcars, connection = NULL, version = 3L), serialize = FALSE, skip = skip_base)
[1] "d3bcef08916358ff8885d327b564425b"
> serialize(mtcars, connection = NULL, version = 3L)[1:30]
[1] 58 0a 00 00 00 03 00 04 03 01 00 03 05 00 00 00 00 05 55 54 46 2d 38 00 00 03 13 00 00 00
radian:
r$> skip_base <- 21
digest::digest(serialize(mtcars, connection = NULL, version = 3L), serialize = FALSE, skip = skip_base + 1)
[1] "1ff1d90cff7b8842217d4b0dd62d785a"
r$> skip_base <- 22
digest::digest(serialize(mtcars, connection = NULL, version = 3L), serialize = FALSE, skip = skip_base + 1)
[1] "d3a7b100f59f32e8719a9706ac8154f3"
r$> serialize(mtcars, connection = NULL, version = 3L)[1:30]
[1] 58 0a 00 00 00 03 00 04 03 01 00 03 05 00 00 00 00 06 43 50 31 32 35 32 00 00 03 13 00 00
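For readers following the two byte dumps above: the headers appear to differ only in the native-encoding field that serialization version 3 writes after the first 14 bytes (a 4-byte big-endian length, then the encoding name: 05 "UTF-8" in Rterm, 06 "CP1252" in radian). The following is a minimal sketch of decoding that field, assuming the XDR layout visible in those dumps. `parse_serialize_header()` is a hypothetical helper written for illustration; it is not part of digest or base R.

```r
# Hypothetical helper: decode the version-3 XDR header written by serialize().
# Layout assumed from the byte dumps in this thread:
#   bytes 1-2    "X\n" format marker
#   bytes 3-14   three 4-byte integers (format version, writer R version,
#                minimum reader R version)
#   bytes 15-18  length of the native-encoding name (big-endian integer)
#   bytes 19+    the encoding name itself, e.g. "UTF-8" or "CP1252"
parse_serialize_header <- function(b) {
  stopifnot(is.raw(b), length(b) >= 18L)
  stopifnot(identical(b[1:2], as.raw(c(0x58, 0x0a))))  # XDR marker "X\n"
  version <- readBin(b[3:6], what = "integer", size = 4L, endian = "big")
  stopifnot(version == 3L)
  enc_len <- readBin(b[15:18], what = "integer", size = 4L, endian = "big")
  encoding <- rawToChar(b[18L + seq_len(enc_len)])
  list(encoding = encoding, header_len = 18L + enc_len)
}

b <- serialize(mtcars, connection = NULL, version = 3L)
h <- parse_serialize_header(b)
h$encoding    # name of the session's native encoding
h$header_len  # number of leading bytes that depend on the locale

# Everything after the header; expected to be locale-independent for mtcars.
payload <- b[-seq_len(h$header_len)]
```

Applied to the dumps above, this would give header lengths of 23 (Rterm) and 24 (radian), which matches the offsets at which the two streams start to agree. Passing `skip = h$header_len` to `digest::digest(b, serialize = FALSE)` would then hash only the payload, whereas `skip = 14`, as used earlier in the thread, leaves the encoding field in the hashed bytes.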
Rterm:
> Sys.getenv("LANG")
[1] "en_US.UTF-8"
> Sys.getlocale()
[1] "LC_COLLATE=English_New Zealand.utf8;LC_CTYPE=English_New Zealand.utf8;LC_MONETARY=English_New Zealand.utf8;LC_NUMERIC=C;LC_TIME=English_New Zealand.utf8"
> l10n_info()
$MBCS
[1] TRUE
$`UTF-8`
[1] TRUE
$`Latin-1`
[1] FALSE
$codepage
[1] 65001
$system.codepage
[1] 65001
radian:
r$> Sys.getenv("LANG")
Sys.getlocale()
l10n_info()
[1] "en_US.UTF-8"
[1] "LC_COLLATE=English_New Zealand.1252;LC_CTYPE=English_New Zealand.1252;LC_MONETARY=English_New Zealand.1252;LC_NUMERIC=C;LC_TIME=English_New Zealand.1252"
$MBCS
[1] FALSE
$`UTF-8`
[1] FALSE
$`Latin-1`
[1] TRUE
$codepage
[1] 1252
$system.codepage
[1] 1252
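The two reports above differ in LC_CTYPE, the UTF-8 and Latin-1 flags, and the code page. To make that comparison easier to repeat in other front ends, here is a small sketch that collapses those facts into one row. `locale_summary()` is a hypothetical helper name; it only calls base R functions already used in this thread.

```r
# Hypothetical helper: collapse the locale facts that matter here into one row,
# so output from two front ends (e.g. Rterm and radian) can be compared directly.
locale_summary <- function() {
  info <- l10n_info()
  # `codepage` is only reported on Windows; fall back to NA elsewhere.
  codepage <- if (is.null(info[["codepage"]])) {
    NA_integer_
  } else {
    as.integer(info[["codepage"]])
  }
  data.frame(
    lc_ctype = Sys.getlocale("LC_CTYPE"),
    utf8     = isTRUE(info[["UTF-8"]]),
    latin1   = isTRUE(info[["Latin-1"]]),
    codepage = codepage,
    stringsAsFactors = FALSE
  )
}

print(locale_summary())
```

Running this once in each front end and comparing the rows should show the same split as above: a UTF-8 session with code page 65001 versus a Latin-1 session with code page 1252.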
I have:
PS C:\Users\KennyBell> radian --version
radian version: 0.6.5
r executable: C:\PROGRA~1\R\R-43~1.1\bin\R
r version: 4.3.1
python executable: C:\Users\KennyBell\anaconda3\python.exe
python version: 3.10.9
Still the same on the latest version:
PS C:\Users\KennyBell> radian --version
radian version: 0.6.6
r executable: C:\PROGRA~1\R\R-43~1.1\bin\R
r version: 4.3.1
python executable: C:\Users\KennyBell\anaconda3\python.exe
python version: 3.10.9
PS C:\Users\KennyBell> radian
R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Platform: x86_64-w64-mingw32 (64-bit)
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
I wonder if it's anaconda messing with something
I think I have figured it out: it is a locale issue. Getting the locale right is tricky because Python doesn't support the native UTF-8 code page; see https://github.com/randy3k/radian/issues/269. We will need to "force" Python to use the UTF-8 code page.
r$> Sys.setlocale(locale = "English_New Zealand.1252")
[1] "LC_COLLATE=English_New Zealand.1252;LC_CTYPE=English_New Zealand.1252;LC_M"
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
r$> Sys.setlocale(locale = "English_New Zealand.utf8")
[1] "LC_COLLATE=English_New Zealand.utf8;LC_CTYPE=English_New Zealand.utf8;LC_M"
Warning message:
In Sys.setlocale(locale = "English_New Zealand.utf8") :
using locale code page other than 1252 may cause problems
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "051aee0c8529378c027b69f4bfcfa88a"
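The experiment above switches the locale by hand and leaves the session in the new locale. For diagnosing only, the same check could be wrapped so that LC_CTYPE is restored afterwards. This is a minimal sketch; `with_ctype()` is a hypothetical helper, and locale names such as "English_New Zealand.utf8" are platform-specific, so the example call uses "C", which should be available everywhere.

```r
# Hypothetical helper: evaluate `expr` under a temporary LC_CTYPE, then restore
# the previous setting when the function exits (even on error).
with_ctype <- function(locale, expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  # Sys.setlocale() returns "" (with a warning) if the request cannot be honored.
  new <- Sys.setlocale("LC_CTYPE", locale)
  if (!nzchar(new)) stop("locale not available: ", locale)
  force(expr)  # `expr` is lazily evaluated, so it runs under the new locale
}

# Example: query LC_CTYPE while the temporary setting is active.
with_ctype("C", Sys.getlocale("LC_CTYPE"))
```

On the Windows machine in this thread, the comparison would look like `with_ctype("English_New Zealand.utf8", digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L))`. Since a locale change is later reported in this thread to break plot(), this is meant for checking hashes, not as a permanent setting.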
I can reproduce this using a standard Python install on Windows 11:
(base) PS C:\Python311\Scripts> .\radian.exe
R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Platform: x86_64-w64-mingw32 (64-bit)
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
r$> exit()
(base) PS C:\Python311\Scripts> .\radian.exe --version
radian version: 0.6.6
r executable: C:\PROGRA~1\R\R-43~1.1\bin\R
r version: 4.3.1
python executable: C:\Python311\python.exe
python version: 3.11.4
I figured out why my radian doesn't produce the error by default. I was using "Git for bash". For some reason, it has "correctly" forced Python to use the UTF-8 code page.
Git for bash (note the warning message)
$ radian
During startup - Warning message:
Using locale code page other than 1252 may cause problems.
R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Platform: x86_64-w64-mingw32 (64-bit)
r$> l10n_info()$codepage
[1] 65001
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "051aee0c8529378c027b69f4bfcfa88a"
Windows Terminal
PS C:\Users\Randy\Desktop> radian
R version 4.3.1 (2023-06-16 ucrt) -- "Beagle Scouts"
Platform: x86_64-w64-mingw32 (64-bit)
r$> l10n_info()$codepage
[1] 1252
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "504a0ceaac24e5bd4f54c1b2ebd32e7a"
r$> Sys.setlocale(locale = "English_New Zealand.utf8")
[1] "LC_COLLATE=English_New Zealand.utf8;LC_CTYPE=English_New Zealand.utf8;LC_MONETARY=English_New Zealand.ut"
Warning message:
In Sys.setlocale(locale = "English_New Zealand.utf8") :
using locale code page other than 1252 may cause problems
r$> digest::digest(mtcars, serialize = TRUE, serializeVersion = 3L)
[1] "051aee0c8529378c027b69f4bfcfa88a"
Edit:
It seems that this is because Git for bash sets the environment variable LC_CTYPE = 'en_US.UTF-8'.
I think a solution is to always force LC_CTYPE to en_US.UTF-8, see 60ccb0a.
radian 0.6.7 is out.
Unfortunately, changing the locale breaks the plot() function. We might have to revert the change here.
See here for background.
When running digest::digest() with serializeVersion = 3L, radian gives a different result from regular R.