r-lib / vctrs

Generic programming with typed R vectors
https://vctrs.r-lib.org

vctrs::vec_group_loc returns unexpected result for int64 vector #1944

Closed andreymalakhov closed 5 months ago

andreymalakhov commented 5 months ago

vctrs::vec_group_loc behaves strangely with integer64 values. For example, given the 10 different integers below, it returns only 3 groups, whereas the same vector converted to character is split into 10 different groups. MRE:

valuesToCheck <- bit64::c.integer64(
  -3814725335433977,
  9220227625594262126,
  9221490650618703145,
  -1899967055774256,
  9220993509417281485,
  9219847319472675260,
  -2164576212153509,
  -2366839787877962,
  -5085047632936132950,
  -1164744773611849940
)
print(vctrs::vec_group_loc(valuesToCheck))
#                    key                    loc
# 1    -3814725335433977 1, 2, 3, 4, 5, 6, 7, 8
# 2 -5085047632936132608                      9
# 3 -1164744773611849984                     10

valuesToCheckChar <- sapply(valuesToCheck, bit64::as.character.integer64)
print(vctrs::vec_group_loc(valuesToCheckChar))

#                     key loc
# 1     -3814725335433977   1
# 2   9220227625594262528   2
# 3   9221490650618702848   3
# 4     -1899967055774256   4
# 5   9220993509417281536   5
# 6   9219847319472674816   6
# 7     -2164576212153509   7
# 8     -2366839787877962   8
# 9  -5085047632936132608   9
# 10 -1164744773611849984  10

DavisVaughan commented 5 months ago

What version of vctrs are you using? It works as expected for me with CRAN vctrs. Also note that R parses these literals as doubles, which cannot represent integers this large exactly, so the values are rounded before bit64 ever sees them (look at the 2nd value here for example, it is different from the input value).

> valuesToCheck <- bit64::c.integer64(
+   -3814725335433977,
+   9220227625594262126,
+   9221490650618703145,
+   -1899967055774256,
+   9220993509417281485,
+   9219847319472675260,
+   -2164576212153509,
+   -2366839787877962,
+   -5085047632936132950,
+   -1164744773611849940
+ )
> valuesToCheck
integer64
 [1] -3814725335433977    9220227625594263552  9221490650618703872  -1899967055774256   
 [5] 9220993509417279488  9219847319472675840  -2164576212153509    -2366839787877962   
 [9] -5085047632936132608 -1164744773611849984
> vctrs::vec_group_loc(valuesToCheck)
                    key loc
1     -3814725335433977   1
2   9220227625594263552   2
3   9221490650618703872   3
4     -1899967055774256   4
5   9220993509417279488   5
6   9219847319472675840   6
7     -2164576212153509   7
8     -2366839787877962   8
9  -5085047632936132608   9
10 -1164744773611849984  10
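
To sidestep the literal rounding noted above, one option is to hand the values to bit64 as character strings, so that bit64 parses the digits itself. The following is only a minimal sketch, not something taken from the thread: the names input and valuesExact are made up for illustration, and it assumes bit64 and a recent vctrs are installed.

```r
library(bit64)

# Doubles have a 53-bit significand, so not every integer above 2^53
# is representable; here the +1 is lost to rounding.
2^53 + 1 == 2^53

# Hypothetical name: the same ten values from the MRE, written as strings
# so that the R parser never turns them into doubles.
input <- c(
  "-3814725335433977",
  "9220227625594262126",
  "9221490650618703145",
  "-1899967055774256",
  "9220993509417281485",
  "9219847319472675260",
  "-2164576212153509",
  "-2366839787877962",
  "-5085047632936132950",
  "-1164744773611849940"
)

# as.integer64() accepts character input and keeps every digit.
valuesExact <- as.integer64(input)

# Round-tripping through character should recover the original strings.
identical(as.character(valuesExact), input)

# With vctrs 0.6.5 (the version reported as working in this thread),
# each distinct value is expected to land in its own group.
groups <- vctrs::vec_group_loc(valuesExact)
nrow(groups)
```

Whether this is preferable to numeric literals depends on where the values come from; for data read from files or databases, reading the column as character or directly as integer64 avoids the same rounding.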

andreymalakhov commented 5 months ago

Ah, thank you for pointing me to the version. Indeed, with the latest 0.6.5 from CRAN it works as expected.
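
For anyone landing here with the same symptom, a quick way to check the installed version before digging further might look like the sketch below. The thread only confirms that 0.6.5 works; it does not say which earlier versions were affected, so the threshold used here is an assumption, and installedVersion and isRecent are names invented for illustration.

```r
# Version of vctrs currently installed in this library.
installedVersion <- utils::packageVersion("vctrs")
print(installedVersion)

# Assumption: 0.6.5 is used as the cut-off only because it is the version
# reported as working in this thread.
isRecent <- installedVersion >= "0.6.5"
if (!isRecent) {
  message("vctrs is older than 0.6.5; consider install.packages(\"vctrs\")")
}
```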