yihui / xfun

Miscellaneous R functions
https://yihui.org/xfun/
Other
135 stars 28 forks source link

is_ascii() is wrong on musl-based systems #81

Closed bastistician closed 1 year ago

bastistician commented 1 year ago

On my Alpine Linux check system, I see

* checking tests ...
  Running ‘test-ci.R’ [0s/0s]
  Running ‘test-cran.R’ [1s/1s]
 [2s/2s] ERROR
Running the tests in 'tests/test-cran.R' failed.
Complete output:
  > testit::test_pkg('xfun', 'test-cran')

  Attaching package: 'xfun'

  The following objects are masked from 'package:base':

      attr, isFALSE

  assertion failed: is_ascii() can identify ascii strings
  Error from assert("is_ascii() can identify ascii strings", { ...  at test-encoding.R#18 
  Error: (!is_ascii(latin_str)) is not TRUE
  Execution halted

A minimal reproducible example for the failing test follows:

latin_str = 'fa\u00e7ile'
iconv(latin_str, to = "ascii")
#> [1] "fa*ile"

library(xfun)
(!is_ascii(latin_str))
#> [1] FALSE

So on musl the iconv call does not give NA as expected by is_ascii. This behaviour is mentioned in help(iconv):

Some Linux distributions use 'musl' as their C runtime. This is less comprehensive than 'glibc': it does not support '//TRANSLIT' but does inexact conversions (currently using '*'). [...] Note that you cannot rely on invalid inputs being detected, especially for 'to = "ASCII"' where some implementations allow 8-bit characters and pass them through unchanged or with transliteration or substitution.

This means that testing for an NA result of this conversion is generally insufficient. is_ascii should probably also test whether the converted string differs from the input.

yihui commented 1 year ago

is_ascii should probably also test whether the converted string differs from the input.

Good idea. Done. Thank you!

bastistician commented 1 year ago

Thanks, I can confirm that the package now passes the tests on my "--extra-arch" check system.