Closed (closed by wch 4 years ago)
It took me a while to figure out how to reproduce the issue. Here's a build using Docker that results in the errors:
docker run --security-opt seccomp=unconfined --rm -ti rocker/r-devel /bin/bash
apt install git libssl-dev vim nano
echo en_US ISO-8859-1 >> /etc/locale.gen
locale-gen
echo MAKEFLAGS=-j4 > ~/.Renviron
git clone https://github.com/rstudio/httpuv.git
RD -e "install.packages('remotes')"
RD -e 'remotes::install_local("httpuv", dependencies = TRUE)'
export LANG=en_US.iso88591
# This prints a warning, but it seems to be necessary for things to work.
export LC_ALL=en_US.iso88591
RD --quiet -e 'Sys.getlocale()'
RD CMD build httpuv
RD CMD check httpuv_1.5.2.9000.tar.gz
I've narrowed this down to https://github.com/rstudio/httpuv/blob/622c76a749efbbb13903cd488cd1b8c54a48793c/src/fs.cpp#L56
(And probably the same issue would happen here: https://github.com/rstudio/httpuv/blob/622c76a749efbbb13903cd488cd1b8c54a48793c/src/filedatasource-unix.cpp#L14)
The `stat()` call is passed `filename.c_str()`. I believe this is a UTF-8 encoded string, but since the process is running with a non-UTF-8 locale, the bytes don't match the name on disk and the file can't be found.
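To illustrate the mismatch outside of httpuv, here is a minimal sketch. It assumes a typical Linux filesystem, where file names are opaque byte strings. The function names (`path_exists`, `make_latin1_fixture`, `demo_lookup_mismatch`) and the scratch directory are hypothetical and not part of httpuv. It creates a file whose name ends in the single ISO 8859-1 byte for ü, then looks it up with both byte sequences.

```cpp
#include <sys/stat.h>
#include <unistd.h>
#include <stdlib.h>
#include <cassert>
#include <cstdio>
#include <string>

// Returns true if stat() succeeds for the given byte string.
bool path_exists(const std::string& path) {
  struct stat sb;
  return stat(path.c_str(), &sb) == 0;
}

// Creates a scratch directory holding one file named "f" + 0xFC
// (ü in ISO 8859-1). Returns the directory, or "" on failure.
std::string make_latin1_fixture() {
  char tmpl[] = "/tmp/encoding-demo-XXXXXX";
  if (mkdtemp(tmpl) == nullptr) return "";
  std::string dir(tmpl);
  std::string file = dir + "/f\xfc";
  FILE* f = std::fopen(file.c_str(), "w");
  if (f == nullptr) return "";
  std::fputs("Hello world!\n", f);
  std::fclose(f);
  return dir;
}

// Returns true when the ISO 8859-1 name is found and the UTF-8 name is not.
bool demo_lookup_mismatch() {
  std::string dir = make_latin1_fixture();
  assert(!dir.empty());
  bool latin1_found = path_exists(dir + "/f\xfc");      // same bytes as on disk
  bool utf8_found   = path_exists(dir + "/f\xc3\xbc");  // same character, other bytes
  std::printf("latin1 name found: %d, utf8 name found: %d\n",
              latin1_found, utf8_found);
  // Clean up the scratch file and directory.
  unlink((dir + "/f\xfc").c_str());
  rmdir(dir.c_str());
  return latin1_found && !utf8_found;
}
```

Explicit `\x` escapes are used so the bytes do not depend on how the source file itself is encoded.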
In gdb:
RD -d gdb
b fs.cpp:56
r
library(devtools)
library(testthat)
load_all()
# test code from: https://github.com/rstudio/httpuv/blob/622c76a749efbbb13903cd488cd1b8c54a48793c/tests/testthat/test-static-paths.R#L768-L795
nonascii_path <- test_path("apps/f\U00FC")
dir.create(nonascii_path)
on.exit(unlink(nonascii_path, recursive = TRUE))
index_file_path <- file.path(nonascii_path, "index.html")
writeLines("Hello world!", index_file_path)
file_content <- raw_file_content(index_file_path)
s <- startServer("0.0.0.0", randomPort(),
  list(
    call = function(req) {
      list(
        status = 200L,
        headers = list('Content-Type' = 'text/html'),
        body = "R code path"
      )
    },
    staticPaths = list(
      "/f\U00FC" = nonascii_path,
      "/foo" = nonascii_path
    )
  )
)
on.exit(s$stop(), add = TRUE)
# URL-encoded non-ASCII URL path, which maps to non-ASCII local path.
r <- fetch(local_url("/f%C3%BC", s$getPort()))
# ======= In GDB =======
# It can't lstat() the filename. (We use lstat() instead of stat() because gdb
# doesn't like that the stat function has the same name as the stat struct type.)
p (int)lstat(filename.c_str(), &sb)
#> $74 = -1
# With the explicit filename, copied and pasted
p filename.c_str()
#> $75 = 0x7fffe8013620 "/httpuv/tests/testthat/apps/fü"
p (int)lstat("/httpuv/tests/testthat/apps/fü" , &sb)
#> $76 = -1
# With the explicit filename, with native encoding (I think)
p (int)lstat("/httpuv/tests/testthat/apps/f\xfc", &sb)
#> $78 = 0
# Show that these strings are not identical - strlen is different:
p (int)strlen("/httpuv/tests/testthat/apps/fü")
#> $79 = 31
p (int)strlen("/httpuv/tests/testthat/apps/f\xfc")
#> $81 = 30
# \U00FC also returns the shorter byte sequence
p (int)strlen("/httpuv/tests/testthat/apps/f\U00FC")
#> $82 = 30
# Show the contents of the different encodings
# The filename.c_str() value is the UTF-8 encoding of ü, which is 195 188.
p (unsigned char) "ü"[0]
#> $115 = 195 '?'
p (unsigned char) "ü"[1]
#> $116 = 188 '?'
p (unsigned char) "ü"[2]
#> $117 = 0 '\000'
# Using "\xfc" is the ISO 8859-1 encoding.
p (unsigned char) "\xfc"[0]
#> $91 = 252 '?'
p (unsigned char) "\xfc"[1]
#> $92 = 0 '\000'
# Using "\U00FC" is also the ISO 8859-1 encoding.
p (unsigned char) "\U00FC"[0]
#> $93 = 252 '?'
p (unsigned char) "\U00FC"[1]
#> $94 = 0 '\000'
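The byte-level comparison from the gdb session can also be checked in a small standalone program. This is only a sketch: `byte_values`, `kUtf8Path`, and `kLatin1Path` are hypothetical names, and the bytes are written as explicit escapes rather than taken from a running httpuv process.

```cpp
#include <cstring>
#include <string>

// Formats each byte of s as a decimal number, separated by spaces.
std::string byte_values(const std::string& s) {
  std::string out;
  for (unsigned char c : s) {
    if (!out.empty()) out += ' ';
    out += std::to_string(static_cast<unsigned>(c));
  }
  return out;
}

// The same path with ü encoded as UTF-8 (two bytes) and as ISO 8859-1 (one byte).
const char kUtf8Path[]   = "/httpuv/tests/testthat/apps/f\xc3\xbc";
const char kLatin1Path[] = "/httpuv/tests/testthat/apps/f\xfc";
```

Here `std::strlen(kUtf8Path)` is 31 and `std::strlen(kLatin1Path)` is 30, matching the gdb output, and `byte_values("\xc3\xbc")` gives `195 188` while `byte_values("\xfc")` gives `252`.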
So I think the real solution would be to convert the string to the native encoding before looking for the file. However, I don't think that it's really worth doing at this time, for a couple of reasons:
In summary, this is an edge case where the cost and risk of fixing it outweigh the benefit. I think we should just disable the test on Unix systems that are running in a non-UTF-8 locale.
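For reference, the conversion to the native encoding mentioned above might look roughly like the following. This is a sketch under strong assumptions, not what httpuv does: it handles only ISO 8859-1 as the target, and `utf8_to_latin1` is a hypothetical helper. A real fix would have to determine the locale's actual encoding and use a general-purpose converter such as `iconv`.

```cpp
#include <cstddef>
#include <string>

// Converts UTF-8 input to ISO 8859-1, writing the result to out.
// Returns false if the input is malformed or contains a code point
// above U+00FF, which ISO 8859-1 cannot represent.
bool utf8_to_latin1(const std::string& in, std::string& out) {
  out.clear();
  for (std::size_t i = 0; i < in.size(); ++i) {
    unsigned char c = static_cast<unsigned char>(in[i]);
    if (c < 0x80) {
      // ASCII is identical in both encodings.
      out += static_cast<char>(c);
    } else if ((c == 0xC2 || c == 0xC3) && i + 1 < in.size()) {
      // U+0080..U+00FF are two-byte sequences with lead byte 0xC2 or 0xC3.
      unsigned char next = static_cast<unsigned char>(in[i + 1]);
      if ((next & 0xC0) != 0x80) return false;  // not a continuation byte
      out += static_cast<char>(((c & 0x03) << 6) | (next & 0x3F));
      ++i;
    } else {
      return false;
    }
  }
  return true;
}
```

With this helper, `"f\xc3\xbc"` converts to `"f\xfc"`, which is the byte sequence that `lstat()` accepted in the gdb session. Input such as the euro sign (`"\xe2\x82\xac"`) is rejected, which hints at one of the risks: some valid URL paths have no representation in the native encoding at all.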
An email from CRAN:
Check log
```
* using log directory '/home/hornik/tmp/R.check/r-devel-clang/Work/PKGS/httpuv.Rcheck'
* using R Under development (unstable) (2020-05-01 r78341)
* using platform: x86_64-pc-linux-gnu (64-bit)
* using session charset: ISO8859-15
* checking for file 'httpuv/DESCRIPTION' ... OK
* checking extension type ... Package
* this is package 'httpuv' version '1.5.2'
* package encoding: UTF-8
* checking package namespace information ... OK
* checking package dependencies ... OK
* checking if this is a source package ... OK
* checking if there is a namespace ... OK
* checking for executable files ... OK
* checking for hidden files and directories ... OK
* checking for portable file names ... OK
* checking for sufficient/correct file permissions ... OK
* checking serialization versions ... OK
* checking whether package 'httpuv' can be installed ... OK
* checking package directory ... OK
* checking for future file timestamps ... OK
* checking DESCRIPTION meta-information ... OK
* checking top-level files ... OK
* checking for left-over files ... OK
* checking index information ... OK
* checking package subdirectories ... OK
* checking R files for non-ASCII characters ... OK
* checking R files for syntax errors ... OK
* checking whether the package can be loaded ... OK
* checking whether the package can be loaded with stated dependencies ... OK
* checking whether the package can be unloaded cleanly ... OK
* checking whether the namespace can be loaded with stated dependencies ... OK
* checking whether the namespace can be unloaded cleanly ... OK
* checking loading without being on the library search path ... OK
* checking use of S3 registration ... OK
* checking dependencies in R code ... OK
* checking S3 generic/method consistency ... OK
* checking replacement functions ... OK
* checking foreign function calls ... OK
* checking R code for possible problems ... [5s/6s] OK
* checking Rd files ... OK
* checking Rd metadata ... OK
* checking Rd line widths ... OK
* checking Rd cross-references ... OK
* checking for missing documentation entries ... OK
* checking for code/documentation mismatches ... OK
* checking Rd \usage sections ... OK
* checking Rd contents ... OK
* checking for unstated dependencies in examples ... OK
* checking line endings in C/C++/Fortran sources/headers ... OK
* checking line endings in Makefiles ... OK
* checking compilation flags in Makevars ... OK
* checking for GNU extensions in Makefiles ... NOTE
GNU make is a SystemRequirements.
* checking for portable use of $(BLAS_LIBS) and $(LAPACK_LIBS) ... OK
* checking use of PKG_*FLAGS in Makefiles ... OK
* checking use of SHLIB_OPENMP_*FLAGS in Makefiles ... OK
* checking include directives in Makefiles ... OK
* checking pragmas in C/C++ headers and code ... OK
* checking compilation flags used ... OK
* checking compiled code ... OK
* checking examples ... [1s/1s] OK
* checking for unstated dependencies in 'tests' ... OK
* checking tests ... [6s/8s] ERROR
  Running 'testthat.R' [6s/7s]
Running the tests in 'tests/testthat.R' failed.
Complete output:
  > library(testthat)
  > library(httpuv)
  >
  > test_check("httpuv")
  -- 1. Failure: Paths with non-ASCII characters (@test-static-paths.R#796) -----
  r$status_code not identical to 200L.
  1/1 mismatches
  [1] 404 - 200 == 204
  -- 2. Failure: Paths with non-ASCII characters (@test-static-paths.R#797) -----
  r$content not identical to `file_content`.
  Lengths (14, 13) differ (comparison on first 13 components)
  13 element mismatches
  -- 3. Failure: Paths with non-ASCII characters (@test-static-paths.R#801) -----
  r$status_code not identical to 200L.
  1/1 mismatches
  [1] 404 - 200 == 204
  -- 4. Failure: Paths with non-ASCII characters (@test-static-paths.R#802) -----
  r$content not identical to `file_content`.
  Lengths (14, 13) differ (comparison on first 13 components)
  13 element mismatches
  == testthat results ===========================================================
  [ OK: 204 | SKIPPED: 8 | WARNINGS: 0 | FAILED: 4 ]
  1. Failure: Paths with non-ASCII characters (@test-static-paths.R#796)
  2. Failure: Paths with non-ASCII characters (@test-static-paths.R#797)
  3. Failure: Paths with non-ASCII characters (@test-static-paths.R#801)
  4. Failure: Paths with non-ASCII characters (@test-static-paths.R#802)
  Error: testthat unit tests failed
  Execution halted
* checking PDF version of manual ... OK
* checking for non-standard things in the check directory ... OK
* DONE
Status: 1 ERROR, 1 NOTE
```