Closed ArcadeAntics closed 10 months ago
@ArcadeAntics how would this issue affect current plumber code? What would be the impact of not fixing it? I ask because I do not see any reference to the srcfile attribute being used or private$parsed being exposed.
The fix is simple enough. Just trying to gauge the ramifications.
@meztez it does not affect the package:plumber code itself, but it ruins my code. I want to know where each function is written, and the easiest way to do that is source references, but this bug messes that up. The impact of not fixing it is that my code continues not to work and I have to implement some hackish solution to get it to behave the way I want. Fixing this will not negatively impact this package's code or anyone else's, only improve it.
@ArcadeAntics could you share a ruined piece of code? Maybe an example plumber file with the encoding that causes the issue?
"Fixing this will not negatively impact this package's code or anyone else's, only improve it."
That may or may not be true; I'm not sure. What would be the impact of not executing this code block in parse()?
if (keep.source) {
    text <- readLines(file, warn = FALSE, encoding = encoding)
    if (!length(text))
        text <- ""
    close(file)
    file <- stdin()
    srcfile <- srcfilecopy(filename, text, file.mtime(filename),
        isFile = TRUE)
}
The input to .Internal(parse()) would not be the same.
Seems you are right, no meaningful impact.
# aa: parse the original file directly with keep.source = TRUE.
aa <- function (file)
{
    lines <- plumber:::readUTF8(file)
    enc <- if (any(Encoding(lines) == "UTF-8"))
        "UTF-8"
    else "unknown"
    src <- srcfilecopy(file, lines, isFile = TRUE)
    exprs <- try(parse(file, keep.source = TRUE, srcfile = src,
        encoding = enc))
    if (inherits(exprs, "try-error")) {
        stop("Error sourcing ", file)
    }
    exprs
}
# bb: parse a temporary copy with keep.source = TRUE (srcfile gets overwritten).
bb <- function (file)
{
    lines <- plumber:::readUTF8(file)
    enc <- if (any(Encoding(lines) == "UTF-8"))
        "UTF-8"
    else "unknown"
    src <- srcfilecopy(file, lines, isFile = TRUE)
    file <- tempfile()
    on.exit(unlink(file), add = TRUE)
    writeLines(lines, file)
    exprs <- try(parse(file, keep.source = TRUE, srcfile = src,
        encoding = enc))
    if (inherits(exprs, "try-error")) {
        stop("Error sourcing ", file)
    }
    exprs
}
# cc: parse a temporary copy with keep.source = FALSE (srcfile is preserved).
cc <- function (file)
{
    lines <- plumber:::readUTF8(file)
    enc <- if (any(Encoding(lines) == "UTF-8"))
        "UTF-8"
    else "unknown"
    src <- srcfilecopy(file, lines, isFile = TRUE)
    file <- tempfile()
    on.exit(unlink(file), add = TRUE)
    writeLines(lines, file)
    exprs <- try(parse(file, keep.source = FALSE, srcfile = src,
        encoding = enc))
    if (inherits(exprs, "try-error")) {
        stop("Error sourcing ", file)
    }
    exprs
}
aa1 <- aa("R/sample.R")
bb1 <- bb("R/sample.R")
cc1 <- cc("R/sample.R")
attributes(aa1)
attributes(bb1)
attributes(cc1)
> attributes(aa1)
$srcref
$srcref[[1]]
aaa <- "eric la chaperone"
$srcfile
R/sample.R
$wholeSrcref
aaa <- "eric la chaperone"
> attributes(bb1)
$srcref
$srcref[[1]]
aaa <- "eric la chaperone"
$srcfile
/tmp/RtmpBvrTql/file260ca737a299a
$wholeSrcref
aaa <- "eric la chaperone"
> attributes(cc1)
$srcref
$srcref[[1]]
aaa <- "eric la chaperone"
$srcfile
R/sample.R
$wholeSrcref
aaa <- "eric la chaperone"
From an ISO-8859-1 encoded sample.R file:
aaa <- "eric la chaperone"
@meztez even an ASCII encoded file will cause this issue:
function ()
NULL
I put this in a file ~/test-plumber.R and ran attr(plumber:::parseUTF8("~/test-plumber.R"), "srcfile") and it gave me C:\Users\iris\AppData\Local\Temp\Rtmpshydhs\file1a7c39064634. This also fails for scripts written in latin1, which used to be the native encoding before R 4.2.0. In my specific example, I had something like:
function ()
utils::getSrcFilename(sys.function(), full.names = TRUE)
which of course fails because the source reference of the function points to the wrong file.
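The failure can be reproduced without plumber. The following is a minimal sketch using only base R; the temporary files and the function name f are assumptions for illustration, and readLines() stands in for plumber:::readUTF8().

```r
# A stand-in for a plumber file (hypothetical content).
original <- tempfile(fileext = ".R")
writeLines(
  "f <- function() utils::getSrcFilename(sys.function(), full.names = TRUE)",
  original
)

# Build a srcfile that names the original file.
lines <- readLines(original)
src <- srcfilecopy(original, lines, isFile = TRUE)

# Write the lines to a second file and parse that one,
# passing the original srcfile along with keep.source = TRUE.
copy <- tempfile(fileext = ".R")
writeLines(lines, copy)
exprs <- parse(copy, keep.source = TRUE, srcfile = src)

# Define f in a scratch environment and ask it where it was written.
env <- new.env()
eval(exprs, envir = env)
reported <- env$f()

# The function reports the copy, not the original file.
points_to_copy <- identical(reported, copy)
```

Here points_to_copy records whether the source reference follows the file that was actually parsed rather than the srcfile argument.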
Yes, the input to .Internal(parse()) is different, but it returns the same result. If you need convincing of that, look at the code of do_parse, R_ParseConn, R_ParseVector, and do_readLines. You can see that both code paths do the same thing. And you might say "well look, readLines does character translation" but in this case, since encoding is unknown, no translation is done. But then you might say "look at do_parse, it sets known_to_be_latin1 and known_to_be_utf8 for character inputs" but again, because the encodings of the strings are not marked, it does not set either of these. The output from .Internal(parse()) will be completely identical.
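A quick way to check this claim for a plain ASCII script is to parse the same content once from the file and once from a character vector. This is only a sketch with a made-up two-line script; source references are turned off so that the comparison covers the parsed expressions alone.

```r
# Hypothetical script content, written to a temporary file.
path <- tempfile(fileext = ".R")
writeLines(c("x <- 1 + 2", "f <- function(a) a * 2"), path)

# Route 1: the parser reads from the file connection.
from_conn <- parse(path, keep.source = FALSE)

# Route 2: the lines are read first and parsed as text,
# which is what parse() does internally when keep.source = TRUE.
from_text <- parse(text = readLines(path, warn = FALSE), keep.source = FALSE)

same_result <- identical(from_conn, from_text)
```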
@ArcadeAntics I came to the same conclusion. See #930.
Example application or steps to reproduce the problem
Describe the problem in detail
In parseUTF8(), the comments claim that on Windows with encoding "unknown" the file must be re-encoded to native. I would agree with this, though I would think this is no longer necessary since the C runtime on Windows is now ucrt and the native encoding is now UTF-8. Regardless, it writes the lines into a new file in the native encoding.
It then claims that despite parsing a different file, the source reference is pointed to the original file. This is incorrect since parse() overwrites its argument srcfile when file is a character string and keep.source = TRUE.
To avoid srcfile being overwritten when passed to parse(), you should use parse(keep.source = FALSE) (unintuitively).
Here is some code showing the fix working as intended:
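A minimal sketch of such a check, assuming base R only. The helper name parse_copy() and the temporary files are hypothetical, and readLines() replaces plumber:::readUTF8(); this is not plumber's actual implementation.

```r
# Hypothetical helper: parse a temporary copy of `file` while keeping
# the source reference pointed at `file` itself.
parse_copy <- function(file) {
  lines <- readLines(file, warn = FALSE)
  src <- srcfilecopy(file, lines, isFile = TRUE)

  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(lines, tmp)

  # keep.source = FALSE stops parse() from replacing `src`;
  # source references are still kept because `src` is a srcfile object.
  parse(tmp, keep.source = FALSE, srcfile = src)
}

path <- tempfile(fileext = ".R")
writeLines(c("function ()", "NULL"), path)

exprs <- parse_copy(path)
reported <- attr(exprs, "srcfile")$filename
fix_works <- identical(reported, path)
```

With keep.source = TRUE in the same helper, the reported filename would be the temporary copy instead.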