oracle / fastr

A high-performance implementation of the R programming language, built on GraalVM.

sprintf triggers FormatFlagsConversionMismatchException #191

Open rmartinsanta opened 2 years ago

rmartinsanta commented 2 years ago

While debugging a possible bug in an R library, I found a potential issue in the sprintf function. The format string seems to be invalid in Java, while being valid in R.

Minimal code to reproduce (this works on standard R):

sprintf("%#15.10g", 2870.0)

Result in GraalVM:

❯ ./R --R.PrintErrorStacktracesToFile=true
R version 4.0.3 (FastR)
Copyright (c) 2013-21, Oracle and/or its affiliates
Copyright (C) 2020 The R Foundation for Statistical Computing
Copyright (c) 2012-4 Purdue University
All rights reserved.

FastR is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

R is a collaborative project with many contributors.
Type 'contributors()' for more information.

Type 'q()' to quit R.
> sprintf("%#15.10g", 2870.0)
An internal error occurred: "java.util.FormatFlagsConversionMismatchException: Conversion = g, Flags = #"
Please report an issue at https://github.com/oracle/fastr including the commands and the error log file '/Users/rmartin/.sdkman/candidates/java/21.3.0.r17-grl/bin/fastr_errors_pid90947.log'.
> sessionInfo()
FastR version 4.0.3 (2020-10-10)
Platform: x86_64-apple-darwin20.6.0 (64-bit)
Running under: macOS  11.6

Matrix products: default
BLAS:   /Users/rmartin/.sdkman/candidates/java/21.3.0.r17-grl/languages/R/lib/libRblas.dylib
LAPACK: /Users/rmartin/.sdkman/candidates/java/21.3.0.r17-grl/languages/R/lib/libRlapack.dylib

locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Result in R:

❯ R

R version 4.1.1 (2021-08-10) -- "Kick Things"
Copyright (C) 2021 The R Foundation for Statistical Computing
Platform: x86_64-apple-darwin20.6.0 (64-bit)

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

> sprintf("%#15.10g", 2870.0)
[1] "    2870.000000"

Include the following info

```
Frame(d=1): sprintf (called as: sprintf("%#15.10g", 2870))
Frame(d=0): (called as: )

with frame slot contents:
Frame(d=1): sprintf (called as: sprintf("%#15.10g", 2870))
fmt = [4, org.graalvm.compiler.truffle.runtime.FrameWithoutBoxing@69f461d3, expr=ConstantObjectNode@1bdfc2d2, %#15.10g]
... = RArgsValuesAndNames: null = [4, org.graalvm.compiler.truffle.runtime.FrameWithoutBoxing@69f461d3, expr=ConstantDoubleScalarNode@6ac95e2c, 2870.0]
Visibility = true
Frame(d=0): (called as: )
FunctionEvalCallNode-argsIdentifier = null
FunctionEvalCallNode-funIdentifier = null
Visibility = true
.Random.seed = active binding
RExplicitCall-argsIdentifier = null
```

* Use `$GRAALVM_HOME/bin/R --vm.version` and include the full output:
```
❯ ./R --vm.version
openjdk version "17.0.1" 2021-10-19
OpenJDK Runtime Environment GraalVM CE 21.3.0 (build 17.0.1+12-jvmci-21.3-b05)
OpenJDK 64-Bit Server VM GraalVM CE 21.3.0 (build 17.0.1+12-jvmci-21.3-b05, mixed mode, sharing)
```
* Output of R built-in function `sessionInfo()`: Provided in the command output
* OS name and version: MacOS Big Sur

rmartinsanta commented 2 years ago

The bug seems to be a discrepancy in the meaning of the '#' flag between Java and C. The relevant documentation excerpts follow.

Java

'g' '\u0067' Requires the output to be formatted in general scientific notation as described below. The localization algorithm is applied. After rounding for the precision, the formatting of the resulting magnitude m depends on its value. If m is greater than or equal to 10^-4 but less than 10^precision then it is represented in decimal format. If m is less than 10^-4 or greater than or equal to 10^precision, then it is represented in computerized scientific notation. The total number of significant digits in m is equal to the precision. If the precision is not specified, then the default value is 6. If the precision is 0, then it is taken to be 1. If the '#' flag is given then an FormatFlagsConversionMismatchException will be thrown.

Source: https://docs.oracle.com/javase/7/docs/api/java/util/Formatter.html
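
To make the Java side concrete, here is a minimal sketch that isolates the behaviour outside FastR. The class and method names (`AlternateFlagRepro`, `describe`) are made up for illustration; only `String.format` and `FormatFlagsConversionMismatchException` come from the JDK. It assumes that FastR ends up handing the format string to `java.util.Formatter` more or less unchanged, which is what the error message suggests but is not confirmed here.

```java
import java.util.FormatFlagsConversionMismatchException;
import java.util.Locale;

// Hypothetical stand-alone reproduction, not FastR code.
class AlternateFlagRepro {

    // Formats `value` with `fmt`, or reports why java.util.Formatter rejected the format.
    static String describe(String fmt, double value) {
        try {
            return String.format(Locale.ROOT, fmt, value);
        } catch (FormatFlagsConversionMismatchException e) {
            // getConversion() and getFlags() expose the offending conversion and flag.
            return "rejected: conversion=" + e.getConversion() + ", flags=" + e.getFlags();
        }
    }

    public static void main(String[] args) {
        // Same format string as in the report: '#' combined with 'g' is rejected.
        System.out.println(describe("%#15.10g", 2870.0));
        // The same conversion without '#' is accepted by the Java formatter.
        System.out.println(describe("%15.10g", 2870.0));
    }
}
```

The first call reports the same conversion and flag as the FastR error message ("Conversion = g, Flags = #"), so the exception can be triggered with plain Java and no R code involved.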

C

The character % is followed by zero or more of the following flags: # The value should be converted to an "alternate form". For o conversions, the first character of the output string is made zero (by prefixing a 0 if it was not zero already). For x and X conversions, a nonzero result has the string "0x" (or "0X" for X conversions) prepended to it. For a, A, e, E, f, F, g, and G conversions, the result will always contain a decimal point, even if no digits follow it (normally, a decimal point appears in the results of those conversions only if a digit follows). For g and G conversions, trailing zeros are not removed from the result as they would otherwise be. For other conversions, the result is undefined.

Source: https://linux.die.net/man/3/sprintf
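
As a rough illustration of how the C semantics could be emulated on the Java side, the sketch below picks between the 'f' and 'e' conversions (both of which accept '#' in Java) using the C rule for 'g': with precision P and decimal exponent X, use style f with precision P-1-X if -4 <= X < P, otherwise style e with precision P-1. The helper `AlternateG.format` is hypothetical and is not how FastR implements or fixes this. It only handles finite values and an explicit non-negative precision, and rounding at exact ties may differ from a C library, since Java's formatter rounds half up.

```java
import java.math.BigDecimal;
import java.math.MathContext;
import java.math.RoundingMode;
import java.util.Locale;

// Hypothetical helper emulating C's "%#<width>.<precision>g" for finite doubles.
class AlternateG {

    // width <= 0 means "no minimum width"; precision must be >= 0.
    static String format(int width, int precision, double value) {
        if (Double.isNaN(value) || Double.isInfinite(value)) {
            throw new IllegalArgumentException("only finite values are handled in this sketch");
        }
        if (precision < 0) {
            throw new IllegalArgumentException("precision must be non-negative");
        }
        int p = precision == 0 ? 1 : precision; // C treats a precision of 0 as 1

        // Decimal exponent X of the value after rounding to p significant digits.
        BigDecimal rounded = new BigDecimal(value).round(new MathContext(p, RoundingMode.HALF_EVEN));
        int x = rounded.signum() == 0 ? 0 : rounded.precision() - rounded.scale() - 1;

        String w = width > 0 ? Integer.toString(width) : "";
        if (x >= -4 && x < p) {
            // Style f: keep p significant digits, so p - 1 - x digits after the point.
            return String.format(Locale.ROOT, "%#" + w + "." + (p - 1 - x) + "f", value);
        }
        // Style e: one digit before the point, p - 1 after it.
        return String.format(Locale.ROOT, "%#" + w + "." + (p - 1) + "e", value);
    }

    public static void main(String[] args) {
        // Brackets make the padding visible.
        System.out.println("[" + format(15, 10, 2870.0) + "]");
        System.out.println("[" + format(0, 3, 1234567.0) + "]");
    }
}
```

For the value in this report, `format(15, 10, 2870.0)` selects style f with six fractional digits and yields "    2870.000000", the same string GNU R prints above.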

Akirathan commented 2 years ago

Hi @rmartinsanta, thank you for the precise description. We will look into this issue.

Unfortunately, these kinds of incompatibilities with standard R (GNU-R) are inevitable. In this case, it is just a difference between the standard formatters in Java and C, but in other cases GNU-R has corner cases that do not comply with any standard. For example, GNU-R has its own regular expression engine that handles some cases differently from all other standard engines (see this PR: https://github.com/r-lib/testthat/pull/1377).