hsloot / rmo

An R package for Marshall--Olkin distributions and copulas.
https://hsloot.github.io/rmo/
GNU General Public License v3.0
5 stars 2 forks source link

[BUG] Wrong results for `is_within` with high values #36

Closed hsloot closed 4 years ago

hsloot commented 4 years ago

Description

The function is_within(i, j) produces wrong results for very high values of j>2^31-1, see also #35.

The reason for this is undocumented, implementation-dependent, or maybe just unknown behaviour (to me) how conversion is implemented in Rcpp and c++. R prevents integer overflow, by conversing an integer higher than 2^31-1 to a double. However, if a (theoretical) integer between 2^31-1 (32bit) and 2^63-1 (64bit) is passed from R to an Rcpp-function taking a long int there might be some loss due to previous internal conversions to a double. See also https://stackoverflow.com/a/1848762 for a discussion on the largest representable integer in a 64bit double.

reprex

library(rmo)
rmo:::is_within(1L, 2L^55L+1L) ## FALSE, should be TRUE
#> [1] FALSE

Created on 2020-02-16 by the reprex package (v0.3.0)

Session info ``` r devtools::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 3.6.2 (2019-12-12) #> os macOS Catalina 10.15.3 #> system x86_64, darwin19.2.0 #> ui unknown #> language (EN) #> collate de_DE.UTF-8 #> ctype de_DE.UTF-8 #> tz Europe/Berlin #> date 2020-02-16 #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date lib source #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 3.6.0) #> backports 1.1.5 2019-10-02 [1] CRAN (R 3.6.1) #> callr 3.4.1 2020-01-24 [1] CRAN (R 3.6.2) #> cli 2.0.1 2020-01-08 [1] CRAN (R 3.6.1) #> crayon 1.3.4 2017-09-16 [1] CRAN (R 3.6.0) #> desc 1.2.0 2018-05-01 [1] CRAN (R 3.6.0) #> devtools 2.2.1 2019-09-24 [1] CRAN (R 3.6.1) #> digest 0.6.23 2019-11-23 [1] CRAN (R 3.6.1) #> ellipsis 0.3.0 2019-09-20 [1] CRAN (R 3.6.1) #> evaluate 0.14 2019-05-28 [1] CRAN (R 3.6.0) #> fansi 0.4.1 2020-01-08 [1] CRAN (R 3.6.1) #> fs 1.3.1 2019-05-06 [1] CRAN (R 3.6.0) #> glue 1.3.1 2019-03-12 [1] CRAN (R 3.6.0) #> highr 0.8 2019-03-20 [1] CRAN (R 3.6.0) #> htmltools 0.4.0 2019-10-04 [1] CRAN (R 3.6.1) #> knitr 1.28 2020-02-06 [1] CRAN (R 3.6.2) #> magrittr 1.5 2014-11-22 [1] CRAN (R 3.6.0) #> memoise 1.1.0 2017-04-21 [1] CRAN (R 3.6.0) #> pkgbuild 1.0.6 2019-10-09 [1] CRAN (R 3.6.1) #> pkgload 1.0.2 2018-10-29 [1] CRAN (R 3.6.0) #> prettyunits 1.1.1 2020-01-24 [1] CRAN (R 3.6.2) #> processx 3.4.2 2020-02-09 [1] CRAN (R 3.6.2) #> ps 1.3.0 2018-12-21 [1] CRAN (R 3.6.0) #> R6 2.4.1 2019-11-12 [1] CRAN (R 3.6.1) #> Rcpp 1.0.3 2019-11-08 [1] CRAN (R 3.6.1) #> remotes 2.1.0 2019-06-24 [1] CRAN (R 3.6.0) #> rlang 0.4.4 2020-01-28 [1] CRAN (R 3.6.2) #> rmarkdown 2.1 2020-01-20 [1] CRAN (R 3.6.2) #> rmo * 0.2.0.0000 2020-02-10 [1] local #> rprojroot 1.3-2 2018-01-03 [1] CRAN (R 3.6.0) #> sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 3.6.0) #> stringi 1.4.5 2020-01-11 [1] CRAN (R 3.6.1) #> stringr 1.4.0 2019-02-10 [1] CRAN (R 3.6.0) #> testthat 2.3.1 2019-12-01 [1] CRAN (R 3.6.1) #> usethis 1.5.1 2019-07-04 [1] CRAN (R 3.6.0) #> withr 2.1.2 2018-03-15 [1] CRAN (R 3.6.0) #> xfun 0.12 2020-01-13 [1] CRAN (R 3.6.1) #> yaml 2.2.1 2020-02-01 [1] CRAN (R 3.6.2) #> #> [1] /usr/local/lib/R/3.6/site-library #> [2] /usr/local/Cellar/r/3.6.2/lib/R/library ```
hsloot commented 4 years ago

The function is_within is only used in Rcpp__rmo_esm (simulating from an exogenous shock model), Rcpp__rmo_arnold (simulating from the Arnold model), and is_mo_parameter (for calculating marginal rates). Hence, I see no problem with generally preventing these functions to be used in a dimension d > 31, since these function are not really feasible to use.

Moreover, R allows to have vectors with length greater than 2^31-1 since version 3.0.0, but uses a double-index which might lead to more problems and unexpected behaviour.

hsloot commented 4 years ago

Idea: Write an assertion is_dimension(x) which test if x is an admissible dimension parameter, which means is.count(x) && x>1 && d<32 == TRUE.

There should also be a test which makes sure that an iteration through all sets works properly for d = 31.