Open okrebs opened 4 years ago
Is this pull request under active consideration? Because it looks like it would be extremely useful.
I am trialling this branch now. It is indeed much faster. I noticed that code that worked under the current CRAN version breaks under this version with a couple of small things:
as.matrix(...)
around the data passed to the seed argument to make it consistent with the target.data
. Previously Ipfp was more relaxed about this being a data frame.These aren't big deals but do count as breaking changes that would need to be managed if this is merged in.
Regarding your first point what is actually expected is an array (as 'seed' can be multidimensional, matrix should work as it has subclass 'array'). There probably should be a warning if a "data.frame" or similar is passed. However, I can not replicate your problem with my branch as passing a data.frame works for me:
library(mipfp)
Lade nötiges Paket: cmm
Lade nötiges Paket: Rsolnp
Lade nötiges Paket: numDeriv
mymat <- matrix(1, nrow = 2, ncol = 2)
mydat <- data.frame(a = c(1,1), b = c(1,1))
Ipfp(seed = mymat, target.list = list(1), target.data = list(c(3,3)))
Call:
Ipfp(seed = mymat, target.list = list(1), target.data = list(c(3,
3)))
Method: ipfp - convergence: TRUE
Estimates:
[,1] [,2]
[1,] 1.5 1.5
[2,] 1.5 1.5
Ipfp(seed = mydat, target.list = list(1), target.data = list(c(3,3)))
Call:
Ipfp(seed = mydat, target.list = list(1), target.data = list(c(3,
3)))
Method: ipfp - convergence: TRUE
Estimates:
a b
1 1.5 1.5
2 1.5 1.5
Can you provide a MWE?
The same for your second point.
Hi Okrebs,
Happy to merge your work into this repo. Thanks for your good work! And sorry for the extremely long delay...
Before doing the merge,
Cheers!
Johan
Sorry for the long delay in responding, and not including a reprex in my first comments.
As it happens @okrebs your own example produces the error I was finding:
> mymat <- matrix(1, nrow = 2, ncol = 2)
> mydat <- data.frame(a = c(1,1), b = c(1,1))
> Ipfp(seed = mymat, target.list = list(1), target.data = list(c(3,3)))
Call:
Ipfp(seed = mymat, target.list = list(1), target.data = list(c(3,
3)))
Method: ipfp - convergence: TRUE
Estimates:
[,1] [,2]
[1,] 1.5 1.5
[2,] 1.5 1.5
> Ipfp(seed = mydat, target.list = list(1), target.data = list(c(3,3)))
Error in .IpfpCoreC(seed, target.list, target.data, print, iter, tol, :
Not compatible with requested type: [type=list; target=double].
> Ipfp(seed = as.matrix(mydat), target.list = list(1), target.data = list(c(3,3)))
Call:
Ipfp(seed = as.matrix(mydat), target.list = list(1), target.data = list(c(3,
3)))
Method: ipfp - convergence: TRUE
Estimates:
[,1] [,2]
[1,] 1.5 1.5
[2,] 1.5 1.5
There must be something else differing with our systems. The example above works fine with the CRAN version, but when I build from the fast-ipfp branch I get the above result - fails with a data.frame, but is ok with as.matrix()
around the data.frame.
Here is my session info:
> devtools::session_info()
- Session info ------------------------------------------------------------------------------------
setting value
version R version 4.1.1 (2021-08-10)
os Windows 10 x64
system x86_64, mingw32
ui RStudio
language (EN)
collate English_United States.1252
ctype English_United States.1252
tz Australia/Sydney
date 2021-10-12
- Packages ----------------------------------------------------------------------------------------
package * version date lib source
cachem 1.0.6 2021-08-19 [1] CRAN (R 4.1.1)
callr 3.7.0 2021-04-20 [1] CRAN (R 4.1.1)
cli 3.0.1 2021-07-17 [1] CRAN (R 4.1.1)
cmm * 0.12 2018-03-11 [1] CRAN (R 4.1.1)
crayon 1.4.1 2021-02-08 [1] CRAN (R 4.1.1)
desc 1.4.0 2021-09-28 [1] CRAN (R 4.1.1)
devtools 2.4.2 2021-06-07 [1] CRAN (R 4.1.1)
ellipsis 0.3.2 2021-04-29 [1] CRAN (R 4.1.1)
fastmap 1.1.0 2021-01-25 [1] CRAN (R 4.1.1)
fs 1.5.0 2020-07-31 [1] CRAN (R 4.1.1)
glue 1.4.2 2020-08-27 [1] CRAN (R 4.1.1)
lifecycle 1.0.1 2021-09-24 [1] CRAN (R 4.1.1)
magrittr 2.0.1 2020-11-17 [1] CRAN (R 4.1.1)
memoise 2.0.0 2021-01-26 [1] CRAN (R 4.1.1)
mipfp * 3.1 2021-10-11 [1] local
numDeriv * 2016.8-1.1 2019-06-06 [1] CRAN (R 4.1.1)
pkgbuild 1.2.0 2020-12-15 [1] CRAN (R 4.1.1)
pkgload 1.2.2 2021-09-11 [1] CRAN (R 4.1.1)
prettyunits 1.1.1 2020-01-24 [1] CRAN (R 4.1.1)
processx 3.5.2 2021-04-30 [1] CRAN (R 4.1.1)
ps 1.6.0 2021-02-28 [1] CRAN (R 4.1.1)
purrr 0.3.4 2020-04-17 [1] CRAN (R 4.1.1)
R6 2.5.1 2021-08-19 [1] CRAN (R 4.1.1)
Rcpp 1.0.7 2021-07-07 [1] CRAN (R 4.1.1)
remotes 2.4.1 2021-09-29 [1] CRAN (R 4.1.1)
rlang 0.4.11 2021-04-30 [1] CRAN (R 4.1.1)
rprojroot 2.0.2 2020-11-15 [1] CRAN (R 4.1.1)
Rsolnp * 1.16 2015-12-28 [1] CRAN (R 4.1.1)
sessioninfo 1.1.1 2018-11-05 [1] CRAN (R 4.1.1)
testthat 3.1.0 2021-10-04 [1] CRAN (R 4.1.1)
truncnorm 1.0-8 2018-02-27 [1] CRAN (R 4.1.1)
usethis 2.0.1 2021-02-10 [1] CRAN (R 4.1.1)
withr 2.4.2 2021-04-18 [1] CRAN (R 4.1.1)
[1] D:/Peter/Documents/R/win-library/4.1
[2] C:/Program Files/R/R-4.1.1/library
I should add that the second issue I noted, where in the CRAN version I had to 'melt' the results to get the desired output but with your fast-ipfp version I didn't have to, I can not reproduce, and I think the problem was probably entirely at my end. For the record, the following is my test case and both the CRAN version and the fast-ipfp version work ok with it:
seed_df <- data.frame(
lions = 1:2,
tigers = 4:5,
bears = 7:8
)
row.names(seed_df) <- c("Africa", "India")
# marginal populations for Africa, India; lions, tigers, bears:
population_df <- list(c(100, 100), c(60, 110, 30))
fit <- Ipfp(
seed = as.matrix(seed_df),
target.list = list(1,2),
target.data = population_df
)
raked <- fit[[1]]
dim(raked) # should be 2, 3
Finally, here is another problem. When building the package, I see this note:
Note: break used in wrong context: no loop is visible at ipfp_multi_dim.R:107
I can then force an error when I have multi dimensional fitting with inconsistent marginal totals:
library(mipfp)
seed_df <- data.frame(
lions = 1:2,
tigers = 4:5,
bears = 7:8
)
row.names(seed_df) <- c("Africa", "India")
# marginal populations for Africa, India; lions, tigers, bears:
population_df <- list(c(100, 100), c(60, 120, 30)) # note inconsistent totals
fit <- Ipfp(
seed = as.matrix(seed_df),
target.list = list(1,2),
target.data = population_df
)
produces this error in the fast-ipfp branch:
Error in Ipfp(seed = as.matrix(seed_df), target.list = list(1, 2), target.data = population_df) :
no loop for break/next, jumping to top level
In addition: Warning message:
In Ipfp(seed = as.matrix(seed_df), target.list = list(1, 2), target.data = population_df) :
Target not consistent - shifting to probabilities!
Check input data!
But if I reload and use the CRAN version it works as expected (noting that it returns probabilities, not marginal totals, because they were inconsistent - but at least it isn't an error)
> library(mipfp)
Loading required package: cmm
Loading required package: Rsolnp
Loading required package: numDeriv
>
> seed_df <- data.frame(
+ lions = 1:2,
+ tigers = 4:5,
+ bears = 7:8
+ )
>
> row.names(seed_df) <- c("Africa", "India")
>
> # marginal populations for Africa, India; lions, tigers, bears:
> population_df <- list(c(100, 100), c(60, 120, 30)) # note inconsistent totals
>
>
>
> fit <- Ipfp(
+ seed = as.matrix(seed_df),
+ target.list = list(1,2),
+ target.data = population_df
+ )
Warning message:
In Ipfp(seed = as.matrix(seed_df), target.list = list(1, 2), target.data = population_df) :
Target not consistents - shifting to probabilities!
Check input data!
>
> fit[[1]]
lions tigers bears
Africa 0.1181559 0.3029328 0.07891131
India 0.1675584 0.2684957 0.06394583
Rcpp implementation including parallelization of the Ipfp algorithm. This greatly increases speed for large problems and fixes #4 and #5.
Apart from this the obtained (multidimensional) matrices are equal to the previous implementation up to machine (double) precision.