Open okrebs opened 5 years ago
Hi,
For large arrays, did you try ObtainModelEstimates in the package? It relies on RSolnp and might be faster.
I do agree that the core of the algorithm can be parallelized. Let me know how you progress on that.
I am actually looking at doing that using GPUs.
Cheers!
Hi,
thanks for your answer, and sorry for the late reply; I had a lot to work on.
Regarding ObtainModelEstimate:
Regarding parallel/Rcpp implementation: I have replicated the Ipfp algorithm using Rcpp. It runs about 4 to 10 times faster even without parallelization. The code can be found in my branch Ipfp-parallel. It passes all the checks in the attached R file (where I built the branched version, setting the package name to mipfpC in DESCRIPTION and adapting the dynlib call in NAMESPACE accordingly), i.e. the results satisfy all.equal() compared to the original Ipfp function. I still want to run a few more tests, though.
This code should be relatively easy to parallelize now, as it is just a giant for-loop through the whole array. Unfortunately I know nothing so far about GPU programming, so I was intending to just use OpenMP to run through the loop in parallel.
Regards!
Attachment: test_and_benchmark.txt (benchmarks: bm_large, bm_hdim, bm)
Please ignore the above test file, which was faulty (even though the actual algorithm did pass the tests). I have now created a pull request that fixes this issue altogether.
I benchmarked this package to be faster than my 2D Rcpp IPF implementation for 10x10 to 100x100 matrices. However, when I added mipfp to a project it ran much slower for some reason, so here is my 2D Rcpp version in case it is faster for some users.
#include <Rcpp.h>
using namespace Rcpp;
// [[Rcpp::export]]
NumericMatrix ipf_cpp(NumericMatrix M, NumericVector m1, NumericVector m2, double tol = 1e-16){
  NumericVector r1, r2;
  int i, k;
  for (k = 0; k < 1000; k++){  // cap at 1000 sweeps
    // scale rows to match the target row sums m1
    r1 = m1 / rowSums(M);
    for (i = 0; i < M.nrow(); i++){ M(i,_) = M(i,_) * r1(i); }
    // scale columns to match the target column sums m2
    r2 = m2 / colSums(M);
    for (i = 0; i < M.ncol(); i++){ M(_,i) = M(_,i) * r2(i); }
    // stop once both adjustment factors are within tol of 1
    if (is_true(all( abs(r1-1) < tol & abs(r2-1) < tol))){ break; }
  }
  return M;
}
Hi,
I am regularly applying Ipfp to relatively large arrays (4 dimensions totalling tens of millions of values) with several constraints. Currently the Ipfp function is very slow on such large problems (solved overnight on a relatively recent i7).
A large speed gain could come from parallelizing the central algorithm (in each iteration, all values of a single margin can be computed in parallel). Currently, however, by far the largest time cost is the "aperm" call made on the large array by sweep when applying the scaling factor for a specific target margin. After aperm the scaling factor is applied in a vectorized manner and is therefore very fast; little additional gain comes from parallelizing that step, and it seems unreasonable to attempt to parallelize the permutation step itself.
I therefore aim to rewrite part of the core algorithm of the Ipfp function in Rcpp, using for loops that iterate over the appropriate indices for each margin. This avoids the need for sweep and its call to aperm, and makes it easy to parallelize the algorithm for larger problems (passing something like parallel = TRUE or n.cores = X as a parameter).
Any thoughts on this or other potential speed-ups of the Ipfp function are very welcome.