mlr-org / mlr3

mlr3: Machine Learning in R - next generation
https://mlr3.mlr-org.com
GNU Lesser General Public License v3.0

scaling well in mlr 3 #59

Closed: AtharKharal closed this issue 5 years ago

AtharKharal commented 5 years ago

It would be nice if mlr used the pbdR packages to scale efficiently.

berndbischl commented 5 years ago

Hi! Could you add a few lines about how exactly you envision that, maybe with a concrete use case or example?

berndbischl commented 5 years ago

or just some extra context

AtharKharal commented 5 years ago

Hi, pbdR has packages for MPI, ZeroMQ, ScaLAPACK, NetCDF4 and PAPI, which are highly scalable. These packages, especially ScaLAPACK, could enhance mlr's parallel and distributed computing capability, for example by moving from sockets to MPI.

mllg commented 5 years ago

I don't think that these packages and mlr3 team up well.

For parallelization, we rely on the future package, which supports MPI and socket clusters among many other backends.
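
As a rough illustration of the point above, here is a minimal sketch of a resampling run parallelized through future. The task, learner, fold count and the local `multisession` plan with two workers are assumptions chosen for the example and do not come from this thread; a socket or MPI cluster would be selected through `future::plan()` in the same place.

```r
library(mlr3)

# Assumption: two local background R sessions stand in for a real cluster.
# The backend is chosen only here; the mlr3 code below stays unchanged.
future::plan("multisession", workers = 2)

# Illustrative choices (not from the thread): iris task, rpart learner, 3-fold CV.
task = tsk("iris")
learner = lrn("classif.rpart")
resampling = rsmp("cv", folds = 3)

# resample() hands the resampling iterations to the active future plan.
rr = resample(task, learner, resampling)

# Mean classification error over the folds.
score = rr$aggregate(msr("classif.ce"))
print(score)

# Return to sequential execution when done.
future::plan("sequential")
```

The design point is that switching backends should only require changing the `plan()` call, not the resampling code.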

For out-of-memory data, there is a dbplyr backend that connects to, e.g., SQL databases, Spark or BigQuery. See https://github.com/mlr-org/mlr3db.

Finally, we are not doing any matrix operations ourselves, so we have no use for ScaLAPACK. We call learning algorithms from third-party packages, and these are usually linked against the system BLAS/LAPACK.

mllg commented 5 years ago

If there are any interesting learning algorithms provided by the packages, these can of course be connected as learners.