scikit-learn-contrib / py-earth

A Python implementation of Jerome Friedman's Multivariate Adaptive Regression Splines
http://contrib.scikit-learn.org/py-earth/
BSD 3-Clause "New" or "Revised" License

Does the algorithm use multiple threads in fitting #155

Open Fish-Soup opened 7 years ago

Fish-Soup commented 7 years ago

Hi

I've come to a point where I want to fit multiple MARS models independently at the same time. However, when I fit them in parallel with the multiprocessing module I am not getting much of a performance increase. I am struggling to get even a 2x speedup on an i7 (4 cores, 8 threads).

I wonder if this is because MARS already fits each model using multiple threads, so I can't get much of a performance boost from running several fits at once?

jcrudy commented 7 years ago

@Fish-Soup While py-earth itself doesn't do any parallelization, it makes a lot of calls to BLAS and LAPACK. Depending on which implementations of those libraries you have installed, a single fit may already be using multiple CPU cores. If you want to fit several models simultaneously, there may be a benefit to limiting BLAS to a single thread per process. I haven't experimented with this, and I'm not sure exactly how you would do it, although I believe it's possible. It may also turn out to slow things down, so you'd have to test it to be sure. Depending on your system, another way to gain speed might be simply to install a faster BLAS.
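One way the single-thread-per-process idea above might be tried is sketched below. It is untested against py-earth and rests on several assumptions: that the installed BLAS honors one of the usual thread-count environment variables (OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, MKL_NUM_THREADS), which depends on the BLAS build, that the variables are set before NumPy is first loaded in each worker, and that a fitted Earth object can be pickled back to the parent. The helper names limit_blas_threads, fit_one and fit_many are hypothetical.

```python
import multiprocessing as mp
import os

# Variables read by common OpenMP / OpenBLAS / MKL builds when the library loads.
# Assumption: your BLAS respects at least one of them.
BLAS_THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_blas_threads(n=1):
    """Request n BLAS threads for this process and any process it starts later."""
    for var in BLAS_THREAD_VARS:
        os.environ[var] = str(n)


def fit_one(data):
    """Fit one model in a worker process (hypothetical helper)."""
    X, y = data
    # Deferred import: the worker loads NumPy/BLAS only after the
    # environment variables above are already in place.
    from pyearth import Earth

    model = Earth()
    model.fit(X, y)
    return model  # assumes the fitted model is picklable


def fit_many(datasets, n_workers=4):
    """Fit one model per (X, y) pair, one BLAS thread per worker process."""
    limit_blas_threads(1)
    # "spawn" starts fresh interpreters that inherit the environment, so the
    # limit applies even if the parent has already loaded a threaded BLAS.
    ctx = mp.get_context("spawn")
    with ctx.Pool(processes=n_workers) as pool:
        return pool.map(fit_one, datasets)
```

Because of the "spawn" start method, fit_many would need to be called from under an `if __name__ == "__main__":` guard in the calling script. Whether this is actually faster than letting BLAS use all cores for each fit is something to measure on your own machine.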

Also, setting use_fast=True will use the "fast MARS" algorithm, which might perform slightly worse in terms of accuracy but is significantly faster. You can fine-tune it with the fast_K and fast_h parameters.
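A minimal sketch of turning on fast MARS follows. The values given for fast_K and fast_h are illustrative placeholders, not recommendations from this thread, and the helpers fast_mars_params and make_fast_earth are hypothetical.

```python
# Illustrative settings only; tune fast_K and fast_h for your own data.
FAST_MARS_DEFAULTS = {"use_fast": True, "fast_K": 5, "fast_h": 1}


def fast_mars_params(**overrides):
    """Return keyword arguments for Earth with fast MARS enabled."""
    params = dict(FAST_MARS_DEFAULTS)
    params.update(overrides)
    return params


def make_fast_earth(**overrides):
    """Build an unfitted Earth model that uses the fast MARS search."""
    from pyearth import Earth  # deferred so the helper above works without py-earth

    return Earth(**fast_mars_params(**overrides))
```

For example, `make_fast_earth(fast_K=10).fit(X, y)` would fit with a different fast_K, given arrays X and y. Comparing accuracy and fit time against a default `Earth()` on the same data is the simplest way to see whether the trade-off is acceptable.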

Fish-Soup commented 7 years ago

Hi

After posting I came to the same conclusion. I'm now using the use_fast flag, at least while I am prototyping, which is, as you say, significantly faster. Thanks.