29antonioac opened this issue 8 years ago
I have the same problem. I don't understand this situation.
Process or thread pool generation in Python is extremely costly:

```python
In [7]: %timeit joblib.Parallel(n_jobs=5, backend='threading')([])
10 loops, best of 3: 103 ms per loop
```
I'd assume multi-processing is equally bad or even worse.
The same issue happens when predicting with random forests with n_jobs > 1 on small batches. If your batches are smaller than roughly 1,000–10,000 rows, don't use n_jobs > 1. In sklearn you can usually turn off parallelism at predict time by setting n_jobs=1 on the fitted estimator.
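For instance, disabling parallelism on an already-fitted forest is a one-liner with `set_params`; a minimal sketch on a toy dataset:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Toy data standing in for a real workload.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Fit with parallelism enabled (useful for large training sets).
clf = RandomForestClassifier(n_estimators=20, n_jobs=4, random_state=0).fit(X, y)

# For small prediction batches, turn parallelism off so that worker-pool
# startup overhead doesn't dominate the runtime.
clf.set_params(n_jobs=1)
preds = clf.predict(X[:10])
```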
What Peter said. Multiprocessing comes at a cost, and results with a tiny dataset like Iris will tell you nothing about working at scale. Note that parallelism is only used in Nearest Neighbors at predict time.
On 11 April 2016 at 04:14, Peter Prettenhofer notifications@github.com wrote:

> process or thread pool generation in python is extremely costly:
>
> In [7]: %timeit joblib.Parallel(n_jobs=5, backend='threading')([])
> 10 loops, best of 3: 103 ms per loop
>
> I'd assume multi-processing is equally bad or even worse.
>
> The same issue happens when doing predictions using random forests with n_jobs>1 and you predict small batches. I suggest if you have batches smaller than 1000 or 10K rows, dont use n_jobs>1. In sklearn you can usually turn off parallel computing at predict time by setting n_jobs=1 for the fitted estimator.
I wonder what we should do: (a) add a heuristic that decides when to actually use a parallel backend at prediction time, or (b) emit a warning when predicting small batches with n_jobs > 1?
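Option (b) could look something like the following sketch; the threshold value and helper name are assumptions for illustration, not an actual scikit-learn API:

```python
import warnings

SMALL_BATCH_THRESHOLD = 1000  # assumed cutoff; would need benchmarking

def maybe_warn_small_batch(n_samples, n_jobs):
    """Warn when a small predict batch is paired with n_jobs > 1."""
    if n_jobs is not None and n_jobs > 1 and n_samples < SMALL_BATCH_THRESHOLD:
        warnings.warn(
            f"Predicting {n_samples} samples with n_jobs={n_jobs}: "
            "worker-pool startup may dominate runtime; consider n_jobs=1.",
            UserWarning,
        )

# Example: a 50-row batch with n_jobs=4 would trigger the warning.
maybe_warn_small_batch(n_samples=50, n_jobs=4)
```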
Ok, thanks.
I think the best (and easiest) solution is (b), @pprett. If you want, I could try to implement it.
> i wonder what we should do: a) put a heuristic in when to use indeed a parallel backend at prediction time or b) emit a warning when predicting small batches with n_jobs > 1?
Actually, I find that users quite frequently run into data-size issues (too small or too large) that the core devs consider obvious: memory usage and fitting time, model size on disk, the potential benefits of parallelism, statistical reliability over small samples, etc. The memory/time requirements, the benefits of parallelism, and model size on disk could all be roughly estimated, and we could provide methods on estimators to do so. But that sounds like a big engineering task, and we have not even tabulated theoretical complexities, which seems an appropriate first step. It might be better to have a documentation page on sample-size issues that we can refer people to...?
I think we discussed adding computational and memory complexity to the documentation of all the models, which would be nice. Should we open an issue with a checklist of all estimators?
Some observations I've noted on this.
On my Windows 7 + AMD system, n_jobs doesn't work for K Nearest Neighbours. I set it to 6 on my 8-core system, but I see 12% CPU usage, i.e. a single core. I see a few (3) Python processes, but only one is using any CPU.
https://stackoverflow.com/questions/46252285/cant-get-sklearn-working-on-multiple-cores-using-n-jobs
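One thing worth ruling out on Windows: process-based parallelism uses "spawn", which re-imports the main script, so the parallel call should sit under a main-module guard. A minimal sketch (sizes shrunk so it runs quickly; this is a general multiprocessing caveat, not a confirmed diagnosis of the report above):

```python
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def main():
    X, y = make_classification(n_samples=2000, n_features=5, random_state=0)
    # n_jobs=6 requests parallel neighbor queries at predict time.
    clf = KNeighborsClassifier(n_neighbors=1, n_jobs=6).fit(X, y)
    return clf.predict(X)

# On Windows, omitting this guard can break or serialize worker startup.
if __name__ == "__main__":
    preds = main()
```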
Also, there seems to be a huge drop in KNeighborsRegressor.fit performance, way beyond what I'd expect from the algorithmic complexity, once I go from a few hundred thousand rows to a few million. We're talking about going from a few minutes to many hours. The process isn't using as much memory as I'd expect given the large number of rows.
Some stats below for 5 dimensions and 1 neighbor.
| Rows    | Elapsed time (s) |
|---------|------------------|
| 100000  | 11.5             |
| 200000  | 32               |
| 400000  | 179              |
| 600000  | 508              |
| 1200000 | 1535             |
| 3100000 | 7200+            |
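A minimal script to collect this kind of timing on synthetic 5-D data (row counts shrunk here so it finishes quickly; a hypothetical stand-in for the original benchmark):

```python
import time
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.RandomState(0)
timings = {}
for n_rows in (1000, 2000, 4000):
    X = rng.rand(n_rows, 5)  # 5 dimensions, as in the report above
    y = rng.rand(n_rows)
    t0 = time.perf_counter()
    KNeighborsRegressor(n_neighbors=1).fit(X, y)
    timings[n_rows] = time.perf_counter() - t0
    print(n_rows, f"{timings[n_rows]:.4f}s")
```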
I know this is a fairly stagnant issue but just something I was thinking about when I was having trouble with n_jobs using OPTICS. Could it be that clustering algorithms, in particular, don't support a multiprocessing paradigm very well due to the nature of the algorithm? It would seem to me that a "divide and conquer" approach as is done in multiprocessing would severely slow down anything that has cases which are not independent.
Hi all! I've asked this on StackOverflow but didn't get any answer :(. I can't understand why KNearestNeighbors is slower with n_jobs > 1. I've set the multiprocessing start method to forkserver, but it's nearly 100 times slower...
I ran this sample code and posted the results.
Output:
Could anyone explain this behaviour to me? I've tried with much more complex datasets and the behaviour is the same :(.
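For reference, a minimal comparison of the kind being described (an assumed reconstruction, not the poster's original script; timings will vary by machine):

```python
import time
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

# A small dataset, where pool-startup overhead can outweigh any speedup.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

elapsed = {}
for n_jobs in (1, 4):
    clf = KNeighborsClassifier(n_neighbors=5, n_jobs=n_jobs).fit(X, y)
    t0 = time.perf_counter()
    clf.predict(X)
    elapsed[n_jobs] = time.perf_counter() - t0
    print(f"n_jobs={n_jobs}: {elapsed[n_jobs]:.3f}s")
```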
Thank you!