sevamoo / SOMPY

A Python Library for Self Organizing Map (SOM)
Apache License 2.0

Issue with n_job = -1 #16

Open jiperez91 opened 8 years ago

jiperez91 commented 8 years ago

Hi! I'm a Computer Engineering student and I'm trying to use SOMPY. I want to speed up the algorithm (reduce the total elapsed time), and I see in the code that the train function has a parameter (n_job) that should help. From studying this function, I understand that I have to set n_job = -1 to use all the cores of my processor. The problem is that when I call the train function with n_job = -1, the algorithm freezes and does nothing. The only setting that works well is n_job = 1: with n_job = 2 or n_job = 6 it finishes later than with n_job = 1, which I can't understand (with n_job = 6 it is supposed to run faster than with n_job = 1).

I REALLY NEED a working parallelization of this algorithm so that it runs faster for large maps (approximately 100x100), so if there is a solution to this problem I would be really grateful for your help.

Thank you for your time.

ivallesp commented 8 years ago

SOMPY uses the "Parallel()" function from the sklearn library (joblib) to find the BMUs in chunks. As far as I remember, this function pickles the data, i.e. the chunks, to share them between jobs. If the chunks are too small, the pickling step may make the process slower than not parallelising at all. This is my hypothesis of what's failing here.
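To make the hypothesis concrete, here is a minimal sketch (not SOMPY's actual code) that splits a toy data set into chunks, finds the BMUs of each chunk, and times a pickle round trip of the same inputs, which is roughly the extra cost a process-based job pays. The helper names chunk_rows, find_bmus and time_call are made up for illustration, and plain lists are used instead of NumPy arrays to keep it dependency-free.

```python
import pickle
import time


def chunk_rows(rows, n_chunks):
    """Split rows into n_chunks contiguous, nearly equal slices."""
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def find_bmus(chunk, codebook):
    """For each sample, return the index of the nearest codebook vector."""
    bmus = []
    for sample in chunk:
        # Squared Euclidean distance to every map unit.
        dists = [sum((s - w) ** 2 for s, w in zip(sample, unit))
                 for unit in codebook]
        bmus.append(dists.index(min(dists)))
    return bmus


def time_call(fn, *args):
    """Run fn(*args) and return (result, elapsed seconds)."""
    t0 = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - t0


# Toy inputs (hypothetical): 3 map units, 4 samples, 2 features.
codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
data = [[0.1, 0.0], [0.9, 1.2], [2.1, 1.8], [0.2, 0.1]]

for chunk in chunk_rows(data, 2):
    bmus, compute_s = time_call(find_bmus, chunk, codebook)
    # Serialize and deserialize the job inputs, as a worker process would need.
    _, pickle_s = time_call(
        lambda: pickle.loads(pickle.dumps((chunk, codebook))))
    print(bmus, f"compute={compute_s:.6f}s", f"pickle={pickle_s:.6f}s")
```

Note that in this sketch the whole codebook travels with every chunk, so for a 100x100 map the serialized payload per job would include all 10,000 unit vectors. Whether that cost dominates depends on the chunk size and the data, so it is worth measuring on the real inputs.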

I would try to use the multiprocessing library to accomplish that. If I have time, I will try it and see whether I can make it work!
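As a rough idea of what a multiprocessing-based version could look like, here is a minimal sketch using multiprocessing.Pool. It is only an illustration under assumptions: the function names (find_bmus, resolve_workers, split, find_bmus_parallel) are hypothetical, treating n_job = -1 as "all cores" is assumed to mirror the joblib convention, and none of this is taken from SOMPY or from any pull request.

```python
import os
from multiprocessing import Pool


def find_bmus(chunk, codebook):
    """For each sample, return the index of the nearest codebook vector."""
    bmus = []
    for sample in chunk:
        dists = [sum((s - w) ** 2 for s, w in zip(sample, unit))
                 for unit in codebook]
        bmus.append(dists.index(min(dists)))
    return bmus


def resolve_workers(n_job):
    """Map n_job to a worker count; -1 means all cores (assumed convention)."""
    if n_job == -1:
        return os.cpu_count() or 1
    if n_job < 1:
        raise ValueError("n_job must be -1 or a positive integer")
    return n_job


def split(rows, n_chunks):
    """Split rows into at most n_chunks contiguous, nearly equal slices."""
    n_chunks = max(1, min(n_chunks, len(rows)))
    size, extra = divmod(len(rows), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        stop = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:stop])
        start = stop
    return chunks


def find_bmus_parallel(data, codebook, n_job=1):
    """Find BMUs for all samples, using one process per chunk."""
    workers = resolve_workers(n_job)
    if workers == 1 or len(data) < 2:
        # Avoid process start-up and pickling when there is nothing to gain.
        return find_bmus(data, codebook)
    chunks = split(data, workers)
    with Pool(processes=len(chunks)) as pool:
        parts = pool.starmap(find_bmus, [(c, codebook) for c in chunks])
    # starmap preserves chunk order, so concatenation keeps sample order.
    return [bmu for part in parts for bmu in part]


if __name__ == "__main__":
    # The guard is needed on platforms that start workers by re-importing
    # the main module.
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
    data = [[0.1, 0.0], [0.9, 1.2], [2.1, 1.8], [0.2, 0.1]]
    print(find_bmus_parallel(data, codebook, n_job=2))
```

This version still pickles each chunk and the codebook to send them to the workers, so it is only likely to help when each chunk carries enough work to outweigh that cost. The serial fallback for n_job = 1 keeps the small-data case cheap.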

ivallesp commented 8 years ago

I just wrote a (hopefully =D) solution in PR #47