Program stops working after training step

recsyschallenge / 2017


Program stops working after training step #22

Closed manoj1992 closed 7 years ago

manoj1992 commented 7 years ago

Hi,

I am facing an issue when running the baseline on a server. It is able to print up to:

....
....
[23] train-rmse:0.348896
[24] train-rmse:0.348896

and then the program stays idle. I believe what's happening is that the processes that are spawned through the 'multiprocessing' library in Python crash immediately after being initiated. Does anybody know how to resolve this?

I do not have this problem when running it on my laptop.

Thanks.

ghost commented 7 years ago

+1 on OS X 10.12.1 (XGBoost built using gcc-6 from Homebrew).

jbochi commented 7 years ago

Same happens to me on OS X. The process hangs after calling model.predict(

Try adding a print message right before this line and after it to confirm.
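A minimal sketch of that check. The helper name `trace_step` is hypothetical and not part of the baseline. It prints a flushed marker before and after a block, tagged with the process ID, so the output of each worker can be told apart and is not lost in a buffer if the process hangs.

```python
import os
import time
from contextlib import contextmanager


@contextmanager
def trace_step(label):
    """Print a flushed marker before and after the wrapped block.

    If only the "before" line shows up for a given pid, the wrapped
    call is where that process hangs (or it raised an exception).
    """
    pid = os.getpid()
    # flush=True so the marker is written out immediately.
    print(f"[pid {pid}] before {label}", flush=True)
    start = time.perf_counter()
    yield
    elapsed = time.perf_counter() - start
    print(f"[pid {pid}] after {label} ({elapsed:.2f}s)", flush=True)


# Usage inside the worker (model and dtest come from the baseline code):
# with trace_step("model.predict"):
#     ypred = model.predict(dtest)
```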

ghost commented 7 years ago

Same here: the process hangs after ypred = model.predict(dtest).

jbochi commented 7 years ago

NumPy does not work with multiprocessing when using Apple's Accelerate framework: https://github.com/numpy/numpy/issues/5752#issuecomment-90229969

ghost commented 7 years ago

It works fine in the main process on OS X when classify_worker is called directly, without multiprocessing.Process. Did anybody manage to get multiple workers working on OS X?

VasiliyRubtsov commented 7 years ago

model.set_param({'nthread':1})
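A minimal sketch of where that call could go, assuming `model` is the trained XGBoost Booster from the baseline and `dtest` is the DMatrix passed to `predict`. The wrapper name `predict_single_threaded` is hypothetical. The idea is to switch the model to a single thread before the worker processes call `predict`. Whether this resolves the hang on a given setup is not confirmed in this thread.

```python
def predict_single_threaded(model, dtest):
    """Limit the model to one thread, then run prediction.

    `model` is assumed to be an xgboost.Booster; `set_param` takes a
    dict of parameter names to values, and `nthread` controls how many
    threads XGBoost uses.
    """
    model.set_param({"nthread": 1})
    return model.predict(dtest)


# Usage in the worker, in place of the direct call:
# ypred = predict_single_threaded(model, dtest)
```

Calling `model.set_param({'nthread': 1})` once in the parent, after training and before starting the workers, should have the same effect if the workers receive that same model object.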