kaz-Anova / StackNet

StackNet is a computational, scalable and analytical Meta modelling framework
MIT License

Sudden drop in computing performance? #43

Open arisbw opened 6 years ago

arisbw commented 6 years ago

Hi, I was running the code on my computer (Intel(R) Core(TM) i5-6200U CPU @ 2.30GHz, 12 GB RAM). I tried to train on 30k rows and 318 variables. Strangely, there was a sudden drop in computing activity: I made no progress in fitting the current model or continuing to the next fold (I use 5-fold cross-validation, 8 models in the first layer and 1 model in the second layer, and all 4 of my threads). With this data, the drop happened after StackNet had been running for about 30 minutes. Here is a screenshot from my last run.
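(For context, a run like the one described is typically launched via the StackNet jar; the sketch below is a hypothetical invocation with placeholder file names and metric, not the reporter's actual command:)

```shell
# Hypothetical invocation: file names and the metric are placeholders,
# not taken from this report. 5 folds and 4 threads match the setup above.
java -jar StackNet.jar train task=classification \
  train_file=train.csv test_file=test.csv params=params.txt \
  pred_file=pred.csv metric=auc folds=5 threads=4 verbose=true
```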

[screenshot: 2017-09-28]

I also tried playing with the threads parameter, but nothing changed.

kaz-Anova commented 6 years ago

I have seen that before on Windows... Strangely, if you just press Enter in the console, it continues. Not sure why this happens... Can you try that and let me know?

arisbw commented 6 years ago

I already tried that, but nothing happened.

arisbw commented 6 years ago

Strangely enough, the same thing also occurred when I ran the code on Ubuntu 16.04.

kaz-Anova commented 6 years ago

That is indeed strange; I have not seen that before... Does it always hang at the same place? Could you send me the file you run this on and the parameters file (as well as the command you run), and tell me where I should expect the pause, so I can try to replicate it?

goldentom42 commented 6 years ago

I'm also on Ubuntu 16.04 LTS. I can give it a try if you wish.

arisbw commented 6 years ago

Thanks, guys. Here are the files. You should see the sudden drop in the 5th fold of the first layer (4th model).

goldentom42 commented 6 years ago

OK, got the files. I just started StackNet on Windows 8.1 and will let you know.

arisbw commented 6 years ago

If you happen to produce an output file, could you please send it back to me? Thanks.

goldentom42 commented 6 years ago

Yeah, sure ;-) By the way, looking at the params file I saw that RandomForest uses 5 threads while you have only 4 logical cores. Did you try reducing that number? I'm just wondering whether StackNet is waiting for that extra thread to complete. This may sound completely stupid, but you never know...
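(For illustration, this suggestion amounts to editing the RandomForest line in the params file so its thread count does not exceed the 4 logical cores. A hypothetical fragment, in the same `key:value` format as the softmax line later in this thread; the other hyperparameter values here are placeholders, not the reporter's actual settings:)

```
RandomForestClassifier estimators:100 max_depth:6 threads:4 verbose:false
```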

arisbw commented 6 years ago

Ah, I see. You may be right, but then it should break at the first fold, right? Anyway, I'm now trying to run it again with a modified threads parameter.

goldentom42 commented 6 years ago

Issue reproduced. Performance drops at 30% in the 5th fold, on the first 4 models: 3 metrics are displayed but not the 4th one.

goldentom42 commented 6 years ago

Reducing the number of threads does not change anything. However, after reducing the number of estimators/iterations of the models, I managed to get StackNet through the whole process...

arisbw commented 6 years ago

OK. Could you please share those modified params?

kaz-Anova commented 6 years ago

I am also running it right now and will let you know the results. I have many cores available and did not encounter a problem at the 2nd fold (I am in the 4th now), which makes me think this is related to threading in general...

kaz-Anova commented 6 years ago

It is reproduced. I don't know why, but it seems to hang in the predict() of the softmaxnnclassifier. The drop in CPU usage itself is not the issue: it simply reflects that we are in the scoring phase, where threading is not used. For some reason there must be a bug in the code causing an infinite loop somewhere.

It does not throw an error though...

kaz-Anova commented 6 years ago

I will try to find a workaround.

goldentom42 commented 6 years ago

@arisbw, the reduced estimator counts were really low, like 3 or 4... so that won't help you. As @kaz-Anova said above, you may want to remove the softmaxnnclassifier for now.

arisbw commented 6 years ago

OK, I'll make sure to remove softmaxclassifier for now. Thank you @goldentom42 @kaz-Anova

kaz-Anova commented 6 years ago

@arisbw, you don't need to remove it. If you just change the seed of the softmax model to 10, it works fine:

softmaxnnclassifier usescale:True seed:10 Type:SGD maxim_Iteration:35 C:0.0005 learn_rate:0.001 smooth:0.0001 h1:50 h2:40 connection_nonlinearity:Relu init_values:0.05 verbose:false

Honestly, I don't have a single clue why... These are the results of all the models:

Average of all folds model 0 : 0.7867379723430724
Average of all folds model 1 : 0.7929649557885149
Average of all folds model 2 : 0.7866359111370649
Average of all folds model 3 : 0.7764824969782087
Average of all folds model 4 : 0.7805320681382869
Average of all folds model 5 : 0.7754102012289547
Average of all folds model 6 : 0.7642405924462954
Average of all folds model 7 : 0.7817682598159342

arisbw commented 6 years ago

...... now this is much weirder than I could have imagined. Again, thanks @kaz-Anova!