pylablanche / gcForest

Python implementation of deep forest method : gcForest
MIT License

no saveModel function? #4

Closed dingtiandu closed 7 years ago

dingtiandu commented 7 years ago

Hi, your implementation is good. There is a function gcf.fit(X_train, y_train) for training and gcf.predict(X_test) for testing. Is there a function like saveModel() for saving the result of gcf.fit(X_train, y_train), and a function like loadModel() for loading it back?

pylablanche commented 7 years ago

Hi,

There are no such functions yet, which means that as soon as you close the Python session you lose everything. That is definitely something I will consider including in the next update!

dingtiandu commented 7 years ago

Thanks for replying.

I have another question. When my own dataset has 50,000 rows, your code runs well. However, when my dataset has 500,000 rows, an OOB error happens. I have tried increasing the number of trees and re-running the training, but I still get the error. I have tried gcf = gcForest(shape_1X=16, window=8, tolerance=0.0, min_samples_mgs=30, min_samples_cascade=30), gcForest(shape_1X=16, window=8, tolerance=0.0, min_samples_mgs=80, min_samples_cascade=80), gcForest(shape_1X=16, window=8, tolerance=0.0, min_samples_mgs=120, min_samples_cascade=120), and even gcForest(shape_1X=16, window=8, tolerance=0.0, min_samples_mgs=200, min_samples_cascade=200), but I still get the OOB error.

How to solve it?

pylablanche commented 7 years ago

The OOB error is almost expected. According to the lines of code you copied, you have changed the stopping criterion used when building trees, not the number of trees. So, just to make things clear:

min_samples_mgs and min_samples_cascade control the construction of each tree, more precisely the stopping criterion. If a node has reached the minimum number of samples, no further split will be done.

If you want to change the number of trees per forest, you need to use n_cascadeRFtree and n_mgsRFtree.

Anyway, you can increase the number of trees and see if the error disappears. It is also possible to remove the OOB score function and use 3-fold cross-validation instead, which will not return any error but will take (much) longer to compute.
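A rough way to see why more trees should help, and why the problem shows up at 500,000 rows but not at 50,000: the sketch below assumes standard bagging (each tree draws a bootstrap sample of n_rows with replacement) and assumes the error is triggered when at least one row is never out-of-bag for any tree. The function names and the tree counts (30 and 100) are illustrative choices, not values taken from gcForest.

```python
def prob_never_oob(n_rows, n_trees):
    """Probability that one given row is in-bag for every tree.

    Assumes each tree draws n_rows samples with replacement,
    independently of the other trees.
    """
    # Chance that a given row appears in one bootstrap sample (about 0.632).
    p_in_bag = 1.0 - (1.0 - 1.0 / n_rows) ** n_rows
    return p_in_bag ** n_trees


def expected_rows_without_oob(n_rows, n_trees):
    """Expected number of rows that never receive an OOB prediction."""
    return n_rows * prob_never_oob(n_rows, n_trees)


# Illustrative dataset sizes (from the thread) and tree counts (assumed).
for n_rows in (50_000, 500_000):
    for n_trees in (30, 100):
        expected = expected_rows_without_oob(n_rows, n_trees)
        print(f"rows={n_rows:>7} trees={n_trees:>3} expected rows without OOB={expected:.3g}")
```

Under these assumptions, the expected count grows about tenfold when going from 50,000 to 500,000 rows at a fixed number of trees, while raising the number of trees shrinks it by many orders of magnitude.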

Does that answer your question?

dingtiandu commented 7 years ago

Thanks. I will try again.

dingtiandu commented 7 years ago

Thanks again. Your code runs well after increasing n_cascadeRFtree and n_mgsRFtree.

dingtiandu commented 7 years ago

The code has now run for 2 more days training on the 500,000-row dataset (and is still training), so a saveModel function is very important. If you finish adding a saveModel function, please contact me. Thanks.

pylablanche commented 7 years ago

@dingtiandu I'm a little bit busy this week, but you can look at this link on how to save models outside Python: http://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/ It looks pretty straightforward (...) and I will definitely include it in the next update. Let me know if you make it work or if you have any problems.

dingtiandu commented 7 years ago

Hi, I used the joblib function (http://machinelearningmastery.com/save-load-machine-learning-models-python-scikit-learn/), and the code worked well. Thanks.

pylablanche commented 7 years ago

@dingtiandu I have updated the README and added a note in the notebook on how to use sklearn.joblib properly to save and load models. Thanks for your contribution!
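For readers landing on this issue, here is a minimal sketch of the save/load pattern discussed above. It uses the standard-library pickle module so that it runs without extra dependencies; joblib.dump(model, path) and joblib.load(path) follow the same dump/load shape. MajorityClassModel, save_model and load_model are hypothetical names introduced for illustration, with the small class standing in for a fitted gcForest object.

```python
import os
import pickle
import tempfile
from collections import Counter


class MajorityClassModel:
    """Hypothetical stand-in for a fitted model (not part of gcForest)."""

    def fit(self, X, y):
        # "Training" here just remembers the most frequent label.
        self.majority_ = Counter(y).most_common(1)[0][0]
        return self

    def predict(self, X):
        return [self.majority_ for _ in X]


def save_model(model, path):
    """Serialize a fitted model to disk."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)


def load_model(path):
    """Load a previously saved model; only load files you trust."""
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Fit once, save, then reload as a later Python session would.
model = MajorityClassModel().fit([[0], [1], [2]], ["a", "b", "b"])
path = os.path.join(tempfile.mkdtemp(), "model.pkl")
save_model(model, path)
restored = load_model(path)
print(restored.predict([[5], [6]]))
```

The class definition must be importable when the file is loaded, and the same applies to a saved gcForest object: the loading session needs the same package available.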