kingfengji / gcForest

This is the official implementation for the paper 'Deep forest: Towards an alternative to deep neural networks'
http://lamda.nju.edu.cn/code_gcForest.ashx

How to maintain a consistent result? #57

Closed: boom85423 closed this issue 5 years ago

boom85423 commented 5 years ago

Hello @kingfengji:

We all think gcForest's performance is awesome. I already set random_state in the config and in the train/test split, but the results are still not consistent. Actually, I don't know what random_state in the config stands for: is it the random_state of the classifiers, or of the K-fold validation used for convergence? Excuse me, what have I missed?

Best regards

raven4752 commented 5 years ago

I ran into the same problem and fixed it. The inconsistency is caused by line 110 of lib/gcforest/cascade/cascade_classifier.py:

random_state = (self.random_state + hash("[estimator] {}".format(est_name))) % 1000000007

Since Python 3.3, hash() on a string returns a different value in each interpreter session, because hash randomization is enabled by default. Each run therefore derives different per-estimator seeds, even when random_state is fixed in the config. The explanation can be found here: https://stackoverflow.com/questions/27522626/hash-function-in-python-3-3-returns-different-results-between-sessions

I fixed it by setting the environment variable PYTHONHASHSEED=0 before running my script.
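For anyone who would rather not depend on an environment variable, here is a minimal sketch of another way to derive the per-estimator seed. It is not the repository's code or an official fix. The helper name stable_estimator_seed is hypothetical, and replacing hash() with zlib.crc32 is my own assumption about what a session-independent replacement could look like. Only the "[estimator] {}" label and the 1000000007 modulus come from the line quoted above.

```python
import zlib

# Same modulus as in the quoted line from cascade_classifier.py.
MODULUS = 1000000007


def stable_estimator_seed(base_seed, est_name):
    """Hypothetical helper: derive a per-estimator seed that does not
    change between interpreter sessions."""
    # zlib.crc32 depends only on the input bytes, whereas the built-in
    # hash() on a str is salted per process when hash randomization is on.
    label = "[estimator] {}".format(est_name).encode("utf-8")
    return (base_seed + zlib.crc32(label)) % MODULUS


if __name__ == "__main__":
    # The estimator name here is only an illustration.
    print(stable_estimator_seed(0, "layer_0 - estimator_0"))
```

The seeds this produces differ from the ones hash() would give, so results will not match earlier runs, but they should stay the same from one run to the next. If you keep the PYTHONHASHSEED approach instead, note that the variable has to be set before the Python interpreter starts. Assigning it to os.environ inside the already running script does not change that process's string hashes.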

boom85423 commented 5 years ago

Dear @raven4752:

My problem is solved. Thank you very much.

Have a nice day~