predict-idlab / GENESIM

[DEPRECATED] An innovative technique that constructs an ensemble of decision trees and converts this ensemble into a single, interpretable decision tree with enhanced predictive performance

Unable to run the example provided #7

Closed naveenkaushik2504 closed 7 years ago

naveenkaushik2504 commented 7 years ago

After following the instructions to set up GENESIM, I was trying to run it on just one dataset (the wine dataset), but I was getting exceptions when the CART algorithm tries to build the trees. Here is the traceback I get:

CART
Traceback (most recent call last):
  File "example.py", line 65, in <module>
    clf = algorithms[algorithm].construct_classifier(train, feature_cols, label_col)
  File "D:\GENESIM\GENESIM\constructors\treeconstructor.py", line 237, in construct_classifier
    shuffle=True, random_state=None))
  File "D:\GENESIM\GENESIM\constructors\treeconstructor.py", line 336, in get_best_cart_classifier
    tree = cart.construct_classifier(train_tune, X_train_tune.columns, label_col, param_opt=False)
  File "D:\GENESIM\GENESIM\constructors\treeconstructor.py", line 249, in construct_classifier
    self.dt.fit(self.X, self.y)
  File "C:\Users\Naveen Kaushik\Anaconda2\lib\site-packages\sklearn\tree\tree.py", line 739, in fit
    X_idx_sorted=X_idx_sorted)
  File "C:\Users\Naveen Kaushik\Anaconda2\lib\site-packages\sklearn\tree\tree.py", line 199, in fit
    % self.min_samples_split)
ValueError: min_samples_split must be at least 2 or in (0, 1], got 1

Before CART, it built trees with xgboost (although with exceptions related to Bayesian Optimization).

What could be the possible reason for this?

GillesVandewiele commented 7 years ago

Hello, could you try changing https://github.com/IBCNServices/GENESIM/blob/master/constructors/treeconstructor.py#L318 to min_samples_splits = np.arange(2,20,1) to see if that fixes the problem? It appears the grid search is trying the value 1, which now raises an exception in newer versions of sklearn.

Alternatively, you can pass param_opt=False in the construct_classifier call.
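The root cause can be reproduced outside GENESIM: recent sklearn versions reject min_samples_split=1, which is exactly the value the grid search was trying. A minimal sketch (the wine dataset is used only because it is the one from the report; this is not GENESIM code):

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.tree import DecisionTreeClassifier

X, y = load_wine(return_X_y=True)

# the value 1 is rejected by newer sklearn, producing the reported ValueError
try:
    DecisionTreeClassifier(min_samples_split=1).fit(X, y)
except ValueError as e:
    print("rejected:", e)

# a grid starting at 2, as suggested above, fits without error
for mss in np.arange(2, 20, 1):
    DecisionTreeClassifier(min_samples_split=int(mss)).fit(X, y)
print("all values in [2, 20) fit fine")
```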

About the exceptions from xgboost, were those warnings or real exceptions?

GillesVandewiele commented 7 years ago

Moreover, if specific algorithms fail, you can simply remove them from the algorithms dictionary at the top of the example script (https://github.com/IBCNServices/GENESIM/blob/master/example.py#L29) (this is a very easy solution ;) )

naveenkaushik2504 commented 7 years ago

could you try to adapt https://github.com/IBCNServices/GENESIM/blob/master/constructors/treeconstructor.py#L318 to min_samples_splits = np.arange(2,20,1) to see if the problems is fixed?

I changed what you suggested, along with https://github.com/IBCNServices/GENESIM/blob/master/constructors/treeconstructor.py#L247, where the argument min_samples_split was being set to self.min_samples_leaf. I changed this to

self.dt = DecisionTreeClassifier(criterion=self.criterion, min_samples_leaf=self.min_samples_leaf,
                                 min_samples_split=self.min_samples_split, max_depth=self.max_depth)

It worked after this.

About the exceptions from xgboost, where those warnings or real exceptions?

A set of warnings like the following: UserWarning: fmin_l_bfgs_b terminated abnormally with the state: {'warnflag': 2, 'task': 'ABNORMAL_TERMINATION_IN_LNSRCH', 'grad': array([ 1.16287611e-05]), 'nit': 5, 'funcalls': 50}

Moreover, if specific algorithms fail, you can just remove them from the algorithms dictionary on top of the example script (https://github.com/IBCNServices/GENESIM/blob/master/example.py#L29) (this is a very easy solution ;) )

I tried that already :). I was actually running it on my Windows machine and ran into trouble with the QUEST trees, where arguments are passed via subprocess. Will try running it on an Ubuntu machine.

GillesVandewiele commented 7 years ago

I get the same warnings (they come from the https://github.com/fmfn/BayesianOptimization library)

I have never run it on a Windows machine before; I hope you can get it to work! Can I ask what goal you are trying to use my library for?

naveenkaushik2504 commented 7 years ago

I have created an ensemble of decision trees with a GBM algorithm. Now I want to combine them into a single tree so that I can somehow visualize the output of my model and also present it. But now I am wondering how I would integrate my trees with your algorithm. For that, I guess I'll either have to convert them to the format of your decisiontree class, or implement the mutation and crossover parts on my own and get the final tree. Any suggestions are most welcome :)

And regarding running it on Windows, it definitely is a pain and I wouldn't recommend it, but I somehow got it set up. Still, the subprocess issue seems to be a roadblock.

GillesVandewiele commented 7 years ago

Hello, implementing the interface for GBM is a possibility (if you do it, definitely create a pull request for it). On the other hand, it can also be done with minimal adaptation to the genetic_algorithm function (https://github.com/IBCNServices/GENESIM/blob/master/constructors/genesim.py#L378). At the beginning of that function, an ensemble is constructed (tree_list); that construction can be removed from the function and the ensemble passed in as a new parameter instead. Then you would just have to extract your decision trees, convert them to my decisiontree object, and pass them along as that parameter.

There's an issue open for removing the ensemble construction from the genetic_algorithm function, so feel free to create a pull request again if you choose this way :)
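The extraction half of this approach can be sketched with sklearn's GradientBoostingClassifier; the conversion into GENESIM's decisiontree object is deliberately omitted, since its constructor isn't shown in this thread. This is an illustrative assumption about the user's GBM setup, not GENESIM code:

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import GradientBoostingClassifier

X, y = load_wine(return_X_y=True)
gbm = GradientBoostingClassifier(n_estimators=5, random_state=0).fit(X, y)

# estimators_ is an (n_estimators, n_classes) array of DecisionTreeRegressor;
# flatten it to get every individual tree in the ensemble
trees = [t for stage in gbm.estimators_ for t in stage]
print(len(trees))  # 5 boosting stages x 3 wine classes -> 15 trees

# each tree exposes its full structure through the tree_ attribute; these
# arrays (children, split feature, threshold) are what a converter to another
# decision tree representation would need to walk
root = trees[0].tree_
print(root.node_count, root.feature[0], root.threshold[0])
```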

GillesVandewiele commented 7 years ago

Any updates @naveenkaushik2504? Otherwise I'm closing this issue.

naveenkaushik2504 commented 7 years ago

No updates actually. I moved on to another project. Will explore this more later. Closing the issue.