kaz-Anova / StackNet

StackNet is a computational, scalable and analytical Meta modelling framework
MIT License

Error with SklearnknnClassifier 'metric has to be between 'rbf', 'poly', 'sigmoid' or 'linear'' #27

Closed · AurelianTactics closed 7 years ago

AurelianTactics commented 7 years ago

I believe there is an incorrect error check in the file SklearnknnClassifier.java:

        if ( !metric.equals("rbf")  && !metric.equals("poly")&& !metric.equals("sigmoid") && !metric.equals("linear") ){
            throw new IllegalStateException(" metric has to be between 'rbf', 'poly', 'sigmoid' or 'linear'" ); 
        }

        if ( !metric.equals("uniform")  && !metric.equals("distance") ){
            throw new IllegalStateException(" metric has to be between 'uniform' or 'distance'" );  
}

I think the first metric.equals check block is incorrect and you only want the second one. I couldn't get the SklearnknnClassifier to work.

Also, in the docs, the parameters section for KerasnnClassifier has the wrong header (it shows 'SklearnsvmClassifier'). There are also some typos, like dropout being labeled 'droupout' and 'Toral' instead of 'Total'.

Thanks for creating and sharing the StackNet tool.

kaz-Anova commented 7 years ago

All correct.

I will fix it in the next release.

AurelianTactics commented 7 years ago

I'm new to StackNet (I read the Kaggle AMA and blog post, went through a couple of examples, and have used StackNet on a personal project) and have a couple of usage questions. If there's a preferred place to ask these questions, I can ask them there, or if these questions have already been answered, please direct me to those answers.

1.) How can I combine predictions from models made outside of StackNet into the StackNet ensemble? In the zillow example, combine_subs.py does it, but only after StackNet has produced its results. This thread from the TwoSigma competition does it by adding the xgboost predictions as features. Is there a way to take a simple .csv file containing a model's predictions for each row of data and add it to a StackNet level? For example, could I input predictions from a neural net into level 1 of StackNet?

2.) Can I have StackNet use different datasets or different selections of features for different models? E.g. the logistic regression model would only use feature columns 10 to 20 (or a different dataset) while the xgboost model uses a separate dataset?

kaz-Anova commented 7 years ago

1)

Let's say we have 4 level 1 models and a level 2 model.

The way I would do it is to add the output_name=your_whatever_name command when training StackNet. This will print the predictions of those 4 models (after cross validation) as well as the test predictions. So apart from the final predictions, you will also have these 2 files, which are in the same order as your training data and test data. You could train your own models (let's say in Python), save your K-fold cross validation and test results into csv files. Then you could merge the .csv files of StackNet's level 1 output (4 models) and your Python models and create a new .csv file. You could add the label at the beginning (for the training data) and run StackNet all over again. It will be a level 1 StackNet again, but in reality it is level 2.
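
For reference, a minimal sketch of that merge step in Python with pandas. The file names are hypothetical (the exact names produced by output_name depend on your run); the only assumptions are that all prediction files are in the same row order as the original data and that the label goes in the first column of the new training file:

    # Minimal sketch of the merge step (file names are hypothetical).
    import pandas as pd

    label = pd.read_csv("train.csv", header=None).iloc[:, 0]             # target column of the original training data
    stack_train = pd.read_csv("stacknet_level1_train.csv", header=None)  # StackNet's cross-validated level 1 predictions
    stack_test = pd.read_csv("stacknet_level1_test.csv", header=None)    # StackNet's level 1 test predictions
    my_train = pd.read_csv("my_model_oof.csv", header=None)              # your own model's K-fold (out-of-fold) predictions
    my_test = pd.read_csv("my_model_test.csv", header=None)              # your own model's test predictions

    # New training/test data: label first, then all level 1 predictions side by side.
    new_train = pd.concat([label, stack_train, my_train], axis=1)
    new_test = pd.concat([stack_test, my_test], axis=1)

    new_train.to_csv("level2_train.csv", index=False, header=False)
    new_test.to_csv("level2_test.csv", index=False, header=False)
    # Running StackNet on these files is nominally level 1, but effectively level 2.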

2)

One option is to use the Python Generic option. Inside there you can do feature selection and basically whatever you want, but you are limited to algorithms that can be run from Python. Otherwise you have to create different datasets and different StackNets; I don't see any other way to do this.
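
To illustrate the feature-subset idea from question 2, here is a standalone scikit-learn sketch; it is not StackNet's actual Generic Python interface, just the kind of thing you could do inside it (file names and the column range are made up for the example):

    # Standalone illustration: restrict one model to a subset of feature columns.
    # Not tied to StackNet's Generic Python interface; names are hypothetical.
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    X = np.loadtxt("train_features.csv", delimiter=",")   # full feature matrix
    y = np.loadtxt("train_labels.csv", delimiter=",")     # labels

    X_subset = X[:, 10:21]                 # e.g. only feature columns 10 to 20 for this model
    model = LogisticRegression(max_iter=1000)
    model.fit(X_subset, y)

    # Another model (e.g. xgboost) could be trained on the full matrix or on a
    # separate dataset, and the predictions merged as in the answer to question 1.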

AurelianTactics commented 7 years ago

Thanks, I was able to do what you said in question 1. I'll do some more reading on question 2 and try that out. Thanks again for creating StackNet.

kaz-Anova commented 7 years ago

This should now be fixed. Feel free to re-open if there are still issues with it.