mljar / mljar-supervised

Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
https://mljar.com
MIT License

Change in API / bug? #175

Closed tmontana closed 4 years ago

tmontana commented 4 years ago

Hi. Not sure if new behavior is due to a change in API or a bug but the following code used to generate 9 models + ensemble. Now it only trains one (1_Default_Xgboost) and stops.

model_types = ["Xgboost"]
automl = AutoML(
    results_path=experiment_name,
    tuning_mode="Normal",
    total_time_limit=600 * 10,
    model_time_limit=600,
    algorithms=model_types,
    train_ensemble=True,
    explain_level=0,
    stack_models=False,
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 3,
        "shuffle": False,
        "stratify": True,
    },
)

automl.fit(X, y)

Thanks,

pplonski commented 4 years ago

By default, mode=Explain is used, which probably overwrites the tuning_mode, but it shouldn't. It should set a value only if the user hasn't provided one. So it is a bug introduced during refactoring.

For now, please set mode=Perform, as it should give you about 10 models.

tmontana commented 4 years ago

You mean changing tuning_mode? I tried, but it still runs only 1 model. I was able to get more with this:

automl.set_params(start_random_models=5, hill_climbing_steps=3, top_models_to_improve=3)
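For intuition, these three parameters bound how many candidate models the tuner can train. The exact accounting is not spelled out in this thread, so the sketch below is an assumption: start_random_models initial candidates, then each hill-climbing step mutating each of the top models in two directions.

```python
# Rough sketch (an assumption, not mljar's documented formula) of how the
# tuner parameters above can bound the number of trained models.
start_random_models = 5
hill_climbing_steps = 3
top_models_to_improve = 3

# Assumed: each hill-climbing step produces up to 2 variants per top model.
max_models = start_random_models + hill_climbing_steps * top_models_to_improve * 2
print(max_models)  # 23 under these assumptions
```

Under these (hypothetical) rules, the call above would allow up to 23 models, which matches the jump from a single model to many.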

pplonski commented 4 years ago

Please try:

model_types = ["Xgboost"]
automl = AutoML(
    results_path="experiment_name",
    tuning_mode="Normal",
    total_time_limit=600 * 10,
    model_time_limit=600,
    algorithms=model_types,
    train_ensemble=True,
    explain_level=0,
    stack_models=False,
    validation_strategy={
        "validation_type": "kfold",
        "k_folds": 3,
        "shuffle": False,
        "stratify": True,
    },
    mode="Perform",
)

automl.fit(X, y)

I've added the mode parameter. This setup will also try to create new features for you (golden features) and will do feature selection. To disable golden features and feature selection, please add golden_features=False, features_selection=False. But please give them a try. If you have version 0.7.1 installed, you should get a nice printout for your golden features. I hope you will be excited about this feature!

tmontana commented 4 years ago

OK, will test and revert. Did the API also change for predict?

preds = automl.predict(X_test) used to return a DataFrame with probabilities for each class (in binary classification). Now it returns only predictions, and automl.predict_proba() returns a numpy array. Is that the expected behavior?

tmontana commented 4 years ago

Looks like I should be using automl.predict_all now?

thanks

pplonski commented 4 years ago

Yes, predict_all is the way to go. The changes were made to keep the API scikit-learn compatible.
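The split follows scikit-learn's convention: predict gives labels, predict_proba gives a raw probability array, and predict_all combines both in a labelled table. A minimal stand-alone sketch for binary classification (made-up values, no mljar dependency, column names hypothetical):

```python
# Hypothetical predict_proba output: one row per sample, one column per class.
proba = [[0.8, 0.2], [0.3, 0.7]]

# predict: labels only, here taken as the argmax of each probability row.
labels = [row.index(max(row)) for row in proba]

# predict_all: probabilities plus the label in one labelled table,
# sketched as a dict of columns instead of a pandas DataFrame.
predict_all = {
    "prediction_0": [row[0] for row in proba],
    "prediction_1": [row[1] for row in proba],
    "label": labels,
}
print(labels)  # [0, 1]
```

This is only an illustration of the return-type split described in the thread, not mljar's actual internals.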

tmontana commented 4 years ago

Re: regression, it will throw an exception (but maybe this should be changed).

Maybe throw a warning instead.

thanks

pplonski commented 4 years ago

I think it may be better to just run the prediction even for regression. It will duplicate predict(), but it will return a DataFrame instead of a numpy array.
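A minimal sketch of that idea (made-up numbers, no mljar or pandas dependency): for regression, predict_all would simply wrap the values predict() returns in a one-column table instead of raising.

```python
# Hypothetical regression predictions, as predict() would return them
# (a numpy array in the real library; a plain list here).
raw = [3.2, 1.5, 4.8]

# predict_all for regression could return the same values as a labelled
# one-column table, sketched as a dict of columns rather than a DataFrame.
predict_all = {"prediction": list(raw)}
print(predict_all["prediction"])  # [3.2, 1.5, 4.8]
```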

tmontana commented 4 years ago

The new features seem great, good results so far. Nice to see the library being so actively updated. Cheers.

pplonski commented 4 years ago

OK, I looked closer at this issue. I removed the tuning_mode parameter; its usage was ambiguous.

tmontana commented 4 years ago

Testing now. Please note a small typo in the docs: features_selection --> feature_selection