juaml / julearn

Forschungszentrum Jülich Machine Learning Library
https://juaml.github.io/julearn
GNU Affero General Public License v3.0

[BUG] problem_type="regression" needs to be defined in pipeline and in run_cross_validation #192

Closed kaurao closed 1 year ago

kaurao commented 1 year ago

Describe the bug

problem_type="regression" needs to be defined in the pipeline and also in run_cross_validation; otherwise classification is inferred.

To Reproduce

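from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator
from sklearn.model_selection import train_test_split

# data_diabetes, X, y and X_types are assumed to be defined beforehand
# (they are not shown in this report)
creator = PipelineCreator()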
creator.add('pca', apply_to='pca1', n_components=1)
creator.add('zscore', apply_to='pca2')
#creator.add('pca', apply_to='pca2', n_components=1)
creator.add('ridge', apply_to=['continuous', 'categorical'], problem_type='regression')

###############################################################################
# Split the dataset into train and test
train_diabetes, test_diabetes = train_test_split(data_diabetes, test_size=0.3)

###############################################################################
# Train a ridge regression model on train dataset and use mean absolute error
# for scoring
scores, model = run_cross_validation(
    X=X,
    y=y,
    X_types=X_types,
    data=train_diabetes,
    model=creator,
    problem_type="regression",
    scoring="neg_mean_absolute_error",
    return_estimator='final'
)

Expected behavior

The definition should be taken from the pipeline step, and defining it again in run_cross_validation should raise an error.
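
The expected behavior could be sketched roughly as follows. This is a plain-Python illustration only, not julearn's actual implementation; `resolve_problem_type` and its parameter names are hypothetical.

```python
from typing import Optional


def resolve_problem_type(
    pipeline_problem_type: str,
    cv_problem_type: Optional[str] = None,
) -> str:
    """Hypothetical helper: the pipeline is the single source of truth.

    pipeline_problem_type: value declared on the pipeline step.
    cv_problem_type: value passed (redundantly) to the cross-validation call.
    """
    if cv_problem_type is not None:
        # Defining the problem type a second time is treated as an error,
        # even if both values agree.
        raise ValueError(
            "problem_type is already defined in the pipeline; "
            "do not pass it again to run_cross_validation"
        )
    return pipeline_problem_type
```

With this rule, the call in the reproduction above would fail loudly instead of silently depending on the duplicated argument.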


fraimondo commented 1 year ago

Fixed in #183: you now receive an error if you don't specify the problem type in the PipelineCreator.
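
As an illustration of the kind of check described here, a toy stand-in might look like the sketch below. `ToyPipelineCreator` is a hypothetical class written for this example; it is not julearn's `PipelineCreator`, and the accepted values and error message are assumptions.

```python
from typing import List, Optional, Tuple


class ToyPipelineCreator:
    """Hypothetical stand-in that refuses to infer the problem type."""

    # Assumed set of valid values, for illustration only.
    VALID_TYPES = ("classification", "regression")

    def __init__(self, problem_type: Optional[str] = None) -> None:
        if problem_type is None:
            # Fail early instead of silently falling back to classification.
            raise ValueError("problem_type must be specified in the creator")
        if problem_type not in self.VALID_TYPES:
            raise ValueError(f"unknown problem_type: {problem_type!r}")
        self.problem_type = problem_type
        self.steps: List[Tuple[str, dict]] = []

    def add(self, name: str, **params) -> "ToyPipelineCreator":
        # Record the step name and its parameters; return self for chaining.
        self.steps.append((name, params))
        return self


creator = ToyPipelineCreator(problem_type="regression")
creator.add("zscore").add("ridge")
```

Calling `ToyPipelineCreator()` without a problem type raises a `ValueError`, which mirrors the behavior reported in this comment without claiming to reproduce the library's code.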