snowflakedb / snowflake-ml-python

Apache License 2.0

OrdinalEncoder issue in GridSearchCV #43

Closed: karlenander closed this issue 10 months ago.

karlenander commented 11 months ago

Hi! Executing code similar to the snippet below results in an error:

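    import numpy as np
    # Import paths assume the snowflake-ml-python modeling API.
    from snowflake.ml.modeling.compose import ColumnTransformer
    from snowflake.ml.modeling.model_selection import GridSearchCV
    from snowflake.ml.modeling.pipeline import Pipeline
    from snowflake.ml.modeling.preprocessing import MinMaxScaler, OrdinalEncoder
    from snowflake.ml.modeling.xgboost import XGBRegressor

    # categorical_features, numerical_features, target_name and train_df
    # are defined elsewhere in the reporter's code.
    pipeline = Pipeline(
        steps=[
            (
                "preprocessing",
                ColumnTransformer(
                    transformers=[
                        (
                            "ORD",
                            # Unknown categories are encoded as NaN; this setting triggers the error.
                            OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=np.nan),
                            categorical_features,
                        ),
                        ("MMS", MinMaxScaler(clip=True), numerical_features),
                    ]
                ),
            ),
            ("REG", XGBRegressor(label_cols=[target_name])),
        ]
    )

    parameters = {
        "REG__max_depth": [5, 10],
        "REG__n_estimators": [50],
        "REG__learning_rate": [0.1],
    }
    grid = GridSearchCV(
        estimator=pipeline,
        param_grid=parameters,
        input_cols=categorical_features + numerical_features,
        label_cols=[target_name],
        ...  # remaining arguments omitted in the report
    )
    grid.fit(train_df)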

Error message:

    line 141, in get_filtered_valid_sklearn_args
        if isinstance(val, float) and (np.isnan(val) and np.isnan(default_sklearn_args[key])):
    TypeError: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

When using `unknown_value=100` or a similar integer, everything works as expected; the error only occurs when setting `unknown_value=np.nan`.
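The traceback points at the check `isinstance(val, float) and (np.isnan(val) and np.isnan(default_sklearn_args[key]))`. A plausible reading, assuming the default for `unknown_value` is `None` (as it is in scikit-learn's `OrdinalEncoder`): `np.nan` is a Python `float`, so the first `np.isnan` runs and returns `True`, and the second call then receives `None`, which `np.isnan` rejects with exactly this `TypeError`. An integer such as `100` fails the `isinstance(val, float)` test, so `np.isnan` is never reached, which matches the observation above. The sketch below mirrors the condition in a standalone function and contrasts it with a type-guarded variant; both function names are hypothetical, and this is not necessarily how the library's fix was written.

```python
import math
from typing import Any

import numpy as np


def buggy_is_default_nan(val: Any, default: Any) -> bool:
    # Mirrors the condition from the traceback: np.isnan is applied to the
    # default value without first checking that the default is a float.
    return isinstance(val, float) and bool(np.isnan(val) and np.isnan(default))


def guarded_is_default_nan(val: Any, default: Any) -> bool:
    # Hypothetical type-guarded variant: isnan is only called on floats,
    # so a None default simply compares as "not NaN".
    return (
        isinstance(val, float)
        and isinstance(default, float)
        and math.isnan(val)
        and math.isnan(default)
    )


# Assumption: the default for unknown_value is None, as in scikit-learn's OrdinalEncoder.
default_unknown_value = None

try:
    buggy_is_default_nan(np.nan, default_unknown_value)
except TypeError as exc:
    # np.isnan(None) raises the same "ufunc 'isnan' not supported" TypeError.
    print(f"np.nan -> TypeError: {exc}")

# An int fails the isinstance(val, float) test, so np.isnan is never called.
print("100 ->", buggy_is_default_nan(100, default_unknown_value))
print("np.nan (guarded) ->", guarded_is_default_nan(np.nan, default_unknown_value))
```

Guarding on the type of both operands keeps the NaN-equality check (NaN never compares equal to itself, hence the `isnan` calls) while letting a non-float default fall through as "different".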

sfc-gh-xjiang commented 11 months ago

Hi @karlenander, this bug has been fixed and released in version 1.0.8. Thanks for reporting the issue; please try our latest package and let us know whether the issue still exists.
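Since the fix shipped in 1.0.8, it can help to confirm the installed release before re-running the grid search. A minimal sketch using only the standard library; `parse_release`, `has_nan_fix`, and `installed_version` are hypothetical helper names, and the version parsing is deliberately simplified (pre-release and local suffixes are ignored rather than ordered per PEP 440).

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Release named in the maintainer's reply as containing the fix.
FIXED_IN: Tuple[int, ...] = (1, 0, 8)


def parse_release(ver: str) -> Tuple[int, ...]:
    # Keep only the leading numeric release segment, e.g. "1.0.10" -> (1, 0, 10).
    # Simplification: suffixes such as "rc1" or "+local" are ignored.
    match = re.match(r"\d+(?:\.\d+)*", ver)
    if match is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def has_nan_fix(ver: str) -> bool:
    # Tuple comparison orders releases component by component.
    return parse_release(ver) >= FIXED_IN


def installed_version(dist: str = "snowflake-ml-python") -> Optional[str]:
    # Returns None when the distribution is not installed in this environment.
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    installed = installed_version()
    if installed is None:
        print("snowflake-ml-python is not installed")
    elif has_nan_fix(installed):
        print(f"{installed} should include the unknown_value=np.nan fix")
    else:
        print(f"{installed} predates 1.0.8; try: pip install -U snowflake-ml-python")
```

Comparing integer tuples rather than raw strings avoids the classic pitfall where `"1.0.10" < "1.0.8"` lexicographically.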

karlenander commented 10 months ago

Works! Thanks for fixing.