h2o.predict issue related to !!!StackedEnsemble!!! h2o version:

Closed turgut090 closed 5 years ago

turgut090 commented 5 years ago
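library(h2o)
library(readr)
library(dplyr)
h2o.init()

# AutoML run that produced the leaderboard below (leader is a StackedEnsemble)
aml <- h2o.automl(y = outcome, x = features,
                  training_frame = train,
                  # validation_frame = test,
                  leaderboard_frame = test, seed = 3,
                  # max_runtime_secs = 120,
                  # exclude_algos = c("StackedEnsemble"),
                  max_models = 2)

# Load the Kaggle test set and drop the ID column
testing <- read_csv('../input/test.csv') %>% select(-ID_code)

# This call raised "Actual column must be integer class labels!" with the StackedEnsemble leader
light_test2 <- h2o.predict(aml@leader, as.h2o(testing)) %>% as.data.frame() %>% .$p1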


[1] 41865   201

                                             model_id       auc   logloss mean_per_class_error      rmse       mse
1 StackedEnsemble_BestOfFamily_AutoML_20190325_112022 0.8197364 0.5185754            0.2743024 0.4163262 0.1733275
2    StackedEnsemble_AllModels_AutoML_20190325_112022 0.8197364 0.5185754            0.2743024 0.4163262 0.1733275
3                        XRT_1_AutoML_20190325_112022 0.8041693 0.5892604            0.2817554 0.4473749 0.2001443
4                        DRF_1_AutoML_20190325_112022 0.7969176 0.5914487            0.2936830 0.4486491 0.2012860

[1] "StackedEnsemble_BestOfFamily_AutoML_20190325_112022"

[1] TRUE

[1] "target"

[1] "/kaggle/working/StackedEnsemble_BestOfFamily_AutoML_20190325_112022"

# A tibble: 4 x 6
  model_id                            auc logloss mean_per_class_er…  rmse   mse
  <chr>                             <dbl>   <dbl>              <dbl> <dbl> <dbl>
1 StackedEnsemble_BestOfFamily_Aut… 0.820   0.519              0.274 0.416 0.173
2 StackedEnsemble_AllModels_AutoML… 0.820   0.519              0.274 0.416 0.173
3 XRT_1_AutoML_20190325_112022      0.804   0.589              0.282 0.447 0.200
4 DRF_1_AutoML_20190325_112022      0.797   0.591              0.294 0.449 0.201

java.lang.IllegalArgumentException: Actual column must be integer class labels!

java.lang.IllegalArgumentException: Actual column must be integer class labels!
Error: java.lang.IllegalArgumentException: Actual column must be integer class labels!
Execution halted


sebhrusen commented 5 years ago

Hi @henry090, would you mind trying your scenario again with a more recent nightly build? http://h2o-release.s3.amazonaws.com/h2o/master/4617/index.html I have some reason to think it may have been fixed by https://0xdata.atlassian.net/browse/PUBDEV-6208.

One quick question, though: is the ../input/test.csv file different from the test frame you passed as leaderboard_frame?

turgut090 commented 5 years ago


Actually, I tested that version too. h2o- does not work with StackedEnsemble, so downgrading helped a lot.

Regarding "file different from the test frame":

No, it has exactly the same structure, which is why I was confused. Kaggle updated h2o, so I have to reinstall the previous 3.22 version every time.

Here is the data: https://www.kaggle.com/c/santander-customer-transaction-prediction/data

Another user who faced the same issue: https://stackoverflow.com/questions/55194145/error-when-calling-test-file-in-h2o-predict-function/55337889#55337889

sebhrusen commented 5 years ago

@henry090, thanks for the dataset. I'll try to reproduce this.

What exactly do you mean by "h2o- does not work with StackedEnsemble"? Do you get a different issue with that version, or is it the same error?

turgut090 commented 5 years ago

I mean there are h2o 3.22 (for example ) and 3.23 versions. 3.22 is stable, but versions ( and ) do not work with StackedEnsemble. Both of those are 3.23 versions.

sebhrusen commented 5 years ago

@henry090: I identified the issue, and the fix should land in the nightly build soon. Please follow progress here: https://0xdata.atlassian.net/browse/PUBDEV-6376 Thanks!

sebhrusen commented 5 years ago

Closing this ticket. Refer to https://0xdata.atlassian.net/browse/PUBDEV-6376 for the Jira issue and to https://github.com/h2oai/h2o-3/pull/3382 for the PR.