java.lang.IllegalStateException: The produced score in temporary -file- is not of correct size

afhuertass commented 7 years ago

Hi

I'm trying to train a StackNet using sparse data. It is a classification problem with 9 possible categories. My training file is in sparse format, like this:

0 2:1 6:1 13:1 17:1 22:1 23:1 30:1 42:1 47:1 59:1 67:1 71:1 72:1 84:1 86:1
1 2:1 17:1 22:1 42:1 43:1 45:1 47:1 57:1 59:1 67:1 70:1 72:1 86:1 88:1 99:1
etc.
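The rows above follow a sparse `label index:value ...` layout (the same shape as the LIBSVM text format): the first token is the class label, and each following token is a feature index with its value; features that are not listed are zero. Below is a minimal Java sketch of parsing one such row, assuming one row per line and whitespace-separated tokens. `SparseRow` and `parse` are hypothetical names, not StackNet code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not part of StackNet): parses one row of the sparse
// "label index:value index:value ..." format, assuming one row per line.
public class SparseRow {
    final int label;                      // class label, e.g. 0 or 1 in the rows above
    final Map<Integer, Double> features;  // feature index -> value; missing indices mean 0

    SparseRow(int label, Map<Integer, Double> features) {
        this.label = label;
        this.features = features;
    }

    static SparseRow parse(String line) {
        String[] tokens = line.trim().split("\\s+");
        int label = Integer.parseInt(tokens[0]);           // first token is the label
        Map<Integer, Double> features = new LinkedHashMap<>();
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                throw new IllegalArgumentException("Expected index:value, got " + tokens[i]);
            }
            int index = Integer.parseInt(tokens[i].substring(0, colon));
            double value = Double.parseDouble(tokens[i].substring(colon + 1));
            features.put(index, value);
        }
        return new SparseRow(label, features);
    }

    public static void main(String[] args) {
        SparseRow row = SparseRow.parse("0 2:1 6:1 13:1 17:1");
        // prints: 0 {2=1.0, 6=1.0, 13=1.0, 17=1.0}
        System.out.println(row.label + " " + row.features);
    }
}
```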

My parameters file lists several classifiers. When I start training, it throws an error after a while:

Fitting model : 9 (this model is a LightgbmClassifier)

Exception in thread "Thread-11935" java.lang.IllegalStateException:  The produced score in temporary file /home/andresh/data-science/StackNet/models/nlr9tp06r037rshv48jde0rupi.pred  is not of correct size
    at ml.lightgbm.LightgbmClassifier.predict_proba(LightgbmClassifier.java:806)
    at ml.Bagging.scoringhelpercatbagv2.score(scoringhelpercatbagv2.java:158)
    at ml.Bagging.scoringhelpercatbagv2.run(scoringhelpercatbagv2.java:188)
    at java.lang.Thread.run(Thread.java:745)

The process doesn't stop, though: it keeps training and even finishes. But I'm concerned about what this error means and how it is affecting training.

In other experiments, the process freezes when it tries to fit the models in the next fold.

Thanks, and any help is really appreciated :)

kaz-Anova commented 7 years ago

This seems like a bug, and yes, the process is not reliable past this error. I have 3 questions:

1) Does StackNet run without errors if you remove this LightGBM model?
2) Can you open /models/nlr9tp06r037rshv48jde0rupi.pred, look at the first few lines, and paste them here?
3) Can you share the parameters of that LightGBM model?

I think this will give good insight into the problem.

afhuertass commented 7 years ago

Hi.

The file looks like this:

0.12867875090262434 0.088004599332437289    0.098988997494944345    0.12555075946500394 0.10027780470535767 0.096548999712578257    0.19475095007255408 0.098988997494944345    0.068210140819555704
0.11440231101507951 0.10179915182052937 0.088006527876986013    0.14188226202070342 0.072081952626786808    0.068282060913537862    0.26489673232766497 0.088006527876986013    0.06064247352172606
0.11689381372247522 0.096543416835966997    0.089923171872447832    0.19738792559435381 0.0805254495498091  0.070464705680191572    0.19637517349385172 0.089923171872447832    0.061963171378455875
0.11689381372247522 0.096543416835966997    0.089923171872447832    0.19738792559435381 0.0805254495498091  0.070464705680191572    0.19637517349385172 0.089923171872447832    0.061963171378455875

I don't really understand what these values are.

The parameters for LightGBM are:

LightgbmClassifier boosting:gbdt num_leaves:30 num_iterations:255 scale_pos_weight:1.0 skip_drop:0.5 uniform_drop:false xgboost_dart_mode:false two_round:false top_rate:0.1 sigmoid:1.0 is_unbalance:false max_bin:255 poission_max_delta_step:0.7 min_sum_hessian_in_leaf:0.0001 other_rate:0.1 min_data_in_bin:5 max_drop:50 drop_rate:0.1 learning_rate:0.1 threads:1 max_depth:5 feature_fraction:0.5 min_data_in_leaf:10 min_gain_to_split:20 bagging_fraction:0.9 lambda_l1:0.1 lambda_l2:0.1 bagging_freq:1 bin_construct_sample_cnt:100000 seed:1 threads:1 bags:1 verbose:false
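The parameter line above is a model name followed by space-separated `key:value` pairs. When checking such a line by hand, it can help to split it into a map and look for repeated keys; for example, `threads:1` appears twice in it. The sketch below is a hypothetical helper, not StackNet's parser: it keeps the last value for a repeated key and records the duplicate. How StackNet itself handles a repeated key is not stated in this thread.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of StackNet): splits a model line such as
// "LightgbmClassifier boosting:gbdt num_leaves:30 ..." into the model name and
// its key:value parameters, and records keys that appear more than once.
public class ModelLine {
    final String model;
    final Map<String, String> params = new LinkedHashMap<>();
    final List<String> duplicateKeys = new ArrayList<>();

    ModelLine(String line) {
        String[] tokens = line.trim().split("\\s+");
        model = tokens[0];                               // first token is the model name
        for (int i = 1; i < tokens.length; i++) {
            int colon = tokens[i].indexOf(':');
            if (colon < 0) {
                continue;                                // ignore tokens that are not key:value
            }
            String key = tokens[i].substring(0, colon);
            String value = tokens[i].substring(colon + 1);
            if (params.put(key, value) != null) {        // put returns the previous value, if any
                duplicateKeys.add(key);
            }
        }
    }

    public static void main(String[] args) {
        ModelLine m = new ModelLine("LightgbmClassifier num_leaves:30 threads:1 seed:1 threads:1");
        // prints: LightgbmClassifier {num_leaves=30, threads=1, seed=1} duplicates=[threads]
        System.out.println(m.model + " " + m.params + " duplicates=" + m.duplicateKeys);
    }
}
```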

And yes, it works correctly when that model is removed from the parameters file.

kaz-Anova commented 7 years ago

Hm. That is a bit confusing, as the output in the file seems correct: you have 9 probabilities per row (1 for each class). The error you got was essentially saying that this file does not have 9 columns, but it clearly does and looks correct... I will look into it more.
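To check a `.pred` file like the one above independently of StackNet, a small diagnostic can confirm that every row has exactly 9 values and sums to about 1. This is a hypothetical sketch, not StackNet's own size check: `PredFileCheck`, `expectedRows` and `numClasses` are illustrative names, and since the exception message only says "size", the sketch also compares the row count against the number of rows you expect to have been scored (an assumption about what "size" may cover). It splits on any whitespace, so both tab- and space-separated files are accepted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical diagnostic (not StackNet's own check): verifies that a prediction
// file has the expected number of rows, exactly numClasses whitespace- or
// tab-separated probabilities per row, and that each row sums to roughly 1.
public class PredFileCheck {

    static void check(List<String> lines, int expectedRows, int numClasses) {
        if (lines.size() != expectedRows) {
            throw new IllegalStateException(
                "Expected " + expectedRows + " rows but found " + lines.size());
        }
        for (int r = 0; r < lines.size(); r++) {
            String[] cols = lines.get(r).trim().split("\\s+");
            if (cols.length != numClasses) {
                throw new IllegalStateException(
                    "Row " + r + " has " + cols.length + " columns, expected " + numClasses);
            }
            double sum = 0.0;
            for (String c : cols) {
                sum += Double.parseDouble(c);
            }
            if (Math.abs(sum - 1.0) > 1e-3) {          // class probabilities should sum to ~1
                throw new IllegalStateException("Row " + r + " sums to " + sum);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage: java PredFileCheck <file.pred> <expectedRows> <numClasses>
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        check(lines, Integer.parseInt(args[1]), Integer.parseInt(args[2]));
        System.out.println("OK: " + lines.size() + " rows x " + args[2] + " columns");
    }
}
```

For the file above, it would be run as `java PredFileCheck nlr9tp06r037rshv48jde0rupi.pred <rows> 9`, where `<rows>` is the number of rows you expect that model to have scored.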

afhuertass commented 7 years ago

Yeah, it's strange. As I said, using other classifiers doesn't give any error...

kaz-Anova / StackNet

java.lang.IllegalStateException: The produced score in temporary -file- is not of correct size #24