Open afhuertass opened 7 years ago
This looks like a bug, and yes, the process is not reliable past this error. I have 3 questions:
1) Does StackNet run without error if you remove this lightgbm model?
2) Can you open /models/nlr9tp06r037rshv48jde0rupi.pred, see what the first few lines look like, and paste them here?
3) Can you share the parameters of that lightgbm model?
I think this will give good insight into the problem.
Hi.
The file does look like this:
0.12867875090262434 0.088004599332437289 0.098988997494944345 0.12555075946500394 0.10027780470535767 0.096548999712578257 0.19475095007255408 0.098988997494944345 0.068210140819555704
0.11440231101507951 0.10179915182052937 0.088006527876986013 0.14188226202070342 0.072081952626786808 0.068282060913537862 0.26489673232766497 0.088006527876986013 0.06064247352172606
0.11689381372247522 0.096543416835966997 0.089923171872447832 0.19738792559435381 0.0805254495498091 0.070464705680191572 0.19637517349385172 0.089923171872447832 0.061963171378455875
0.11689381372247522 0.096543416835966997 0.089923171872447832 0.19738792559435381 0.0805254495498091 0.070464705680191572 0.19637517349385172 0.089923171872447832 0.061963171378455875
I don't really understand what those are.
The parameters for lightgbm are:
LightgbmClassifier boosting:gbdt num_leaves:30 num_iterations:255 scale_pos_weight:1.0 skip_drop:0.5 uniform_drop:false xgboost_dart_mode:false two_round:false top_rate:0.1 sigmoid:1.0 is_unbalance:false max_bin:255 poission_max_delta_step:0.7 min_sum_hessian_in_leaf:0.0001 other_rate:0.1 min_data_in_bin:5 max_drop:50 drop_rate:0.1 learning_rate:0.1 threads:1 max_depth:5 feature_fraction:0.5 min_data_in_leaf:10 min_gain_to_split:20 bagging_fraction:0.9 lambda_l1:0.1 lambda_l2:0.1 bagging_freq:1 bin_construct_sample_cnt:100000 seed:1 threads:1 bags:1 verbose:false
And yes, it works correctly when that model is removed from the parameters file.
Hm. That is a bit confusing, as the contents of the file seem correct. You have 9 probabilities per row (1 for each class). The error you got before was essentially saying that this file does not have 9 columns, but it definitely does, and it looks correct. I will look into it more.
Yeah, it is strange, and as I said, using other classifiers doesn't give any error...
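In case it helps narrow this down, here is a minimal sketch for checking the whole .pred file rather than just the first few lines. It assumes the values are whitespace-separated, as in the pasted sample (pass `delimiter=","` if the file on disk uses commas). The function names `column_counts` and `malformed_rows` are made up for this example and are not part of StackNet.

```python
from collections import Counter


def column_counts(lines, delimiter=None):
    """Count how many non-empty rows have each number of columns.

    delimiter=None splits on any run of whitespace (an assumption
    based on the sample pasted in this thread).
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        counts[len(line.split(delimiter))] += 1
    return counts


def malformed_rows(lines, expected_columns, delimiter=None):
    """Return (1-based line number, column count) for rows of the wrong width."""
    problems = []
    for number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        width = len(line.split(delimiter))
        if width != expected_columns:
            problems.append((number, width))
    return problems
```

To run it on the real file, something like `malformed_rows(open(path).read().splitlines(), 9)` should do, where `path` points at the .pred file. An empty list would mean every row has 9 columns, so the problem is probably somewhere other than the file's shape.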
Hi
I'm trying to train a StackNet using sparse data. The problem is a classification problem with 9 possible categories. My training file is in sparse format, like this:
0 2:1 6:1 13:1 17:1 22:1 23:1 30:1 42:1 47:1 59:1 67:1 71:1 72:1 84:1 86:1
1 2:1 17:1 22:1 42:1 43:1 45:1 47:1 57:1 59:1 67:1 70:1 72:1 86:1 88:1 99:1
etc.
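For reference, a small sketch of how rows in this layout can be parsed to check which labels actually occur in the training file. It assumes the libsvm-style `label index:value ...` layout shown above, with one row per line. The helper names `parse_sparse_line` and `distinct_labels` are hypothetical and are not StackNet functions.

```python
def parse_sparse_line(line):
    """Split one 'label index:value ...' row into (label, {index: value})."""
    tokens = line.split()
    label = int(float(tokens[0]))  # tolerate labels written as '0' or '0.0'
    features = {}
    for token in tokens[1:]:
        index, value = token.split(":", 1)
        features[int(index)] = float(value)
    return label, features


def distinct_labels(lines):
    """Return the sorted set of labels found in the non-empty rows."""
    return sorted({parse_sparse_line(line)[0] for line in lines if line.strip()})
```

With 9 categories, `distinct_labels` over the full training file would be expected to return 9 distinct values. Whether they must run from 0 to 8 is an assumption worth checking against the StackNet documentation.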
In the parameters file I have a list of classifiers. When I start the training, it gives an error after some time.
Fitting model : 9 ( this model is a LightgbmClassifier )
The process doesn't stop, though: it keeps training and even finishes. But I'm concerned about what this means and how it is affecting training.
In other experiments, the process freezes when trying to fit the models in the next fold.
Thanks, and any help is really appreciated :)