jaguerrerod opened this issue 4 months ago
I tried a workaround: use a reduced training dataset (I know the bins are the same, since all features are integers with exactly 9 distinct values) and create the holdout dataset using this reduced training set as reference, but internally LightGBM caught the trick (error below).
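Roughly what I tried, as a sketch (file paths and the objective are placeholders for my real setup):

```r
library(lightgbm)

# LightGBM reads the first CSV column as the label by default.
params <- list(two_round = TRUE, header = TRUE)

# Full training dataset, built from CSV.
d_train <- lgb.Dataset(data = "train.csv", params = params)

# Bin a small subsample of train, then build the holdout against it,
# hoping its bin mappers match those of the full training data.
reduced_train <- lgb.Dataset(data = "train_subsample.csv", params = params)
d_holdout     <- lgb.Dataset(data = "holdout.csv", params = params,
                             reference = reduced_train)

bst <- lgb.train(
  params = list(objective = "regression"),  # placeholder objective
  data = d_train,
  nrounds = 10L,
  valids = list(holdout = d_holdout)        # rejected with:
)
```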
[LightGBM] [Fatal] Cannot add validation data, since it has different bin mappers with training data
Error in booster$add_valid(data = reduced_valid_sets[[key]], name = key):
Cannot add validation data, since it has different bin mappers with training data
Please expose the bin mappers in lgb.Dataset() as a function (to extract them) and a parameter (to set them). Working with big data is impossible because of the bin mappers.
Description
I have very big train and test datasets (> 500 GB), so I need to construct the lgb.Dataset objects from CSV files. I use the 'two_round = TRUE' parameter to save RAM and leave free_raw_data at its default. After hours I get the datasets and save them to disk. When I load them in a clean session and construct them with lgb.Dataset.construct(), validation during training doesn't work.
Reproducible example
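A minimal sketch of the workflow (file paths, nrounds, and the objective are placeholders for my real setup):

```r
library(lightgbm)

params <- list(two_round = TRUE, header = TRUE)

# Session 1: build both datasets from CSV and persist them as binaries.
d_train <- lgb.Dataset(data = "train.csv", params = params)
d_test  <- lgb.Dataset(data = "test.csv",  params = params, reference = d_train)
lgb.Dataset.construct(d_train)
lgb.Dataset.construct(d_test)
# Training with valids = list(test = d_test) works fine at this point.
lgb.Dataset.save(d_train, "d_train.bin")
lgb.Dataset.save(d_test,  "d_test.bin")

# Session 2 (clean R session): reload the binaries and train.
library(lightgbm)
d_train <- lgb.Dataset("d_train.bin")
d_test  <- lgb.Dataset("d_test.bin")
lgb.Dataset.construct(d_train)
lgb.Dataset.construct(d_test)
bst <- lgb.train(
  params = list(objective = "regression"),  # placeholder objective
  data = d_train,
  nrounds = 10L,
  valids = list(test = d_test)              # fails with:
)
```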
Error in valid_data$set_reference(data): set_reference: cannot set reference after freeing raw data, please set ‘free_raw_data = FALSE’ when you construct lgb.Dataset
Environment info
Additional Comments
free_raw_data = FALSE doesn't make sense here, as the problem is precisely the size of the datasets. Why not provide a get_bins_mapper() function to extract the mapper from a dataset, and pass the bins_mapper to the validation dataset as a parameter? I think this would be better than the current embedded bin-mapper synchronization.
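A sketch of what that could look like (neither get_bins_mapper() nor a bins_mapper argument exist in LightGBM today; both names are hypothetical):

```r
# Hypothetical API -- get_bins_mapper() and bins_mapper do not exist yet.
mapper <- get_bins_mapper(d_train)          # extract the train bin mappers

d_valid <- lgb.Dataset(
  data        = "valid.csv",                # placeholder path
  params      = list(two_round = TRUE, header = TRUE),
  bins_mapper = mapper                      # reuse train's binning exactly
)
```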
What can I do? I've tried all the options for working with big data, such as loading from disk, using two_round, setting free_raw_data = TRUE, saving the datasets to disk, etc., and it still doesn't work because of the bin-mapper issue.
One important detail: d_train and d_test work fine for training with validation right after constructing them. The problem appears after saving both to disk with lgb.Dataset.save() and loading them again. It seems lgb.Dataset.save() doesn't save the bin mappers correctly.