rca22 / LightGBM.Net

.Net wrapper around the LightGBM library
MIT License

Continue training #4

Closed by alpha-wavelet 2 years ago

alpha-wavelet commented 2 years ago

After loading a previously saved model, a call to ContinueTraining throws an exception: No existing booster to train.

Is this a bug or a feature?

Thanks

mjmckp commented 2 years ago

Would you please provide an example of what you would like to do? An easy way to do this might be to add a failing test to the test project. Thanks

alpha-wavelet commented 2 years ago

Thank you for your reply.

My requirement is to update an existing model with new data (large and numerous). On closer inspection, though, this implementation does not appear to support loading a saved model and using it to continue training.

Please let me know if there is a way to do it.

Thanks

mjmckp commented 2 years ago

I've added a new constructor to the trainer types to allow you to specify an existing model and a new dataset for training here: https://github.com/rca22/LightGBM.Net/commit/22a538110a4f86976c00d7ada99b6b0308fd454d

Please note that LightGBM does not allow the validation dataset to be reset on an existing booster object, so the dataset you pass into the new constructor must have null for its validation dataset object.

Would you please try this out and let me know how you get on?

alpha-wavelet commented 2 years ago

Thank you for the update.

When I run the test, I get:

System.AccessViolationException: 'Attempted to read or write protected memory. This is often an indication that other memory is corrupt.'

on the call to PInvoke.BoosterResetTrainingData() below:

    public void ResetTrainingData(Dataset trainset)
    {
        Check.NonNull(trainset, nameof(trainset));
        PInvokeException.Check(
            PInvoke.BoosterResetTrainingData(Handle, trainset.Handle),
            nameof(PInvoke.BoosterResetTrainingData));
    }

Both the Handle and the trainset.Handle look good.

Please let me know if I can assist in any way.

Thank you

mjmckp commented 2 years ago

Ok, I've found a way that works now, see changes here: https://github.com/rca22/LightGBM.Net/commit/063309a1c2f3ae69e015eef041bc80015226e286

Instead of trying to load the model and reset the training data, I've created a new empty model with the new training AND validation data, and then merged in all the trees from the existing model.

alpha-wavelet commented 2 years ago

It worked great!

Thank you so much!