To run it in fully managed C#: an explanation of how to use the code. This information should be added to the repository

78Spinoza commented 2 months ago

Hello there, and first of all, many thanks for the source code.
You can train your model in Python, R, or whatever you like and save it in LightGBM's native format. An explanation of how to then run it in fully managed C# should be added to the repository. Below is what I did to make it work, along with some small issues that needed to be resolved to get the best performance.

Using LightGBM with C# in Fully Managed Code

1. Install the LightGBMNet.Train Package

Use NuGet Package Manager or .NET CLI:
```
dotnet add package LightGBMNet.Train
```

2. Load a Native LightGBM Model

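Set the thread culture to InvariantCulture to avoid number-parsing errors (the model file uses `.` as the decimal separator, which breaks parsing under cultures that use `,`).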

Use Booster.FromFile to load the model:

```csharp
using System.Globalization;
using System.Threading;
Thread.CurrentThread.CurrentCulture = CultureInfo.InvariantCulture;
var booster = Booster.FromFile(loadDataDialog.FileName); // native LightGBM model file
```
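A hedged illustration of the culture problem (not LightGBMNet code): the hypothetical `CultureDemo` helper below parses a number as it appears in a LightGBM model file, once with a culture that uses `,` as the decimal separator and once with `InvariantCulture`. Only the second succeeds. A parser that relies on the thread's `CurrentCulture` presumably fails the same way, which is what setting `InvariantCulture` above avoids.

```csharp
using System.Globalization;

// Sketch only: shows why the parsing culture matters for numbers written
// with '.' as the decimal separator, as in a LightGBM text model file.
public static class CultureDemo
{
    // Hypothetical stand-in for a culture such as de-DE that uses ',' as the
    // decimal separator. Built from InvariantCulture so the result does not
    // depend on the locale data installed on the machine.
    public static CultureInfo CommaDecimalCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberDecimalSeparator = ",";
        culture.NumberFormat.NumberGroupSeparator = ".";
        return culture;
    }

    // Parses one number using the given culture's separators.
    public static bool TryParseModelNumber(string text, CultureInfo culture, out double value) =>
        double.TryParse(text, NumberStyles.Float, culture, out value);
}
```

With `CommaDecimalCulture()`, `TryParseModelNumber("-0.3346386538390942", ...)` returns `false`; with `CultureInfo.InvariantCulture` it returns `true` and the exact value.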

3. Validate the Model

Create a feature array and compare predictions with expected results:

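```csharp
float[] inputData = new float[] { /* feature values */ };
double shouldbe = -0.3346386538390942; // native LightGBM prediction for the same row
double[] predictions = booster.PredictForMat(Booster.PredictType.Normal, inputData, 0, -1);
double prediction = predictions[0];
// Illustrative tolerance; the native wrapper should reproduce the expected value
if (System.Math.Abs(prediction - shouldbe) > 1e-12)
    throw new System.InvalidOperationException($"Prediction {prediction} != expected {shouldbe}");
```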

4. Transform the Model to Fully Managed Code

Use booster.GetModel() to get the managed model:

```csharp
(Ensemble ensemble, Parameters par) = booster.GetModel();
```

5. Validate the Managed Model

Make predictions using the managed model and check for small rounding errors:

```csharp
using System;
VBuffer<float> denseBuffer = new VBuffer<float>(inputData.Length, inputData);
ensemble.MaxThreads = Environment.ProcessorCount;
double prediction2 = ensemble.GetOutput(ref denseBuffer, 0, -1);
// prediction2 = -0.33463865383909XX: the last two digits differ from the native result, which is acceptable for our use.
```
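Because of that last-digit rounding difference, comparing native and managed outputs with `==` is too strict. A minimal sketch of a tolerance-based check follows; the `PredictionCheck` class, its method names, and the default tolerances of `1e-12` are illustrative assumptions, not part of LightGBMNet.

```csharp
using System;

public static class PredictionCheck
{
    // True when two predictions agree within a combined absolute/relative
    // tolerance. The defaults are illustrative; tune them to your model.
    public static bool NearlyEqual(double expected, double actual,
                                   double absTol = 1e-12, double relTol = 1e-12)
    {
        double diff = Math.Abs(expected - actual);
        double scale = Math.Max(Math.Abs(expected), Math.Abs(actual));
        return diff <= Math.Max(absTol, relTol * scale);
    }

    // Largest absolute difference between two prediction arrays of equal length,
    // useful for summarizing a whole validation set.
    public static double MaxAbsDifference(double[] expected, double[] actual)
    {
        if (expected.Length != actual.Length)
            throw new ArgumentException("Prediction arrays must have the same length.");
        double max = 0.0;
        for (int i = 0; i < expected.Length; i++)
            max = Math.Max(max, Math.Abs(expected[i] - actual[i]));
        return max;
    }
}
```

For example, `PredictionCheck.NearlyEqual(prediction, prediction2)` accepts a discrepancy in the last couple of digits while still rejecting a genuinely different output.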

6. Save and Load the Managed Model

Save the managed model using BinaryWriter and GZipStream:

```csharp
using System.IO;
using System.IO.Compression;

using (var fileStream = new FileStream(filePath, FileMode.Create, FileAccess.Write))
using (var gzipStream = new GZipStream(fileStream, CompressionMode.Compress))
using (var writer = new BinaryWriter(gzipStream))
{
    ensemble.Save(writer);
}
```

Load the managed model:

```csharp
Ensemble loadedModel;
using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
using (var reader = new BinaryReader(gzipStream))
{
    loadedModel = new Ensemble(reader, false);
}
```
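The compression pattern itself can be checked without LightGBMNet. The sketch below runs the same `FileStream` → `GZipStream` → `BinaryWriter` chain with a plain `double[]` as a stand-in payload and reads it back; the hypothetical `GzipBinaryRoundTrip` class does not reproduce the `Ensemble` file format. Disposing the writer before reading matters, because the gzip footer is only written when the compression stream is closed.

```csharp
using System.IO;
using System.IO.Compression;

public static class GzipBinaryRoundTrip
{
    // Writes a length-prefixed array of doubles through BinaryWriter -> GZipStream -> FileStream.
    public static void Save(string filePath, double[] values)
    {
        using (var fileStream = new FileStream(filePath, FileMode.Create, FileAccess.Write))
        using (var gzipStream = new GZipStream(fileStream, CompressionMode.Compress))
        using (var writer = new BinaryWriter(gzipStream))
        {
            writer.Write(values.Length);
            foreach (double v in values)
                writer.Write(v);
        } // disposing the writer also closes the gzip and file streams
    }

    // Reads the array back through the mirror-image chain.
    public static double[] Load(string filePath)
    {
        using (var fileStream = new FileStream(filePath, FileMode.Open, FileAccess.Read))
        using (var gzipStream = new GZipStream(fileStream, CompressionMode.Decompress))
        using (var reader = new BinaryReader(gzipStream))
        {
            int count = reader.ReadInt32();
            var values = new double[count];
            for (int i = 0; i < count; i++)
                values[i] = reader.ReadDouble();
            return values;
        }
    }
}
```

Doubles written with `BinaryWriter` round-trip bit for bit, so no precision is lost by the compression step.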

7. Use the Managed Model in Production

Use the managed model in production to avoid native DLL dependencies and to run on other operating systems such as Linux.

By following these steps, you can effectively use LightGBM with C# in a fully managed environment, ensuring compatibility and performance across different platforms.

mjmckp commented 2 months ago

Thanks @78Spinoza; however, these instructions are not how the library is intended to be used. See the unit tests for how the native GBDTs can be trained, converted to managed objects, and evaluated. I do not recommend using the output of Booster.GetModel, as the ensemble object does not apply all the output transformations required for binary/multiclass model evaluation.
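For context on the output transformations mentioned here: for LightGBM's standard `binary` objective the raw score is mapped through a sigmoid, `1 / (1 + exp(-sigmoid * score))` with the `sigmoid` parameter defaulting to 1.0, and for `multiclass` the per-class raw scores go through a softmax. The hypothetical `OutputTransforms` helper below sketches only those two formulas. It is not LightGBMNet code, it ignores other objectives (e.g. `multiclassova`, ranking), and it does not replace the library's own loading route, which handles every model type.

```csharp
using System;
using System.Linq;

public static class OutputTransforms
{
    // Binary objective: probability = 1 / (1 + exp(-sigmoid * rawScore)).
    public static double BinaryProbability(double rawScore, double sigmoid = 1.0) =>
        1.0 / (1.0 + Math.Exp(-sigmoid * rawScore));

    // Multiclass (softmax) objective: one raw score per class -> class probabilities.
    // Subtracting the max score first avoids overflow in Math.Exp.
    public static double[] Softmax(double[] rawScores)
    {
        double max = rawScores.Max();
        double[] exps = rawScores.Select(s => Math.Exp(s - max)).ToArray();
        double sum = exps.Sum();
        return exps.Select(e => e / sum).ToArray();
    }
}
```

A raw binary score of `0.0` maps to a probability of `0.5`, and equal multiclass scores map to equal probabilities.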

78Spinoza commented 2 months ago

I checked the unit tests. I would like to train the model in Python, since many visualization and hyperparameter-tuning tools exist there. How can I use a trained model without Booster.GetModel? I know that it only works for regression, but internally LightGBM builds regression trees for classification and the other objectives too. Still, I understand what you mean. Could you please provide a simpler example, like the one I gave above?

mjmckp commented 2 months ago

@78Spinoza I'll have a look at how best to do this and let you know.

mjmckp commented 1 month ago

@78Spinoza The correct way to load externally trained models from file is shown in the unit test LoadExternalModels. This works for all model types (regression, binary, multiclass, and ranking) and will ensure the managed model produces exactly the same output as the native model.

Also, to save the managed model to file, use PredictorPersist.Save, and to load the model from file use PredictorPersist.Load. Examples of this are in TrainerTest.cs.

78Spinoza commented 1 month ago

But I want to load the model for inference, not for training. Do you mean `var regression = RegressionTrainer.PredictorsFromFile(Path.Combine(path, "models", "regression_model.txt"));`?

mjmckp commented 1 month ago

@78Spinoza In the example above, "regression_model.txt" is a saved trained model (not training data), as generated by the native lightgbm library, and may be used for inference. The training may have been done elsewhere, e.g., in a Python script.

78Spinoza commented 1 month ago

Yes, many thanks, but RegressionTrainer.PredictorsFromFile is in the LightGBMNet.Train namespace, so we still need to load the native DLL wrapper, no? That means we need unmanaged code for inference.

mjmckp commented 1 month ago

OK, I think I understand what you are trying to do: load an externally trained model directly into managed code without requiring any references to the native library. I've refactored the code to allow this (see the latest commits) and uploaded a new NuGet package (1.0.22). See the unit test LoadExternalModelsManagedOnly for an example.

rca22 / LightGBM.Net