oneapi-src / oneDAL

oneAPI Data Analytics Library (oneDAL)
https://software.intel.com/en-us/oneapi/onedal
Apache License 2.0

Serialization and Deserialization of Classification Models #263

Closed: Mightrider closed this issue 1 year ago

Mightrider commented 4 years ago

Hi!

The Bug

I am playing around with your classifiers and so far I really like them. I started serializing the trained model so that I can use it later for prediction. This worked for SVM, but doing the same with Logistic Regression, Gradient Boosted Trees or Decision Forest fails to compile. I think I am not creating the model for the ModelPtr correctly, because the Model class is abstract.

Using the (training) ResultPtr instead of the ModelPtr works, and I can get the Model from it, but I would like to avoid the additional overhead. Is there a way to create the ModelPtr correctly in these cases, or is using the ResultPtr the intended approach?

Error Message

invalid new-expression of abstract class type ‘daal::algorithms::logistic_regression::interface1::Model’
     daal::algorithms::logistic_regression::interface1::ModelPtr vModelPtr(new daal::algorithms::logistic_regression::interface1::Model());
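For context, the error itself is ordinary C++: a class that declares at least one pure virtual function is abstract, and `new` cannot create an instance of it. The standalone sketch below uses hypothetical classes (`Model`, `internal::ModelImpl`) and a hypothetical `createEmptyModel` factory, not the real DAAL types, to show the situation and the usual remedy: code that knows the concrete type builds the object and returns it through the abstract interface.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

// Hypothetical stand-ins, NOT the real DAAL classes: a public abstract
// interface whose concrete implementation is hidden from users.
class Model {
public:
    virtual ~Model() = default;
    virtual int numberOfFeatures() const = 0;  // pure virtual, so Model is abstract
};

namespace internal {
// Concrete implementation that only the "library" side knows about.
class ModelImpl : public Model {
public:
    explicit ModelImpl(int nFeatures) : nFeatures_(nFeatures) {}
    int numberOfFeatures() const override { return nFeatures_; }

private:
    int nFeatures_;
};
}  // namespace internal

using ModelPtr = std::shared_ptr<Model>;

// This is why `new Model()` fails with "invalid new-expression of abstract class type".
static_assert(std::is_abstract<Model>::value, "Model is abstract");

// Hypothetical factory: builds the concrete object, returns the abstract interface.
ModelPtr createEmptyModel(int nFeatures) {
    assert(nFeatures > 0);
    return std::make_shared<internal::ModelImpl>(nFeatures);
}
```

Usage would look like `ModelPtr m = createEmptyModel(4);`. Whether DAAL exposes an equivalent entry point for these model types is exactly what this issue is asking; the sketch only illustrates the language rule behind the error.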

To Reproduce

auto vModel = vTrainingResult->get(classifier::training::model);
vModel->serialize(...);

// This compiles, and predictions from the deserialized model match those of the original model
daal::algorithms::multi_class_classifier::interface1::ModelPtr vModelPtr(new daal::algorithms::multi_class_classifier::interface1::Model());

// Each of these three fails to compile (tried one at a time, in place of the line above)
daal::algorithms::logistic_regression::interface1::ModelPtr vModelPtr(new daal::algorithms::logistic_regression::interface1::Model());
daal::algorithms::decision_forest::classification::interface1::ModelPtr vModelPtr(new daal::algorithms::decision_forest::classification::interface1::Model());
daal::algorithms::gbt::classification::interface1::ModelPtr vModelPtr(new daal::algorithms::gbt::classification::interface1::Model());

vModelPtr->deserialize(...);


averbukh commented 4 years ago

Those algorithms have an incorrect model-hiding implementation, which looks like the root cause of the issue. @SmirnovEgorRu, could you please assign the issue to somebody to fix it? I can provide details if necessary.

ShvetsKS commented 4 years ago

While a better solution is in progress, please try the workaround below:

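// Create the training algorithm for the model type you want to restore
training::Batch<float> train(nClasses);

// Set the input data; the input is passed to allocate() below
train.input.set(classifier::training::data, testData);

// Allocate the result, which creates a concrete (non-abstract) model object
train.getResult()->allocate<float>(train.getInput(), &train.parameter, 0);

/* deserialize the Model; out_dataArch is an OutputDataArchive holding the serialized model */
train.getResult()->get(classifier::training::model)->deserialize(out_dataArch);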

The idea is to create the training algorithm and allocate its result, which constructs a concrete model object. The model can then be deserialized from an OutputDataArchive.
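The workaround relies on the result object, rather than the user, constructing the concrete model. Below is a minimal standalone sketch of that allocate-then-deserialize pattern. It assumes hypothetical `Model`, `ModelImpl`, `Result` and `restoreModel` types (not DAAL's), with a plain byte vector standing in for DAAL's `OutputDataArchive`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <memory>
#include <stdexcept>
#include <utility>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical abstract model interface (not the DAAL API).
class Model {
public:
    virtual ~Model() = default;
    virtual void serialize(Bytes &out) const = 0;
    virtual void deserialize(const Bytes &in) = 0;
    virtual const std::vector<double> &weights() const = 0;
};

// Concrete model that the user is not expected to construct directly.
// Archive layout: element count (std::size_t) followed by the raw doubles.
class ModelImpl : public Model {
public:
    void setWeights(std::vector<double> w) { weights_ = std::move(w); }
    const std::vector<double> &weights() const override { return weights_; }

    void serialize(Bytes &out) const override {
        const std::size_t n = weights_.size();
        out.resize(sizeof(n) + n * sizeof(double));
        std::memcpy(out.data(), &n, sizeof(n));
        if (n > 0) std::memcpy(out.data() + sizeof(n), weights_.data(), n * sizeof(double));
    }

    void deserialize(const Bytes &in) override {
        std::size_t n = 0;
        if (in.size() < sizeof(n)) throw std::runtime_error("archive too small");
        std::memcpy(&n, in.data(), sizeof(n));
        if (in.size() != sizeof(n) + n * sizeof(double)) throw std::runtime_error("archive size mismatch");
        weights_.resize(n);
        if (n > 0) std::memcpy(weights_.data(), in.data() + sizeof(n), n * sizeof(double));
    }

private:
    std::vector<double> weights_;
};

// Hypothetical training result: allocate() is the step that creates the
// concrete model, playing the role Result::allocate plays in the workaround.
class Result {
public:
    void allocate() { model_ = std::make_shared<ModelImpl>(); }
    std::shared_ptr<Model> getModel() const { return model_; }

private:
    std::shared_ptr<Model> model_;
};

// Restore a model without ever writing `new Model()`.
std::shared_ptr<Model> restoreModel(const Bytes &archive) {
    Result result;
    result.allocate();                        // 1. create the concrete model
    result.getModel()->deserialize(archive);  // 2. fill it from the archive
    return result.getModel();                 // the shared_ptr keeps the model alive
}
```

The caller only ever handles the abstract `Model`; the one place that names the concrete type is `Result::allocate`. That is also why this route works even though `Model` itself cannot be instantiated.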

PivovarA commented 4 years ago

Hello @Mightrider
Do you need further help on this issue, or can we close it?

Mightrider commented 4 years ago

Well, I managed to implement a workaround by serializing/deserializing the result instead of the model directly. So fixing this issue is not a high priority for me, but as far as I can tell the problem still exists, unless serializing/deserializing models directly is not intended at all...

emmenlau commented 4 years ago

Actually, we are still unable to serialize/deserialize certain models. Could this issue be considered for fixing again?

tmostak commented 1 year ago

I wanted to check and see if there was any update on this. We've hit this as well and would love to be able to construct models directly for model types such as decision forest.

napetrov commented 1 year ago

@tmostak @emmenlau I'd appreciate it if you could share more context here. Are you looking for specific models beyond decision forest?

Also a question - would you consider moving to oneDAL interfaces if serialization would be implemented there?

emmenlau commented 1 year ago

Dear @napetrov thanks for the reply!

Personally, I'm fine with moving to oneDAL eventually; as I understand it, that is the future of the library?

If possible, it would be great if all models eventually supported serialization, because we'd like to store them on disk. Is that a realistic goal?

tmostak commented 1 year ago

@napetrov Yes, ideally all models would support serialization so we can store them on disk (we're adding ML training/inference support to our database). If it helps with prioritization, we're currently using the linear regression, random forest, GBT and decision tree models for regression, and k-means and DBSCAN for clustering.

And yes, we'd consider moving to oneDAL proper as soon as we can find the time to do so, although it would be nice if the DAAL interfaces still supported serialization of all model types.
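Storing models on disk mostly comes down to writing the serialized bytes to a file and reading them back before deserialization. As far as I know, in DAAL the bytes come from `InputDataArchive::copyArchiveToArray` (sized with `getSizeOfArchive`) and can be wrapped in an `OutputDataArchive` for `deserialize`. The sketch below covers only the file I/O, with hypothetical `saveArchive`/`loadArchive` helpers and a `std::vector<unsigned char>` standing in for the archive buffer.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

using Bytes = std::vector<unsigned char>;

// Hypothetical helper: write an already-serialized model archive to disk as raw bytes.
void saveArchive(const std::string &path, const Bytes &archive) {
    std::ofstream out(path, std::ios::binary | std::ios::trunc);
    if (!out) throw std::runtime_error("cannot open " + path + " for writing");
    out.write(reinterpret_cast<const char *>(archive.data()),
              static_cast<std::streamsize>(archive.size()));
    if (!out) throw std::runtime_error("write failed: " + path);
}

// Hypothetical helper: read the raw bytes back so they can be handed to the deserializer.
Bytes loadArchive(const std::string &path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path + " for reading");
    return Bytes(std::istreambuf_iterator<char>(in), std::istreambuf_iterator<char>());
}
```

Binary mode matters here: archives contain arbitrary byte values, and text mode could translate line endings on some platforms.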

tmostak commented 1 year ago

@napetrov Wanted to see if you had any update on this?

napetrov commented 1 year ago

@tmostak, @emmenlau Yes, oneDAL has working serialization, and we will look into the scope of fixing DAAL serialization.