rapidsai / cuml

cuML - RAPIDS Machine Learning Library
https://docs.rapids.ai/api/cuml/stable/
Apache License 2.0

[QST] How to save/export model as .ONNX ? #5997

Open 4budab1 opened 1 month ago

4budab1 commented 1 month ago

...
from cuml.linear_model import LogisticRegression
from cuml.metrics import accuracy_score
...
print("Initializing Logistic Regression model...")
model = LogisticRegression()
print("Training the model...")
model.fit(train_X, train_y)
print("Model training complete.")

The model is ready.

Is there a way to save it as .onnx (to use outside of Python)?

So far I only found a way to save the model as a .pkl file, but .pkl is not supported by my software:

print("Saving the model...")
joblib.dump(model, 'cuml_logistic_regression_model.pkl')
viclafargue commented 1 month ago

cuML does not yet support this feature.

4budab1 commented 1 month ago

I'm a total newbie in ML, is there any workaround?

As I understand it, this ChatGPT solution just creates a scikit-learn model on the CPU with the same parameters, inputs, and outputs. It only works instantly because the model is quite small; for bigger models there would be no reason to use cuML at all (you would train on the GPU and then again on the CPU), since you can't export the cuML model?

import skl2onnx
from skl2onnx.common.data_types import FloatTensorType
from sklearn.linear_model import LogisticRegression

# Re-train an equivalent Logistic Regression model on the CPU with scikit-learn
print("Re-initializing and training Logistic Regression model with scikit-learn...")
sklearn_model = LogisticRegression()
sklearn_model.fit(train_X.get(), train_y.get())  # Convert CuPy arrays to NumPy for sklearn

# Convert the scikit-learn model to ONNX format
print("Converting scikit-learn model to ONNX format...")
initial_type = [('float_input', FloatTensorType([None, train_X.shape[1]]))]
onnx_model = skl2onnx.convert_sklearn(sklearn_model, initial_types=initial_type)

# Save the ONNX model to file
onnx_model_path = 'logistic_regression_model.onnx'
print(f"Saving the ONNX model to {onnx_model_path}...")
with open(onnx_model_path, 'wb') as f:
    f.write(onnx_model.SerializeToString())
print("ONNX model saved successfully.")

Maybe there is some workaround, like converting the cuML model to a TensorFlow or PyTorch model and then converting that to .ONNX?

viclafargue commented 1 month ago

It is possible to convert some of the cuML estimators into their Scikit-Learn equivalents.

from cuml.linear_model import LogisticRegression
from cuml.common.device_selection import using_device_type

# train on GPU using cuML
cuml_model = LogisticRegression()
cuml_model.fit(X, y)

# run a mini-inference on host to trigger the generation of the CPU model
with using_device_type('cpu'):
    cuml_model.predict(X)

# grab the Scikit-Learn estimator generated during the CPU inference
sklearn_model = cuml_model._cpu_model

It is then indeed possible to turn this Scikit-Learn model into an ONNX model with the help of skl2onnx. However, the ONNX model cannot be run with cuML. You may then be able to run inference with TensorRT, provided the operators used in the ONNX graph are implemented by it (which is likely when sticking to linear algebra).
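
To make that last step concrete, the extracted estimator can be exported with skl2onnx just like a regular Scikit-Learn model. A minimal sketch, assuming X is the feature array used for training above (the output file name is only an example):

import skl2onnx
from skl2onnx.common.data_types import FloatTensorType

# Convert the Scikit-Learn model extracted from the cuML estimator to ONNX
initial_type = [('float_input', FloatTensorType([None, X.shape[1]]))]
onnx_model = skl2onnx.convert_sklearn(sklearn_model, initial_types=initial_type)

# Serialize the ONNX graph to disk for use outside Python
with open('cuml_logistic_regression.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())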