mllite / ml2cpp

Machine Learning Models Deployment using C++ Code Generation
BSD 3-Clause "New" or "Revised" License

Add a jupyter notebook using Boost.Python to deploy the model with pure python code #35

Closed antoinecarme closed 2 years ago

antoinecarme commented 2 years ago

Boost.Python can wrap the generated C++ model code in a Python module that can be imported like any other.

https://www.boost.org/doc/libs/1_63_0/libs/python/doc/html/index.html

Follow the six steps described in https://github.com/antoinecarme/ml2cpp/issues/1

This method deploys a model without requiring numpy, scikit-learn, R, or keras to be installed (see #25 for MicroPython).

antoinecarme commented 2 years ago

We use a random forest (512 trees ;) on the iris dataset.

The generated C++ model code is stored in /tmp/sklearn2sql_cpp_140409960179088_model_specific.i

antoinecarme commented 2 years ago

With Boost.Python, this code can be wrapped as a Python extension module. The wrapper source is stored as /tmp/sklearn2sql_cpp_140409960179088.cpp:

#include "Generic.i"
#include "/tmp/sklearn2sql_cpp_140409960179088_model_specific.i"

#include <boost/python.hpp>
using namespace boost::python;

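// The module name must match the file name of the shared library (without .so).
BOOST_PYTHON_MODULE(sklearn2sql_cpp_140409960179088) {
    // Expose the C++ function score_csv_file to Python under the same name.
    def("score_csv_file", score_csv_file);
}

antoinecarme commented 2 years ago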

We compile this into a shared library, /tmp/sklearn2sql_cpp_140409960179088.so, that can be loaded as a Python module:

g++ -I/usr/include/python3.10 -Wno-unused-function -fPIC -std=c++17 -g -o /tmp/sklearn2sql_cpp_140409960179088.o -c /tmp/sklearn2sql_cpp_140409960179088.cpp

g++ /tmp/sklearn2sql_cpp_140409960179088.o -shared -Wl,--export-dynamic -lboost_python310 -L/usr/lib/python3.10/config -lpython3.10 -o /tmp/sklearn2sql_cpp_140409960179088.so
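The two commands above hard-code the Python 3.10 include path and library names. As a minimal sketch, the same command lines could be derived from the running interpreter with the standard `sysconfig` module. The helper name `build_commands` is hypothetical, and the `boost_python<major><minor>` library naming is an assumption that matches the command above but may differ on other systems.

```python
import sys
import sysconfig


def build_commands(module_name, work_dir="/tmp"):
    """Return (compile_cmd, link_cmd) argument lists for a Boost.Python module.

    Hypothetical helper, not part of ml2cpp. Assumes g++ and a Boost.Python
    library named boost_python<major><minor>, as in the commands above.
    """
    include_dir = sysconfig.get_paths()["include"]  # e.g. /usr/include/python3.10
    lib_dir = sysconfig.get_config_var("LIBPL")     # may be None on some platforms
    major, minor = sys.version_info[:2]
    base = f"{work_dir}/{module_name}"

    compile_cmd = [
        "g++", f"-I{include_dir}", "-Wno-unused-function", "-fPIC",
        "-std=c++17", "-g", "-o", f"{base}.o", "-c", f"{base}.cpp",
    ]
    link_cmd = ["g++", f"{base}.o", "-shared", "-Wl,--export-dynamic",
                f"-lboost_python{major}{minor}"]
    if lib_dir:
        link_cmd.append(f"-L{lib_dir}")
    link_cmd += [f"-lpython{major}.{minor}", "-o", f"{base}.so"]
    return compile_cmd, link_cmd


compile_cmd, link_cmd = build_commands("sklearn2sql_cpp_140409960179088")
print(" ".join(compile_cmd))
print(" ".join(link_cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to run the build; the sketch only prints the commands so that it has no side effects.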

antoinecarme commented 2 years ago

Sample Python deployment code:

The following Python code scores a given CSV file:

        import sys
        sys.path.append('/tmp')  # directory containing the compiled .so
        import sklearn2sql_cpp_140409960179088 as mymodel
        result = mymodel.score_csv_file("/tmp/iris.csv")  # returns a Python string
        print(result)

antoinecarme commented 2 years ago

Sample output:

idx,Score_0,Score_1,Score_2,Proba_0,Proba_1,Proba_2,LogProba_0,LogProba_1,LogProba_2,Decision,DecisionProba
0,,,,1.00000000000000,0.00000000000000,0.00000000000000,0.00000000000000,-32.23619130191664,-32.23619130191664,0,1.00000000000000
1,,,,0.99804687500000,0.00195312500000,0.00000000000000,-0.00195503483580,-6.23832462503951,-32.23619130191664,0,0.99804687500000
2,,,,1.00000000000000,0.00000000000000,0.00000000000000,0.00000000000000,-32.23619130191664,-32.23619130191664,0,1.00000000000000
3,,,,1.00000000000000,0.00000000000000,0.00000000000000,0.00000000000000,-32.23619130191664,-32.23619130191664,0,1.00000000000000
4,,,,1.00000000000000,0.00000000000000,0.00000000000000,0.00000000000000,-32.23619130191664,-32.23619130191664,0,1.00000000000000
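Since `score_csv_file` returns the whole result as one CSV string, it can be turned into typed records with the standard library alone, which keeps the deployment free of numpy and pandas. The sketch below is one possible way to do this. The function name `parse_scores` is hypothetical, and the `sample` string is copied from the first rows of the output above in place of a real call. Treating `idx` and `Decision` as integers is an assumption that holds for this iris model.

```python
import csv
import io

# First rows of the sample output, standing in for the string
# returned by score_csv_file.
sample = """idx,Score_0,Score_1,Score_2,Proba_0,Proba_1,Proba_2,LogProba_0,LogProba_1,LogProba_2,Decision,DecisionProba
0,,,,1.00000000000000,0.00000000000000,0.00000000000000,0.00000000000000,-32.23619130191664,-32.23619130191664,0,1.00000000000000
1,,,,0.99804687500000,0.00195312500000,0.00000000000000,-0.00195503483580,-6.23832462503951,-32.23619130191664,0,0.99804687500000
"""


def parse_scores(csv_text):
    """Convert the scorer's CSV text into a list of dicts with typed values."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        row = {}
        for name, value in record.items():
            if value == "":
                row[name] = None          # empty Score_* columns
            elif name in ("idx", "Decision"):
                row[name] = int(value)    # assumption: integer class labels
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


rows = parse_scores(sample)
print(rows[1]["Decision"], rows[1]["Proba_0"])
```

With a compiled module, `parse_scores(mymodel.score_csv_file("/tmp/iris.csv"))` would replace the `sample` string.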

antoinecarme commented 2 years ago

A Jupyter notebook is available:

https://github.com/antoinecarme/ml2cpp/blob/master/doc/boost_python/ml2cpp_random_forest_classifier_iris_boost_python.ipynb