google / yggdrasil-decision-forests

A library to train, evaluate, interpret, and productionize decision forest models such as Random Forest and Gradient Boosted Decision Trees.
https://ydf.readthedocs.io/
Apache License 2.0

Update documentation using model.h of bazel #149

Open Bkeinn opened 5 days ago

Bkeinn commented 5 days ago

I would like to run a trained random forest in C++, and I was following the documentation: https://ydf.readthedocs.io/en/stable/tutorial/cpp/#generate-the-c-code

The “simple” example only mentions that some dependencies have to be added to Bazel, but it never says where these dependencies come from. I couldn't get the example running because I couldn't get the dependencies to line up, so I (and probably others) would benefit from updated documentation, perhaps with a fully self-contained Bazel file.

rstz commented 4 days ago

I think you can patch the YDF standalone example at https://github.com/google/yggdrasil-decision-forests/tree/main/examples/standalone and replace the Bazel file with:

package(
    default_visibility = ["//visibility:public"],
    licenses = ["notice"],
)

cc_library(
    name = "ydf_tutorial_model",
    hdrs = ["ydf_tutorial_model.h"],
    deps = [
        "@com_google_absl//absl/strings",
        "@com_google_absl//absl/status:statusor",
        "@ydf//yggdrasil_decision_forests/api:serving",
    ],
)

In ydf_tutorial_model.h, you'll have to replace #include "external/ydf_cc/yggdrasil_decision_forests/api/serving.h" with #include "yggdrasil_decision_forests/api/serving.h"
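If the header is regenerated often, that include fix can be scripted instead of done by hand. The following is a minimal C++17 sketch, assuming the generated header contains exactly the include line quoted above; `FixServingInclude` is a hypothetical helper, not part of YDF or Bazel:

```cpp
#include <string>

// Hypothetical helper (not part of YDF): rewrites the include path found in
// the generated header so that it resolves through the @ydf Bazel dependency.
std::string FixServingInclude(std::string header) {
  const std::string old_include =
      "#include \"external/ydf_cc/yggdrasil_decision_forests/api/serving.h\"";
  const std::string new_include =
      "#include \"yggdrasil_decision_forests/api/serving.h\"";
  // Replace every occurrence, resuming the search after each replacement.
  for (std::string::size_type pos = header.find(old_include);
       pos != std::string::npos;
       pos = header.find(old_include, pos + new_include.size())) {
    header.replace(pos, old_include.size(), new_include);
  }
  return header;
}
```

In practice you would read ydf_tutorial_model.h into a std::string, pass it through the helper, and write the result back before running bazel build.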

This will compile the library that runs the model. I tested it quickly with Bazel 5.3.0 and the compile options from our .bazelrc, so roughly:

bazel build --cxxopt=-std=c++17 --host_cxxopt=-std=c++17  --define=use_fast_cpp_protos=true  --define=allow_oversize_protos=true  --noincompatible_strict_action_env  --define=use_ydf_tensorflow_proto=1 //:ydf_tutorial_model

Of course, this only compiles a library; you'll still need to implement a main() that actually calls it.

Bkeinn commented 3 days ago

OK, thanks, I was able to build it now. My main.cpp looks like this:

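// Portable standard headers instead of the GCC-only <bits/stdc++.h>.
#include <iostream>
#include <vector>

#include "tf_model.h"

namespace ydf = yggdrasil_decision_forests;

int main() {
  // Load the model directory that was written from Python.
  auto model =
     ydf::exported_model_tf_model::Load("/home/heimchen/Documents/Programming/CPPModel/tf_model");

  if (!model.ok()) {
    // Report why loading failed instead of exiting silently.
    std::cerr << "Failed to load model: " << model.status() << std::endl;
    return 1;
  }

  // Predict() is defined in the generated tf_model.h.
  const std::vector<float> result = model->Predict();
  for (const float v : result) {
    std::cout << v << " ";
  }
  std::cout << std::endl;

  return 0;
}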

The problem now is that running ./main results in this error:

2024-11-26 10:42:05.647879: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: ./tf_model
2024-11-26 10:42:05.647931: I tensorflow/cc/saved_model/loader.cc:466] SavedModel load for tags { serve }; Status: fail: NOT_FOUND: Could not find SavedModel .pb or .pbtxt at supplied export directory path: ./tf_model. Check that the directory exists and that you have the right permissions for accessing it.. Took 61 microseconds.
Error loading model: Could not find SavedModel .pb or .pbtxt at supplied export directory path: ./tf_model. Check that the directory exists and that you have the right permissions for accessing it.
Error: Failed to load model

The error changes to the following:

2024-11-26 10:44:30.262314: I tensorflow/cc/saved_model/reader.cc:83] Reading SavedModel from: ./tf_model
2024-11-26 10:44:30.262413: I tensorflow/cc/saved_model/reader.cc:52] Reading meta graph with tags { serve }
2024-11-26 10:44:30.262424: I tensorflow/cc/saved_model/loader.cc:466] SavedModel load for tags { serve }; Status: fail: NOT_FOUND: Could not find meta graph def matching supplied tags: { serve }. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: `saved_model_cli`. Took 121 microseconds.
Error loading model: Could not find meta graph def matching supplied tags: { serve }. To inspect available tag-sets in the SavedModel, please use the SavedModel CLI: `saved_model_cli`
Error: Failed to load model

That second error appears after renaming header.pb to saved_model.pb in the tf_model directory, so I guess the code is looking for a file that is not included in the exported model?
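One way to check which format a directory actually holds before loading it is to look at its top-level file names. This is a minimal C++17 sketch, assuming only the file names that appear in this thread: header.pb in the directory written by model.save(), and saved_model.pb or saved_model.pbtxt for a TensorFlow SavedModel (from the log above). `ModelDirFormat`, `ClassifyModelFiles`, and `ClassifyModelDir` are hypothetical names, not YDF APIs. Because it only inspects names, a header.pb renamed to saved_model.pb, as tried above, would be reported as a SavedModel even though TensorFlow cannot load it.

```cpp
#include <filesystem>
#include <set>
#include <string>
#include <system_error>

// Hypothetical classification of a model directory by its file names.
enum class ModelDirFormat { kMissingOrUnreadable, kYdf, kTfSavedModel, kUnknown };

// Classifies a directory from the names of its top-level entries.
ModelDirFormat ClassifyModelFiles(const std::set<std::string>& file_names) {
  // The TensorFlow loader looks for saved_model.pb or saved_model.pbtxt.
  if (file_names.count("saved_model.pb") || file_names.count("saved_model.pbtxt")) {
    return ModelDirFormat::kTfSavedModel;
  }
  // The directory written by model.save() in this thread contains header.pb.
  if (file_names.count("header.pb")) {
    return ModelDirFormat::kYdf;
  }
  return ModelDirFormat::kUnknown;
}

// Lists the top-level entries of `dir` (non-recursively) and classifies them.
ModelDirFormat ClassifyModelDir(const std::filesystem::path& dir) {
  std::error_code ec;
  if (!std::filesystem::is_directory(dir, ec)) {
    return ModelDirFormat::kMissingOrUnreadable;
  }
  std::set<std::string> names;
  // On a construction error, `ec` is set and the loop body never runs.
  for (const auto& entry : std::filesystem::directory_iterator(dir, ec)) {
    names.insert(entry.path().filename().string());
  }
  if (ec) {
    return ModelDirFormat::kMissingOrUnreadable;
  }
  return ClassifyModelFiles(names);
}
```

Printing the result for the tf_model directory before calling Load() makes it clearer which of the two save functions produced it.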

I am saving the model from Python with:

model.save("tf_model")
with open("tf_model.h", "w") as f:
  f.write(model.to_cpp(key="tf_model"))

Bkeinn commented 4 hours ago

I think the problem in this case is that the code only tries to run a TensorFlow model: loading fails with the output of the built-in save function but works with the output of the function that converts to TensorFlow.

# Does not work
model.save("tf_model")
# Does work
model.to_tensorflow_saved_model("tf_model", mode="tf")

Since the model now runs through TensorFlow, it also has problems with the custom functions that Yggdrasil uses. Is this intended behavior? The documentation never states it directly, but I got the impression that the exported model could run standalone.