siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License

When compiling with cpp using gcc, fail to find reference to `forest_root` #41

Closed SunHaoOne closed 1 year ago

SunHaoOne commented 1 year ago

Hello, thank you very much for your code; it has performed well in my Python tests. I'm sorry that I first posted this in an existing issue; I have since reorganised my thoughts and tried some of your suggestions. My goal is to compile the model trained in Python into an ELF binary, and then have C++ inference code take one row of data as input and output a single number. c_bench.cpp depends on libraries such as <cnpy.h> and <benchmark/benchmark.h> that I do not currently need, and because of network issues it is difficult for me to download the dataset, so I have not tested it yet. (I will try to solve the dataset problem and compile the c_bench example later, but I may not be able to simplify that code and remove the irrelevant header files myself, similar to the original LightGBM code in part 5 below.)

Here are the steps I have taken so far:

1. Generate the cache file with Python

import lightgbm
from lleaves import lleaves
import pandas as pd
alldata = pd.read_csv('all_features_data.csv')
# Use the first row as input (assumes every column in the CSV is a feature)
X = alldata.iloc[[0]]
llvm_model = lleaves.Model(model_file="data_model_30.txt")
llvm_model.compile(cache='./lleaves.so')
print(llvm_model.predict(X))

This produces a lleaves.so file.

2. Interface with cpp

These are the files in the directory:

xxx@xxx:~/Desktop/LightGBM/src/lleaves$ ls
c_bench.cpp  c_bench.h  interface.cpp  lleaves.so

interface.cpp contains the main function that calls the compiled model:

#include "c_bench.h"
#include <vector>
#include <iostream>

int main()
{
    std::vector<double> data = {8.81351540e+00, -2.74901880e-01, -4.78453119e-02, 2.25956985e+01,
                                -2.75495538e-01, -9.12007856e-02, -4.78453119e-02, 1.88485949e+00,
                                1.88485949e+00, 1.64226175e-03, 1.64226175e-03};

    double result;
    forest_root(data.data(), &result, 0, 1);
    std::cout << "Result: " << result << std::endl;
    return 0;
}

Since I am not very familiar with this kind of binary file, I am not sure whether it is correct to modify c_bench.cpp in this way, so I kept the original entry function bm_lleaves:

#include "c_bench.h"
#include <algorithm>
#include <cstdlib>

static void bm_lleaves()
{

  // predict over the whole input array
  forest_root(loaded_data, out, (int)0, (int)n_preds);
}
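
For reference, the call above passes one flat array holding all rows and the row range [0, n_preds). A minimal sketch of that layout follows. It assumes (this is not confirmed in this thread) that the compiled function reads row i starting at data[i * n_features] and writes its result to out[i]. The names predict_batch, ForestFn and fake_forest are hypothetical helpers introduced only for this illustration; fake_forest stands in for the compiled model so the sketch can be built without it.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Same signature as forest_root in c_bench.h.
using ForestFn = void (*)(double *, double *, int, int);

// Flattens the rows into one row-major buffer and predicts all of them in a
// single call. ASSUMPTION: the model reads row i at data[i * n_features]
// and writes its result to out[i].
std::vector<double> predict_batch(ForestFn forest,
                                  const std::vector<std::vector<double>> &rows,
                                  std::size_t n_features) {
    std::vector<double> flat;
    flat.reserve(rows.size() * n_features);
    for (const auto &row : rows) {
        assert(row.size() == n_features);  // every row needs all features
        flat.insert(flat.end(), row.begin(), row.end());
    }
    std::vector<double> out(rows.size(), 0.0);
    if (!rows.empty()) {
        forest(flat.data(), out.data(), 0, static_cast<int>(rows.size()));
    }
    return out;
}

// HYPOTHETICAL stand-in for the compiled model: returns the first of two
// features for each row. Replace with the real forest_root when linking.
static void fake_forest(double *data, double *out, int start, int end) {
    for (int i = start; i < end; ++i) {
        out[i] = data[i * 2];
    }
}
```

With the real object file linked, the call would be predict_batch(forest_root, rows, 11) for the 11-feature input used in interface.cpp.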

I am not sure how to add extern "C" to the c_bench.cpp code. Here is one of my attempts:

#include "c_bench.h"
#include <algorithm>
#include <cstdlib>
#include <iostream>

extern "C"
{
  // predict over the whole input array
  void forest_root(loaded_data, out, (int)0, (int)n_preds);
}

By the way, I did not modify the c_bench.h file. It already contains the extern "C" declaration:

//
// Created by simon on 25.07.21.
//

#ifndef C_BENCH_LLVM_H
#define C_BENCH_LLVM_H

extern "C"
{
    void forest_root(double *, double *, int, int);
}

#endif // C_BENCH_LLVM_H
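
As background on why the header uses extern "C": a C++ compiler normally mangles function names, while the compiled model exports the plain name forest_root, so the declaration needs C linkage for the linker to match the two. The sketch below repeats the declaration with guards that also let a C compiler read it, and adds a stub definition so that calling code can be compiled and tested without the model. The stub, the constant kNumFeatures and the row layout are assumptions made for illustration, not part of lleaves.

```cpp
#include <cassert>

// Declaration with C linkage: the name stays `forest_root` in the symbol
// table instead of a mangled C++ name.
#ifdef __cplusplus
extern "C" {
#endif
void forest_root(double *data, double *out, int start, int end);
#ifdef __cplusplus
}
#endif

// HYPOTHETICAL stub, only for testing callers without the compiled model.
// ASSUMPTION: 11 features per row (matches the example input above), rows
// [start, end) are read from a flat row-major array, one output per row.
// Remove this definition when linking against the real object file,
// otherwise the linker reports a duplicate symbol.
static const int kNumFeatures = 11;

extern "C" void forest_root(double *data, double *out, int start, int end) {
    assert(data != nullptr && out != nullptr);
    for (int row = start; row < end; ++row) {
        double sum = 0.0;
        for (int j = 0; j < kNumFeatures; ++j) {
            sum += data[row * kNumFeatures + j];
        }
        out[row] = sum;  // placeholder result, not a real prediction
    }
}
```

The stub only sums the features, so its output is useful for checking the plumbing, not the predictions.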

3. Compile with gcc

xxx@xxx:~/Desktop/LightGBM/src/lleaves$ g++ -o interface interface.cpp c_bench.h -L lleaves.so
/usr/bin/ld: /tmp/ccQnP3nl.o: in function `main':
interface.cpp:(.text+0x8b): undefined reference to `forest_root'
collect2: error: ld returned 1 exit status

4. objdump check

Checking the ELF file with objdump shows that the forest_root symbol is present:

shy@shy-Precision-3650-Tower:~/Desktop/LightGBM/src/lleaves$ objdump lleaves.so -t

lleaves.so:     elf64-x86-64

SYMBOL TABLE:
0000000000000000 l    df *ABS*  0000000000000000 <string>
0000000000000000 l    d  .rodata.cst8   0000000000000000 .rodata.cst8
0000000000000000         *UND*  0000000000000000 _GLOBAL_OFFSET_TABLE_
0000000000000000 g     F .text  00000000002b5fb2 forest_root

5. Original LightGBM function

The original LightGBM code depends on many dynamic libraries and header files, but inference actually uses very little of them, so I am considering your code instead. My idea is to use the model compiled by your code for inference and to remove the unneeded cnpy and benchmark header files. If possible, I hope you can provide a simple example.

#include <LightGBM/c_api.h>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>

void predict(std::vector<double> &row)
{

    std::string pred_result = "";
    // int temp;
    int p = 1;
    BoosterHandle handle;

    LGBM_BoosterCreateFromModelfile("../examples/regression/LightGBM_model.txt", &p, &handle);
    void *in_p = static_cast<void *>(row.data());
    std::vector<double> out(1, 0);
    double *out_result = static_cast<double *>(out.data());
    int64_t out_len;
    // row holds doubles, so the data type must be C_API_DTYPE_FLOAT64 (1 row, 28 columns, row-major)
    LGBM_BoosterPredictForMat(handle, in_p, C_API_DTYPE_FLOAT64, 1, 28, 1, C_API_PREDICT_RAW_SCORE, 0, -1, "", &out_len, out_result);
    std::cout << "Row predict: " << out[0] <<  std::endl;
}

SunHaoOne commented 1 year ago

I'm sorry, I'm a beginner and asked some basic questions. After a few more attempts I got it working with the simplified code below, and I hope it helps others who run into the same difficulties. Here is the minimal code:

1. Current directory files:

xxx@xxx:~/Desktop/LightGBM/src/lleaves$ ls
c_bench.cpp  c_bench.h  CMakeLists.txt lleaves.o

2. Modify c_bench.cpp

#include "c_bench.h"
#include <vector>
#include <iostream>
int main()
{
  std::vector<double> data = {8.81351540e+00, -2.74901880e-01, -4.78453119e-02, 2.25956985e+01,
                              -2.75495538e-01, -9.12007856e-02, -4.78453119e-02, 1.88485949e+00,
                              1.88485949e+00, 1.64226175e-03, 1.64226175e-03};

  double result;
  forest_root(data.data(), &result, 0, 1);
  std::cout << "Result: " << result << std::endl;
  return 0;
}
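
The example above hard-codes one row. Since the stated goal was to take a line of data as input, here is a minimal sketch of parsing a comma-separated line into the vector that is passed to forest_root. The helper name parse_row and the comma-separated input format are assumptions for illustration.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Splits one comma-separated line into doubles.
// std::stod throws std::invalid_argument on an empty or non-numeric field.
std::vector<double> parse_row(const std::string &line) {
    std::vector<double> values;
    std::stringstream stream(line);
    std::string field;
    while (std::getline(stream, field, ',')) {
        values.push_back(std::stod(field));
    }
    return values;
}
```

In main, the line could be read with std::getline(std::cin, line) and parsed with parse_row(line). It is worth checking that the result has exactly as many values as the model has features (11 in the example above) before calling forest_root(row.data(), &result, 0, 1), because the compiled function has no way to check the array length itself.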

3. Modify CMakeLists.txt

cmake_minimum_required(VERSION 3.19)
project(c_bench)

set(CMAKE_CXX_STANDARD 11)

add_executable(c_bench c_bench.cpp)
add_custom_target(run ALL)
add_dependencies(c_bench run)

target_link_libraries(c_bench ${CMAKE_CURRENT_SOURCE_DIR}/lleaves.o)

4. Build with CMake

cmake .. && make
./c_bench

5. Build with g++

g++ c_bench.cpp lleaves.o -o c_bench
./c_bench

siboehm commented 1 year ago

Nice, I'm glad you got it working! I'll make sure to point others to your reference. If you'd like, you could add it to the docs and I'd add you as a contributor also.