siboehm / lleaves

Compiler for LightGBM gradient-boosted trees, based on LLVM. Speeds up prediction by ≥10x.
https://lleaves.readthedocs.io/en/latest/
MIT License

Add debug version info. #32

Closed: fuyw closed this issue 1 year ago.

fuyw commented 1 year ago

Hello Simon,

I tried to add version info to the compiled model file. In my application, the compiled model's file name is always the same, and sometimes I want to check whether I am using the latest one. Therefore, I want to add an API that reports the current version info.

Here is my solution:

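from llvmlite import ir

# LLVM double type used as the version function's return type.
DOUBLE = ir.DoubleType()


def add_version_info(module, fversion_func_name, fversion_num):
    """
    Add a function returning the forest's version number, for debugging.
    """
    # Declare `double <fversion_func_name>()` in the module.
    fversion_func = ir.Function(
        module,
        ir.FunctionType(DOUBLE, []),
        name=fversion_func_name,
    )
    fversion_block = fversion_func.append_basic_block("version")
    builder = ir.IRBuilder(fversion_block)
    # Adds a dummy 0.0 to the version constant and returns the sum.
    result = builder.fadd(ir.Constant(DOUBLE, fversion_num),
                          ir.Constant(DOUBLE, 0.0))
    builder.ret(result)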

I added an add_version_info() function, which can be called after gen_forest(forest, ir, fblocksize, froot_func_name). This provides an API to get the model's version info. I am not familiar with llvmlite's API, so I simply added a dummy 0 to the version number.
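
A minimal sketch follows. It assumes llvmlite is installed. The names `forest_version` and `jit_version` are hypothetical.

The sketch makes two changes:
* It drops the dummy `fadd` and returns the constant directly.
* It checks that the emitted function can be called from Python. To do this, it JIT-compiles a standalone module with MCJIT and wraps the symbol with ctypes.

In lleaves the function would presumably be emitted into the same module as the forest instead, so it would end up in the cached file next to the root function.

```python
import ctypes

import llvmlite.binding as llvm
from llvmlite import ir

DOUBLE = ir.DoubleType()


def add_version_info(module, fversion_func_name, fversion_num):
    """Emit `double <fversion_func_name>()` that returns a constant version number."""
    fversion_func = ir.Function(module, ir.FunctionType(DOUBLE, []), name=fversion_func_name)
    builder = ir.IRBuilder(fversion_func.append_basic_block("version"))
    # A constant can be returned directly; no arithmetic instruction is needed.
    builder.ret(ir.Constant(DOUBLE, fversion_num))


def jit_version(fversion_func_name="forest_version", fversion_num=3.0):
    """Hypothetical helper: build a throwaway module, JIT it, and call the version function."""
    try:
        llvm.initialize()  # older llvmlite needs this; recent releases initialize automatically and reject it
    except RuntimeError:
        pass
    llvm.initialize_native_target()
    llvm.initialize_native_asmprinter()

    module = ir.Module(name="version_demo")
    module.triple = llvm.get_default_triple()
    add_version_info(module, fversion_func_name, fversion_num)

    # Parse the textual IR, verify it, and compile it for the host machine.
    llvm_module = llvm.parse_assembly(str(module))
    llvm_module.verify()
    target_machine = llvm.Target.from_default_triple().create_target_machine()
    engine = llvm.create_mcjit_compiler(llvm_module, target_machine)
    engine.finalize_object()

    # Look up the symbol and call it through ctypes as `double f(void)`.
    address = engine.get_function_address(fversion_func_name)
    version_fn = ctypes.CFUNCTYPE(ctypes.c_double)(address)
    return version_fn()  # called while `engine` is still alive


print(jit_version())  # 3.0
```

Returning a double mirrors the original proposal. An integer return type such as `ir.IntType(32)` would work the same way if the version is always a whole number.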

If you think this function is helpful, I can submit a PR later.

siboehm commented 1 year ago

I don't see why this should be integrated into lleaves; it seems extremely specific to your use case. Why don't these three more straightforward ways of storing version info with the cached file work for you?

  1. Add the version info to the filename of the cached file
  2. Append a version integer to the tree function's name via froot_func_name
  3. Write your add_version_info function in C and link it against the lleaves cached file
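
The first two options can be sketched as a plain naming scheme. This is a minimal sketch under stated assumptions, not lleaves code:
* The helpers `versioned_cache_path`, `versioned_root_func_name`, and `parse_version` are hypothetical.
* The `_v<N>` suffix format is hypothetical.
* The `.o` file suffix is hypothetical.
* The base name `forest_root` is assumed.

The resulting strings would be passed as the cache path and as froot_func_name when compiling (not shown). The version can then be read back from either name.

```python
import re
from pathlib import Path

# Hypothetical scheme: names end in "_v<integer>".
_VERSION_RE = re.compile(r"_v(\d+)$")


def versioned_cache_path(cache_dir, stem, version, suffix=".o"):
    """Option 1 (hypothetical): encode the version in the cached file's name."""
    return Path(cache_dir) / f"{stem}_v{version}{suffix}"


def versioned_root_func_name(version, base="forest_root"):
    """Option 2 (hypothetical): append the version to the name passed as froot_func_name."""
    return f"{base}_v{version}"


def parse_version(name):
    """Recover the integer version from a name produced by either helper above."""
    match = _VERSION_RE.search(Path(name).stem)  # .stem drops the file suffix, if any
    if match is None:
        raise ValueError(f"no version suffix in {name!r}")
    return int(match.group(1))


path = versioned_cache_path("/tmp/models", "model", 3)
print(path)                                        # /tmp/models/model_v3.o
print(parse_version(path))                         # 3
print(parse_version(versioned_root_func_name(7)))  # 7
```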

fuyw commented 1 year ago

Hello Simon, many thanks for the reply.

I didn't adopt the first two approaches because I want to keep the cached file name and the tree function name the same across different models.

I am not sure I fully understood the third approach. Do you mean that I should write a specific add_version_info function in C in my own project? Then wouldn't the version info be independent of the cached file, since any cached file could run with this function?

Anyway, this is indeed a very specific use case.