microsoft / hummingbird

Hummingbird compiles trained ML models into tensor computation for faster inference.
MIT License
3.32k stars 274 forks

How to compose hummingbird model with other torchscript models #714

Closed by jamespinkerton 11 months ago

jamespinkerton commented 1 year ago

Hi. I want to build a model in PyTorch that calls LGBM at some point inside the model. I need to serialize this model to TorchScript so that I can load it in a Python-free environment (using libtorch with C++). I was thinking I could convert the LGBM model with Hummingbird and then call it from my PyTorch code. The problem is that when I serialize the PyTorch model with the torch.jit.script function, it throws an error while trying to script the Hummingbird part.

Any recommendations for what to do here?

Thanks so much

ksaur commented 1 year ago

Hi @jamespinkerton, can you please post the error here?

We provide save/load functions that can serialize any model (including TorchScript), but if I understand correctly this won't work for your use case, since you need to deserialize it in a non-Python environment? I wonder if any of that save/load torch.jit code would be useful in some other way (for example, the logic translated to another language)?

interesaaat commented 1 year ago

To add to what Karla said, calling torch.jit.script on a Hummingbird model won't work. But we already provide that functionality for you: you can compile your model by passing torch.jit as the backend and then save it as Karla said, and you should be OK (assuming you manually deserialize the model on the C++ side following our loading code, as Karla already suggested). Examples of how to generate TorchScript models from Hummingbird are here.

jamespinkerton commented 1 year ago

This is a great answer, but I don't think it quite answers my question. When I use a modeling package to solve a real-world problem, I usually need to surround it with other things. In this case, I surround it with a torch model. Think of that as preprocessing and postprocessing that occur before and after LGBM.

So fundamentally I need to be able to compose operations. I also need to be able to move the model into TorchScript. Based on your answer, I don't see how I could do both at once. If I compose PyTorch and Hummingbird, I can't later move to TorchScript. If I compile Hummingbird to TorchScript first, I don't see how to compose it with other things in PyTorch (which themselves have to be put into TorchScript too).

interesaaat commented 1 year ago

OK, got it. Have you tried torch.jit.trace instead of torch.jit.script? That is what we use internally, so it may also work in your case.
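For readers following along, here is a minimal sketch of what the tracing suggestion could look like. It is not taken from the thread or from Hummingbird's code. `StandInTreeModel` is a hypothetical placeholder for the torch module that Hummingbird produces (in a real setup it would presumably be the module held by the converted Hummingbird container, e.g. its `model` attribute, which is an assumption here), and `Pipeline`, the preprocessing, and the postprocessing are likewise made up for illustration. The sketch only uses plain PyTorch so that it runs without LightGBM or Hummingbird installed.

```python
import io

import torch
from torch import nn


class StandInTreeModel(nn.Module):
    """Hypothetical placeholder for the Hummingbird-converted model.

    It mimics a single decision stump using tensor ops only, which is
    the kind of computation tracing can record.
    """

    def __init__(self):
        super().__init__()
        self.register_buffer("threshold", torch.tensor(0.5))
        self.register_buffer("left_value", torch.tensor(-1.0))
        self.register_buffer("right_value", torch.tensor(2.0))

    def forward(self, x):
        # Split on feature 0; torch.where keeps the branch inside the graph.
        leaf = torch.where(x[:, 0] > self.threshold, self.right_value, self.left_value)
        return leaf.unsqueeze(1)


class Pipeline(nn.Module):
    """Hypothetical wrapper: preprocessing -> inner model -> postprocessing."""

    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        x = torch.log1p(torch.abs(x))   # illustrative preprocessing
        y = self.inner(x)               # the tree model sits in the middle
        return torch.sigmoid(y)         # illustrative postprocessing


torch.manual_seed(0)
example = torch.randn(4, 5)
pipeline = Pipeline(StandInTreeModel()).eval()

# Trace the whole composition at once instead of scripting it.
with torch.no_grad():
    traced = torch.jit.trace(pipeline, example)

# Serialize to an in-memory buffer; a file path works the same way.
buffer = io.BytesIO()
torch.jit.save(traced, buffer)
buffer.seek(0)
reloaded = torch.jit.load(buffer)

# Check the reloaded module against eager execution on a new batch.
new_batch = torch.randn(7, 5)
with torch.no_grad():
    expected = pipeline(new_batch)
    actual = reloaded(new_batch)
```

One caveat that applies to tracing in general: it records the tensor operations executed for the example input, so any data-dependent Python control flow in the surrounding pre- or postprocessing would be frozen to the path taken during tracing. Whether the real Hummingbird module traces cleanly inside a wrapper like this would need to be checked against the actual model.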