skorch-dev / skorch

A scikit-learn compatible neural network library that wraps PyTorch
BSD 3-Clause "New" or "Revised" License

[Proposal] Integration with Hummingbird #931

Closed mbignotti closed 1 year ago

mbignotti commented 1 year ago

Hummingbird is a library developed by Microsoft to translate classical machine learning models to tensor libraries (including PyTorch). It supports many scikit-learn algorithms and meta-estimators, including Pipeline and GridSearchCV. I was thinking that, for deployment purposes, it might be useful to make skorch compatible with Hummingbird. This could make it possible to translate an entire scikit-learn pipeline that includes a skorch neural net to PyTorch, and then maybe to ONNX. Of course, skorch is already using PyTorch, so the translation should only involve the sklearn-based steps.

What do you think?

Thanks!

BenjaminBossan commented 1 year ago

Thanks for bringing this to my attention, I wasn't aware of Hummingbird yet. It looks very interesting.

Do you have a rough list of steps required to make skorch compatible? This would help me evaluate whether it's possible or not. From my understanding, converting to PyTorch would not really bring a big advantage when using skorch, except if there is a Pipeline of sklearn transformers involved too. Is it also a goal to convert skorch to one of the other backends?

Regarding the general goal of the library, out of curiosity: I would imagine that it's quite hard to translate a majority of sklearn estimators. Just as an example, a user could write a FunctionTransformer with an arbitrary Python function, which I don't think is too uncommon. Are there any plans for that?

mbignotti commented 1 year ago

My main goal is to take advantage of the deployment tools related to the PyTorch ecosystem (including ONNX). I'm not really interested in converting skorch to other frameworks (excluding ONNX). So yes, the main advantage would be when using skorch inside a Pipeline that we can later deploy to non-Python environments.

Hummingbird exposes the convert function, which can be called on a fitted model/pipeline. However, I don't really know how Hummingbird works internally, so I wouldn't know where to begin integrating the two libraries. Actually, it might even be the case that the integration can only be implemented on the Hummingbird side. In that case, I might have posted on the wrong repo.

About the goal of the library, I think they are only focusing on translating a specific subset of estimators. It's virtually impossible to support everything, custom estimators in particular.

Note that Hummingbird is probably not the only available solution. Maybe it's possible to directly use scikit-learn/onnx for the deployment problem. However, I quite liked Hummingbird's ease of use. After all, you just need to call one function.

BenjaminBossan commented 1 year ago

> My main goal is to take advantage of the deployment tools related to the pytorch ecosystem (including ONNX). I'm not really interested in converting skorch to other frameworks (excluding onnx).

Yes, that makes sense. We have looked briefly at ONNX exports in the past. In general, it should work as long as you're only interested in the nn.Module part, which is exposed as net.module_ in skorch. At inference time, the only thing you'd be missing out on from skorch is the handling of the data, i.e. creating a data loader and iterating over it. Maybe that's something that Hummingbird can help with, I don't know.

If you open an issue on Hummingbird, feel free to ping me.

BenjaminBossan commented 1 year ago

As there is no follow-up, I'll close the issue. If something new comes up, feel free to re-open.