Closed mbignotti closed 1 year ago
Ciao Marco, adding a custom op shouldn't be too hard. Unfortunately at the moment we don't provide a specific API for this but I can tell you how you can do it. (we love contributions :smile:).
So the first thing you need to do is add the class of your custom op to the supported ops.
Then you need to write a converter that takes your operator as input and returns a pytorch model version of it. To do this, first you need to register the converter. You can use this as an example: where it has "SklearnMLPClassifier", you should put "Sklearn_your_custom_op_class_name".
Then you need to provide the actual converter. Given that your implementation pretty much uses a bunch of np functions, it should be straightforward to implement. You can look into other converters' implementations to get an idea of how to do it. For example here.
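The overall pattern might look something like the sketch below. Note that the names here (`MyScaler`, `MyScalerTorch`, `convert_my_scaler`) are illustrative, not Hummingbird's actual registration API: a converter reads the learned attributes off the fitted operator and returns a `torch.nn.Module` that reproduces its inference with tensor operations.

```python
import numpy as np
import torch


class MyScaler:
    """Hypothetical custom sklearn-style op: subtracts a learned mean."""

    def fit(self, X):
        self.mean_ = X.mean(axis=0)
        return self


class MyScalerTorch(torch.nn.Module):
    """Torch counterpart implementing the same inference logic."""

    def __init__(self, mean):
        super().__init__()
        # store learned parameters as buffers so they travel with the model
        self.register_buffer("mean", torch.tensor(mean, dtype=torch.float32))

    def forward(self, x):
        return x - self.mean


def convert_my_scaler(operator):
    # the "converter": fitted sklearn-style op in, torch module out
    return MyScalerTorch(operator.mean_)
```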
Let me know if this works for you.
Hi @interesaaat!
Thank you for your reply!
So, if I understand correctly, the idea is that you take parameters and other relevant attributes (e.g. `classes_`) from fitted `sklearn` estimators and pass them to a corresponding `nn.Module` that implements the same logic.
However, I'm wondering if, in this case, it's easier to simply create a new `nn.Module` class (instead of one inheriting from `sklearn.base.BaseEstimator`) that internally uses a Hummingbird-converted class.
I'm not 100% sure how I would write it, but what I mean is something like this (ignoring the fact that `inverse_transform` is not supported):
```python
import numpy as np
import torch
from sklearn.decomposition import PCA
from hummingbird.ml import convert


class PCADetector(torch.nn.Module):
    def __init__(self, n_components):
        super().__init__()
        self.n_components = n_components

    def fit(self, X: np.ndarray):
        model = PCA(n_components=self.n_components)
        model.fit(X)
        self.estimator_ = convert(model, backend="pytorch", test_input=X)
        return self

    def forward(self, x):
        x_hat = self.estimator_.inverse_transform(self.estimator_.transform(x))
        residuals = x - x_hat
        spe = np.sqrt(np.sum(residuals**2, axis=1))
        return spe
```
To give a little bit of context, I'll try to explain why I would like to do this.
The final goal is being able to deploy these models without having to deal with python packaging. The big problem with `sklearn`, and with python in general for machine learning, is that it's very difficult to deploy custom models when you are not allowed to use docker in production (our case). Custom models might be defined in a project-related repo, and the only way to ship them is to bundle them together with the source code. But this is something we want to avoid, as it might raise other dependency issues.
Another approach is to compile or convert the model to something like onnx or tvm. However, onnx and tvm support is very limited for custom models that don't use deep learning frameworks. That's why I'm trying to understand if Hummingbird could help me.
However, I'm not sure if a composition approach like the one above can be adapted to work with subsequent conversions to onnx or similar. On the other hand, maybe following the official approach you described to register custom operators in Hummingbird might be more robust.
What do you think?
Thanks again, Marco.
Yea the approach above won't work because, even if you wrap the model as a pytorch module, the internal code still uses numpy, so you will need that dependency + python. Hummingbird should be able to help in your use case: as long as you provide your model implementation as tensor operations, you can export it using TorchScript or ONNX without python or any other dependencies.
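Concretely, the SPE computation above can be rewritten using only torch ops, at which point TorchScript can export it with no numpy or python at inference time. A sketch, assuming the `components` and `mean` tensors have already been extracted from a fitted sklearn PCA (its `components_` and `mean_` attributes):

```python
import torch


class SPEDetector(torch.nn.Module):
    """Sketch: PCA reconstruction error (SPE) with only tensor ops."""

    def __init__(self, components, mean):
        super().__init__()
        # components: (n_components, n_features) tensor from a fitted PCA
        self.register_buffer("components", components)
        self.register_buffer("mean", mean)

    def forward(self, x):
        centered = x - self.mean
        scores = centered @ self.components.t()       # transform
        x_hat = scores @ self.components + self.mean  # inverse_transform
        residuals = x - x_hat
        return torch.sqrt(torch.sum(residuals ** 2, dim=1))


# TorchScript export: no numpy/python dependency at inference time, e.g.
# scripted = torch.jit.script(SPEDetector(components, mean))
# scripted.save("spe_detector.pt")
```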
The only thing I don't like is having to write twice the same model. But, at this point, I guess that this is the only way to go (I've really investigated all possible solutions I could find). Because the only alternative solution I see, is to directly write the code in a compiled language. I'll try to implement it and let you know if that works.
I don't think you will need to write the model twice. Only the inference part (which looks quite easy). For your next model you could just write the `fit` method in numpy and the `predict` in pytorch, so that you don't have to replicate any work. Keep us posted!
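That split might look something like this (a sketch with a hypothetical detector; the mean-distance logic just stands in for whatever learned parameters your model copies into buffers):

```python
import numpy as np
import torch


class HybridDetector(torch.nn.Module):
    """Sketch of the suggested split: fit in numpy, inference in torch."""

    def fit(self, X: np.ndarray):
        # numpy-only training: learn the per-feature means
        mean = X.mean(axis=0)
        # copy the learned parameters into a buffer for the torch side
        self.register_buffer("mean", torch.tensor(mean, dtype=torch.float32))
        return self

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # torch-only inference: distance from the learned mean,
        # exportable via TorchScript/ONNX once fitted
        return torch.sqrt(torch.sum((x - self.mean) ** 2, dim=1))
```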
Closing at the moment. We can reopen in case.
Unfortunately I haven't had the time to work on it. I'll update you as soon as I can. Thanks!
Hi! I'm not sure if it's already been asked, but how difficult would it be to implement converters for custom models and custom transformers?
I often find myself writing wrappers around sklearn models and/or transformers from scratch. But then I lose all the benefits of the existing sklearn converters, such as `onnx` or `Hummingbird`. Here is a simple example of a custom model for anomaly/fault detection.
This, of course, cannot be converted with `Hummingbird`. It's necessary to write a custom converter, I guess. Thanks a lot!