onnx / sklearn-onnx

Convert scikit-learn models and pipelines to ONNX
Apache License 2.0

Support for target based encoders #489

Open siddharth98765 opened 4 years ago

siddharth98765 commented 4 years ago

I was trying to implement a target-based encoder, but I am having difficulty implementing a converter for it.

class CategoricalTransformerOnnx(BaseEstimator, util.TransformerWithTargetMixin):
    def __init__(self, cols=None, a=1):
        self.cols = cols
        self.a = a
        self.use_default_cols = cols is None

    def fit(self, X, y, **kwargs):
        if self.use_default_cols:
            self.cols = util.get_obj_cols(X)
        else:
            self.cols = util.convert_cols_to_list(self.cols)

        print(self.cols)
        X_temp = self.transform(X, y)
        return X_temp

    # Transform method we wrote for this transformer
    def transform(self, X, y):
        for column in self.cols:
            global_mean = y.mean()
            temp = y.groupby(X[column].astype(str)).agg(['cumsum', 'cumcount'])
            X[column] = (temp['cumsum'] - y + global_mean) / (temp['cumcount'] + self.a)
        return X

def cat_shape_calculator(operator):
    input = operator.inputs[0]
    N = input.type.shape[0]  # number of observations
    C = 22  # dimension of outputs

    # new output definition
    # operator.inputs[0].type = FloatTensorType([N, C])
    operator.outputs[0].type = FloatTensorType([N, C])

def to_onnx_converter(scope, operator, container):
    output = operator.outputs[0]  # output in ONNX graph
    op = operator.raw_operator

    print(op)

    name = scope.get_unique_operator_name('CategoricalTransformerOnnx')
    attrs = {'name': scope.get_unique_operator_name('CategoricalTransformerOnnx')}
    # attrs = {}
    container.add_node('CategoricalEncoder', operator.input_full_names,
                       operator.output_full_names, op_domain='ai.onnx.ml',
                       **attrs)

update_registered_converter(CategoricalTransformerOnnx, 'CustomTransformer', cat_shape_calculator, to_onnx_converter)

I get the following error:

InvalidGraph: [ONNXRuntimeError] : 10 : INVALID_GRAPH : This is an invalid model. Error in Node:CategoricalTransformerOnnx1 : No Op registered for CategoricalEncoder with domain_version of 1

I'm new to the ONNX platform, so there might be some flaw in my code.
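For readers following the thread, here is a minimal pure-Python sketch of what the transform method in the post above appears to compute, assuming a grouped cumulative sum that includes the current row and a cumulative count that starts at 0 (the pandas behavior). The function name `ordered_target_encode` is hypothetical and not part of the original code.

```python
from collections import defaultdict


def ordered_target_encode(categories, targets, a=1):
    """Encode each row using only the targets of earlier rows in its category.

    Mirrors (cumsum - y + global_mean) / (cumcount + a) from the post:
    cumsum - y is the sum of the earlier targets in the group, and
    cumcount is the number of earlier rows in the group.
    """
    global_mean = sum(targets) / len(targets)
    prior_sum = defaultdict(float)  # sum of targets seen so far, per category
    prior_count = defaultdict(int)  # number of rows seen so far, per category
    encoded = []
    for category, y in zip(categories, targets):
        key = str(category)  # mirrors X[column].astype(str)
        encoded.append((prior_sum[key] + global_mean) / (prior_count[key] + a))
        prior_sum[key] += y
        prior_count[key] += 1
    return encoded


if __name__ == "__main__":
    cats = ["a", "b", "a", "a", "b"]
    ys = [1.0, 0.0, 1.0, 0.0, 1.0]
    print(ordered_target_encode(cats, ys))
```

Each encoded value depends on the row's own position and on y, so this computation needs the target at transform time. That target is not available when an exported model is used for prediction.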

xadupre commented 4 years ago

ONNX only supports a specific set of operators, which you can find at https://github.com/onnx/onnx/blob/master/docs/Operators.md and https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md. The final graph is a combination of these operators. You can look at this example: https://github.com/onnx/sklearn-onnx/blob/master/docs/examples/plot_custom_model.py.

siddharth98765 commented 4 years ago

Do we have to implement the entire custom transformer using ONNX operators in the converter? From the example, it seems so. Further, how do we give multiple inputs to the operator, such as X and y, where some transformation has to be done on the columns of X based on the values of y? Here X is a dataframe.

xadupre commented 4 years ago

If you need a completely new converter, you need to define three elements: a parser, which defines the number of inputs and outputs of your model; the shape calculator, which defines the size of every output; and the converter, which converts the model into ONNX. You can find a complete example here: https://github.com/onnx/sklearn-onnx/blob/master/docs/examples/plot_custom_parser.py.

boccaff commented 3 weeks ago

Hi @xadupre ,

Sorry for bumping an old issue, and please let me know if it is better to open a new issue.

I am evaluating target-encoded categories for modeling, and a working conversion to ONNX would be nice. I have a working prototype on a fork here.

It is based on the implementation for OrdinalEncoder, leveraging LabelEncoder nodes to perform the substitutions. Can I open a PR for that? I would like to work on whatever is necessary to get that prototype "ready".
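The substitution idea can be sketched in plain Python, under the assumption that a fitted target encoder reduces to a per-category lookup table. The ONNX LabelEncoder operator (domain `ai.onnx.ml`) expresses such a table through its `keys_strings`, `values_floats` and `default_float` attributes. The function names below are hypothetical, the smoothing reuses the `a` parameter from the original post, and none of this is taken from the prototype on the fork.

```python
from collections import defaultdict


def fit_target_table(categories, targets, a=1):
    """Learn one smoothed target value per category (the only step that needs y)."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for category, y in zip(categories, targets):
        sums[str(category)] += y
        counts[str(category)] += 1
    # Assumed smoothing: same shape as the formula in the original post.
    table = {k: (sums[k] + global_mean) / (counts[k] + a) for k in sums}
    return table, global_mean


def label_encoder_attributes(table, default):
    """Lay the table out as the attributes a LabelEncoder node would carry."""
    keys = sorted(table)
    return {
        "keys_strings": keys,
        "values_floats": [table[k] for k in keys],
        "default_float": default,  # used for categories unseen during fit
    }


def apply_label_encoder(attrs, values):
    """Emulate the node at inference time: a lookup on X only, with no y."""
    lookup = dict(zip(attrs["keys_strings"], attrs["values_floats"]))
    return [lookup.get(str(v), attrs["default_float"]) for v in values]


if __name__ == "__main__":
    table, mean = fit_target_table(["a", "b", "a", "a", "b"], [1.0, 0.0, 1.0, 0.0, 1.0])
    attrs = label_encoder_attributes(table, mean)
    print(apply_label_encoder(attrs, ["a", "c", "b"]))
```

This also bears on the earlier question about passing X and y: in this sketch y is consumed only while fitting, so the converted graph needs X as its sole input.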