Open sharathts14 opened 1 year ago
I am not sure whether this is a bug, or whether the converter currently requires columns to be referenced by positional integer because referencing them by column name (string) is not supported.
I realized that the problem in the above code is the definition of "initial_types", which has to change once the training data is converted to a dataframe.
With
initial_types=[("subject", StringTensorType([None, 1])), ("body", StringTensorType([None, 1]))],
the code works fine, so the issue as filed here is invalid.
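For anyone hitting the same error, here is a minimal sketch of a pre-flight check that compares the column names used in the ColumnTransformer against the names declared in initial_types. The helper name missing_input_names is hypothetical (it is not part of sklearn-onnx), and the tensor types are replaced by placeholder strings so the snippet runs without any dependencies; it only illustrates the name matching described by the error message, not the converter's actual implementation.

```python
def missing_input_names(column_names, initial_types):
    """Return the column names that have no matching entry in initial_types.

    Hypothetical helper for illustration only, not part of sklearn-onnx.
    initial_types is a list of (name, type) pairs, as passed to the converter.
    """
    declared = {name for name, _ in initial_types}
    return [col for col in column_names if col not in declared]


# Columns referenced by name in the ColumnTransformer.
columns = ["subject", "body"]

# Placeholder strings stand in for the real tensor types
# (for example StringTensorType([None, 1])) to keep the sketch dependency-free.
single_input = [("input", "string tensor")]
per_column = [("subject", "string tensor"), ("body", "string tensor")]

print(missing_input_names(columns, single_input))  # ['subject', 'body']
print(missing_input_names(columns, per_column))    # []
```

A non-empty result means at least one column referenced by name has no declared input, which is the situation the RuntimeError complains about.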
However, my internal example, which I cannot share here, fails with the error below:
RuntimeError: Unable to find column name 'command_normalized' among names ['variable']. Make sure the input names specified with parameter initial_types fits the column names specified in the pipeline to convert. This may happen because a ColumnTransformer follows a transformer without any mapped converter in a pipeline.
Somehow the input variables I defined are being converted to [Variable('variable', 'variable6', type=FloatTensorType(shape=[]))]
in _parse_sklearn_column_transformer of _parse.py.
Further debugging is in progress.
Ah, I see that the 'note' section in https://onnx.ai/sklearn-onnx/api_summary.html mentions exactly this.
@sharathts14 hi there, I recently stumbled upon this limitation as well. How did you resolve it, and what is your workaround?
If a column is referenced by its column name as a string, a RuntimeError is raised as below:
RuntimeError: Unable to find column name 'subject' among names ['input']. Make sure the input names specified with parameter initial_types fits the column names specified in the pipeline to convert. This may happen because a ColumnTransformer follows a transformer without any mapped converter in a pipeline.
The error can be reproduced with the same example as in https://onnx.ai/sklearn-onnx/auto_examples/plot_tfidfvectorizer.html, when the training dataset is converted to a pandas dataframe and the column transformer references its columns by name.
below is the code to reproduce: