onnx / onnxmltools

ONNXMLTools enables conversion of models to ONNX
https://onnx.ai
Apache License 2.0
1.02k stars 183 forks

Add support for SparkUDT #407

Open bissont opened 4 years ago

bissont commented 4 years ago

This is a feature request to add SparkUDT support to the type conversion in getTensorTypeFromSpark: https://github.com/onnx/onnxmltools/blob/37e51abce5ed417e00c502381d6bb9666ba34ed5/onnxmltools/convert/sparkml/utils.py#L32

Currently, the conversion from Spark types to tensor types supports only Spark primitive types. Adding support for SparkUDT would be useful.
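To make the request concrete, here is a minimal sketch of what a type mapping with a UDT branch could look like. It is not the onnxmltools implementation: the function name `tensor_spec_for_spark_type`, the `(kind, shape)` tuples standing in for ONNX tensor types, the type-name strings, and the `vector_size` argument are all assumptions made for illustration. The sketch takes the vector length as an explicit argument on the assumption that the column type by itself does not provide it.

```python
from typing import List, Optional, Tuple

# Hypothetical stand-in for an ONNX tensor type: (element kind, shape).
TensorSpec = Tuple[str, List[int]]

# Spark primitive type names treated as numeric here (illustrative list).
NUMERIC_TYPES = {
    "DecimalType", "DoubleType", "FloatType", "LongType",
    "IntegerType", "ShortType", "ByteType", "BooleanType",
}


def tensor_spec_for_spark_type(
    type_name: str, vector_size: Optional[int] = None
) -> TensorSpec:
    """Map a Spark type name to a (kind, shape) pair.

    Hypothetical helper: the names and return format are assumptions,
    not part of onnxmltools.
    """
    if type_name == "StringType":
        return ("string", [1, 1])
    if type_name in NUMERIC_TYPES:
        return ("float", [1, 1])
    if type_name == "VectorUDT":
        # The caller must say how many elements each vector holds.
        if vector_size is None or vector_size < 1:
            raise ValueError("vector_size is required for VectorUDT columns")
        return ("float", [1, vector_size])
    raise TypeError("Cannot map this Spark type to a tensor type: " + type_name)


if __name__ == "__main__":
    print(tensor_spec_for_spark_type("DoubleType"))
    print(tensor_spec_for_spark_type("VectorUDT", vector_size=4))
```

The point of the sketch is only that a UDT column needs one extra piece of information (its width) before it can be described as a tensor; how that width would be obtained inside the converter is left open.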

This use-case comes from trying to export the dataframe from this notebook to ONNX: https://docs.microsoft.com/en-us/azure/synapse-analytics/spark/apache-spark-machine-learning-mllib-notebook

The following export results in an error in getTensorTypeFromSpark:

from onnxmltools import convert_sparkml
from onnxmltools.convert.sparkml.utils import buildInitialTypesSimple
initial_types = buildInitialTypesSimple(train_data_df.drop("label"))
onnx_model = convert_sparkml(lrModel, "PySpark model", initial_types)

jiafatom commented 4 years ago

There is no active work on the sparkml converter. Contributions are welcome.