jpmml / jpmml-sparkml

Java library and command-line application for converting Apache Spark ML pipelines to PMML
GNU Affero General Public License v3.0

Add VectorToScalar transformer class #52

Open vruusmann opened 6 years ago

vruusmann commented 6 years ago

Use case: A classification model returns a probability distribution. The data scientist wants to extract the probability of a specific class from it, and apply further transformations to that value ("decision engineering").

The probability distribution is returned as a VectorUDT column. It is possible to slice it into a one-element VectorUDT using ml.feature.VectorSlicer. However, most common transformer classes (e.g. ml.feature.Bucketizer) refuse to accept a vector as input.

The VectorToScalar pseudo-transformer class would simply unwrap a single-element vector to a scalar numeric value (i.e. int, float or double). The data type of the output column can be manually overridden.
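
To make the intended behaviour concrete, here is a minimal plain-Java sketch of the unwrapping step only. It is not the proposed transformer and uses no Spark API: a `double[]` stands in for a Spark ML vector, and the names `VectorToScalarSketch`, `OutputType` and `unwrap` are hypothetical, chosen for illustration. How the real class handles casting (for example truncation versus rounding for int output) is not specified in this issue, so the truncation below is an assumption.

```java
import java.util.Objects;

/**
 * Hypothetical sketch of the "unwrap a single-element vector" idea.
 * A double[] stands in for a Spark ML vector; no Spark classes are used.
 */
final class VectorToScalarSketch {

    /** The output types named in the proposal: int, float or double. */
    enum OutputType { INT, FLOAT, DOUBLE }

    private VectorToScalarSketch() {
    }

    /** Returns the only element of the vector, rejecting any other length. */
    static double unwrap(double[] vector) {
        Objects.requireNonNull(vector, "vector");
        if (vector.length != 1) {
            throw new IllegalArgumentException(
                "Expected a single-element vector, got " + vector.length + " elements");
        }
        return vector[0];
    }

    /** Unwraps the vector and converts the value to the requested output type. */
    static Number unwrap(double[] vector, OutputType type) {
        double value = unwrap(vector);
        switch (type) {
            case INT:
                // Assumption: plain Java narrowing, which truncates toward zero.
                return Integer.valueOf((int) value);
            case FLOAT:
                return Float.valueOf((float) value);
            default:
                return Double.valueOf(value);
        }
    }
}
```

With this sketch, `VectorToScalarSketch.unwrap(new double[] {0.75})` yields the scalar probability, which could then be fed to a scalar-only step such as bucketizing, while a two-element array is rejected with an `IllegalArgumentException`.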