Closed: fatihtekin closed this issue 7 years ago.
I have a custom estimator that merges rare categorical values into a single 'RARE' value, so that all the rare labels are grouped together.
Basically, you want to perform a mapping between discrete values: map "popular" values back to themselves, and all "unpopular" values to some default value.
The PMML specification provides the MapValues element exactly for this purpose:
<MapValues name="simplified_color" defaultValue="rare" outputColumn="outputValue">
<FieldColumnPair field="color" column="inputValue"/>
<InlineTable>
<row>
<inputValue>red</inputValue>
<outputValue>red</outputValue>
</row>
<row>
<inputValue>yellow</inputValue>
<outputValue>yellow</outputValue>
</row>
<row>
<inputValue>green</inputValue>
<outputValue>green</outputValue>
</row>
</InlineTable>
</MapValues>
The above transformation would keep color values "red", "yellow" and "green" as-is, and change all other color values to "rare" (note the MapValues@defaultValue attribute).
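To make the MapValues semantics concrete, here is a minimal plain-Java sketch of how a single-column MapValues element is evaluated: a matched row returns its outputValue, an unmatched value falls back to MapValues@defaultValue, and a missing (null) value takes MapValues@mapMissingTo. The class MapValuesSketch is hypothetical and only mimics the PMML semantics; it is not JPMML-Evaluator code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical class that mimics single-column MapValues evaluation;
// an illustration of the PMML semantics, not JPMML-Evaluator code.
final class MapValuesSketch {

    // InlineTable rows: inputValue -> outputValue
    private final Map<String, String> rows = new LinkedHashMap<>();
    // MapValues@defaultValue (null means "not specified")
    private final String defaultValue;
    // MapValues@mapMissingTo (null means "not specified")
    private final String mapMissingTo;

    MapValuesSketch(String defaultValue, String mapMissingTo) {
        this.defaultValue = defaultValue;
        this.mapMissingTo = mapMissingTo;
    }

    MapValuesSketch addRow(String inputValue, String outputValue) {
        rows.put(inputValue, outputValue);
        return this;
    }

    // Returns the mapped value; null stands for a missing result.
    String map(String value) {
        if (value == null) {
            return mapMissingTo;
        }
        return rows.getOrDefault(value, defaultValue);
    }

    public static void main(String[] args) {
        MapValuesSketch simplifiedColor = new MapValuesSketch("rare", null)
            .addRow("red", "red")
            .addRow("yellow", "yellow")
            .addRow("green", "green");
        System.out.println(simplifiedColor.map("green"));  // green
        System.out.println(simplifiedColor.map("purple")); // rare
        System.out.println(simplifiedColor.map(null));     // null (no mapMissingTo)
    }
}
```

Rows only need to list the values that should survive; everything else is handled by the defaultValue fallback.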
I would like to know whether it is possible, and how, to add my custom model converter, as you did for the standard Spark ML features.
Here's a code example of generating a MapValues element-based transformation:
https://github.com/jpmml/jpmml-sparkml/blob/master/src/main/java/org/jpmml/sparkml/feature/VectorIndexerModelConverter.java
The org.dmg.pmml.InlineTable element is rather tricky to generate in Java, because at some point you need to work with low-level W3C DOM APIs.
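For reference, the DOM part boils down to creating one row element per mapping, with one child element per column. The sketch below uses only the JDK's javax.xml.parsers and org.w3c.dom APIs; the class and method names are hypothetical, and XML namespaces are left out for brevity. In a converter, elements like these end up inside JPMML-Model's InlineTable rows (the jpmml-converter DOMUtil.createRow(...) helper used later in this thread appears to cover this step).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper that builds an <InlineTable> for MapValues using plain W3C DOM.
final class InlineTableSketch {

    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    // One <row> per entry; column names match the MapValues example above.
    static Element createInlineTable(Document document, Map<String, String> mapping) {
        Element inlineTable = document.createElement("InlineTable");
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            Element row = document.createElement("row");
            row.appendChild(createCell(document, "inputValue", entry.getKey()));
            row.appendChild(createCell(document, "outputValue", entry.getValue()));
            inlineTable.appendChild(row);
        }
        return inlineTable;
    }

    // A cell is an element named after the column, holding the value as text.
    private static Element createCell(Document document, String column, String value) {
        Element cell = document.createElement(column);
        cell.setTextContent(value);
        return cell;
    }

    public static void main(String[] args) {
        Map<String, String> mapping = new LinkedHashMap<>();
        for (String color : new String[]{"red", "yellow", "green"}) {
            mapping.put(color, color); // popular values map to themselves
        }
        Element inlineTable = createInlineTable(newDocument(), mapping);
        System.out.println(inlineTable.getElementsByTagName("row").getLength()); // 3
    }
}
```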
Even if I manage it, I am not sure whether jpmml-evaluator will be able to run it.
JPMML-Evaluator is able to run all PMML documents that conform to PMML 3.X and 4.X specifications.
This is a good suggestion; I will try that. I know it is not related, but is it normal that some of the categorical columns are missing when I generate the PMML model?
OK, I got it, sorry: the columns not used for mining get removed by DataDictionaryCleaner. I will keep this issue open until I finish my custom converter, as I may have more questions.
Some more code examples: here's the LabelEncoder transformer from Scikit-Learn, which maps category values to category indexes:
https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/java/sklearn/preprocessing/LabelEncoder.java
If your data column contains missing values, and you'd like to map them to the default category (or some special category) as well, then don't forget to specify the MapValues@mapMissingTo attribute.
Anyway, I would recommend that you take the following steps:
1. Manually add the above MapValues element to your PMML document, and replace all the color field invocations with the simplified_color field invocations.
2. Test the modified PMML document using the org.jpmml.evaluator.EvaluationExample command-line application from the JPMML-Evaluator project. This is the fastest way to ensure that your PMML changes are structurally valid and produce the desired results.
3. Move the same logic into a custom org.jpmml.sparkml.FeatureConverter subclass.
Can I assume that, since I set setHandleInvalid("keep") on StringIndexer, this is already handled, as I use StringIndexer after my RareMerger?
I use StringIndexer after my RareMerger
After RareMerger, there will be only "valid and popular" values left: "red", "yellow", "green" and "rare". There is no need to specify the StringIndexer#handleInvalid property.
I have tried the evaluator, but I am getting an exception when I try to generate the PMML model:
Exception in thread "main" java.lang.IllegalArgumentException: Expected 7 features, got 6 features
at org.jpmml.sparkml.ModelConverter.encodeSchema(ModelConverter.java:147)
at org.jpmml.sparkml.ModelConverter.registerModel(ModelConverter.java:161)
at org.jpmml.sparkml.ConverterUtil.toPMML(ConverterUtil.java:76)
at org.apache.spark.ml.pmml.PmmlTries$.main(PmmlTries.scala:55)
at org.apache.spark.ml.pmml.PmmlTries.main(PmmlTries.scala)
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SQLContext
import org.jpmml.sparkml.ConverterUtil
val conf = new SparkConf().setAppName("OneHotEncoderExample").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
// Columns: id, category (integer-coded), label
val df = sqlContext.createDataFrame(Seq(
  (1, 3, 1),
  (2, 3, 0),
  (3, 5, 1),
  (4, 5, 1),
  (5, 6, 0),
  (6, 6, 0),
  (7, 6, 0),
  (8, 999999, 1),
  (9, 8999, 0),
  (10, 89994343, 1)
)).toDF("id", "category", "label")
val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIndex").setHandleInvalid("keep")
val encoder = new OneHotEncoder().setInputCol("categoryIndex").setOutputCol("categoryVec").setDropLast(false)
val assembler = new VectorAssembler().setInputCols(Array("categoryVec")).setOutputCol("features")
val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
val pipeline = new Pipeline().setStages(Array(indexer, encoder, assembler, lr))
val model = pipeline.fit(df)
model.transform(df).show(10, false)
// Throws the IllegalArgumentException quoted above
val pmml = ConverterUtil.toPMML(df.schema, model)
Exception in thread "main" java.lang.IllegalArgumentException: Expected 7 features, got 6 features
Very interesting: the example Apache Spark ML pipeline appears to generate one extra "shadow" feature. This behavior must be triggered by some configuration option, either by StringIndexer#setHandleInvalid(String) or OneHotEncoder#setDropLast(boolean).
Will investigate. Just to clarify, what is your exact Apache Spark version?
Sure: Spark is 2.2.0 and Scala is 2.11. If I comment out setHandleInvalid, then it throws the exception below. By the way, I need setHandleInvalid.
Using the dependencies below:
"org.jpmml" % "jpmml-sparkml-xgboost" % "1.0-SNAPSHOT"
"org.jpmml" % "jpmml-xgboost" % "1.2-SNAPSHOT"
"org.jpmml" % "pmml-evaluator" % "1.3.8"
"org.jpmml" % "jpmml-sparkml" % "1.2.1"
Exception in thread "main" java.lang.NoSuchMethodError: org.jpmml.converter.ModelUtil.createMiningSchema(Lorg/jpmml/converter/Schema;)Lorg/dmg/pmml/MiningSchema;
at org.jpmml.sparkml.model.LinearRegressionModelConverter.encodeModel(LinearRegressionModelConverter.java:40)
at org.jpmml.sparkml.model.LinearRegressionModelConverter.encodeModel(LinearRegressionModelConverter.java:30)
at org.jpmml.sparkml.ModelConverter.registerModel(ModelConverter.java:167)
at org.jpmml.sparkml.ConverterUtil.toPMML(ConverterUtil.java:76)
at org.apache.spark.ml.pmml.PmmlTries$.main(PmmlTries.scala:55)
at org.apache.spark.ml.pmml.PmmlTries.main(PmmlTries.scala)
The culprit is StringIndexer#setHandleInvalid("keep"), which causes a special "catch-all-invalids" feature to be appended to the feature list.
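The feature count mismatch can be reproduced with simple arithmetic. The sketch below is plain Java that mimics (rather than calls) StringIndexerModel with handleInvalid="keep": the n fitted labels get indexes 0..n-1 and every unseen value gets the extra index n, so OneHotEncoder with setDropLast(false) produces n + 1 slots. With the six distinct category values of the example, that gives 7 features. The class name is hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical plain-Java imitation of StringIndexerModel indexing with handleInvalid="keep".
final class KeepInvalidSketch {

    private final Map<String, Integer> labelToIndex = new HashMap<>();
    private final int numLabels;

    KeepInvalidSketch(List<String> labels) {
        for (int i = 0; i < labels.size(); i++) {
            labelToIndex.put(labels.get(i), i);
        }
        this.numLabels = labels.size();
    }

    // Seen labels keep their fitted index; anything else goes to the extra index n.
    int index(String value) {
        return labelToIndex.getOrDefault(value, numLabels);
    }

    // Size of the one-hot vector when OneHotEncoder#setDropLast(false) is used.
    int oneHotSize() {
        return numLabels + 1;
    }

    public static void main(String[] args) {
        // The six distinct "category" values seen during fitting in the example pipeline
        KeepInvalidSketch indexer = new KeepInvalidSketch(
            Arrays.asList("6", "5", "3", "8999", "89994343", "999999"));
        System.out.println(indexer.oneHotSize()); // 7
        System.out.println(indexer.index("399")); // 6
    }
}
```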
The "keep" invalid value handler seems to be an Apache Spark 2.2.X thing. Earlier Apache Spark versions (eg. 2.0.X and 2.1.X) will not let you use it:
java.lang.IllegalArgumentException: strIdx_b78325d25068 parameter handleInvalid given invalid value keep.
As explained in https://github.com/jpmml/jpmml-sparkml/issues/28#issuecomment-321789670, there is no need to specify an invalid value handler after you've explicitly categorized features as "popular" and "rare" using the RareMerger transformer.
Anyway, I intend to make the JPMML-SparkML library smarter about the StringIndexer#handleInvalid property. At minimum, a more relevant and informative exception will be thrown.
Exception in thread "main" java.lang.NoSuchMethodError: org.jpmml.converter.ModelUtil.createMiningSchema(Lorg/jpmml/converter/Schema;)Lorg/dmg/pmml/MiningSchema;
You have a classpath conflict - Apache Spark contains JPMML-Model library version 1.2.15, which is "shadowing" the latest JPMML-Model 1.3.X.
Please configure your application classpath as specified in JPMML-SparkML README file: https://github.com/jpmml/jpmml-sparkml#library
Personally, I would suggest deleting the offending JPMML-Model library JAR files from the Apache Spark installation (as detailed in section "Modifying Apache Spark installation").
Exception in thread "main" java.lang.IllegalArgumentException: Expected 7 features, got 6 features
The name of the "catch-all-invalids" pseudo-category is __unknown. The fix is available in JPMML-SparkML version 1.3.2 (and newer).
In PMML, the corresponding transformation looks like this:
<DerivedField name="handleInvalid(category)" optype="categorical" dataType="string">
<Apply function="if">
<Apply function="isIn">
<FieldRef field="category"/>
<Constant>6</Constant>
<Constant>5</Constant>
<Constant>3</Constant>
<Constant>8999</Constant>
<Constant>89994343</Constant>
<Constant>999999</Constant>
</Apply>
<FieldRef field="category"/>
<Constant>__unknown</Constant>
</Apply>
</DerivedField>
You could use exactly the same pattern for the RareMerger transformer: simply replace __unknown with rare.
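If a converter emits this if/isIn pattern itself, the DerivedField can be assembled with the JDK's W3C DOM API. The following sketch is hypothetical (class and method names are made up, XML namespaces are omitted); it takes the values to keep and the replacement constant as parameters, so the same code yields the __unknown variant above or a rare variant for RareMerger. Inside JPMML-SparkML you would more likely build the equivalent org.dmg.pmml objects than raw DOM.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical generator for the if/isIn DerivedField pattern shown above.
final class IsInDerivedFieldSketch {

    static String build(String field, List<String> keptValues, String replacement) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element derivedField = doc.createElement("DerivedField");
            derivedField.setAttribute("name", "handleInvalid(" + field + ")");
            derivedField.setAttribute("optype", "categorical");
            derivedField.setAttribute("dataType", "string");
            doc.appendChild(derivedField);

            // isIn(field, v1, v2, ...) is the condition of the outer "if"
            Element isIn = apply(doc, "isIn");
            isIn.appendChild(fieldRef(doc, field));
            for (String value : keptValues) {
                isIn.appendChild(constant(doc, value));
            }

            // if(condition, then: the original value, else: the replacement constant)
            Element ifApply = apply(doc, "if");
            ifApply.appendChild(isIn);
            ifApply.appendChild(fieldRef(doc, field));
            ifApply.appendChild(constant(doc, replacement));
            derivedField.appendChild(ifApply);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(writer));
            return writer.toString();
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    private static Element apply(Document doc, String function) {
        Element apply = doc.createElement("Apply");
        apply.setAttribute("function", function);
        return apply;
    }

    private static Element fieldRef(Document doc, String field) {
        Element fieldRef = doc.createElement("FieldRef");
        fieldRef.setAttribute("field", field);
        return fieldRef;
    }

    private static Element constant(Document doc, String value) {
        Element constant = doc.createElement("Constant");
        constant.setTextContent(value);
        return constant;
    }

    public static void main(String[] args) {
        // Same structure as the __unknown example, with "rare" as the replacement
        System.out.println(build("category", Arrays.asList("6", "5", "3"), "rare"));
    }
}
```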
It is awesome how quickly you have added this functionality. Thanks, I really appreciate it. I assume your implementation of StringIndexerModelConverter might throw an IllegalArgumentException if setHandleInvalid("skip") is set. I have attached my RareMerger, as I realized that I forgot to explain the null value handling as well. Perhaps code speaks better than my words; please let me know if I need to explain more. I need to transform all rare labels to 'RARE', while popular labels stay as they are; if training has seen null values, then they get translated into the category level 'NULL', otherwise they go to __unknown.
Implementation: RareMerger.txt
Test cases (in the same style as Spark's own ML feature test suites): RareMergerSuite.txt
Unfortunately, I am getting the exception below.
Exception in thread "main" java.lang.NumberFormatException: For input string: "__unknown"
at java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
at java.lang.Long.parseLong(Long.java:589)
at java.lang.Long.parseLong(Long.java:631)
at org.jpmml.evaluator.TypeUtil.parseInteger(TypeUtil.java:121)
at org.jpmml.evaluator.TypeUtil.parse(TypeUtil.java:85)
at org.jpmml.evaluator.TypeUtil.parseOrCast(TypeUtil.java:69)
at org.jpmml.evaluator.FieldValueUtil.create(FieldValueUtil.java:455)
at org.jpmml.evaluator.FieldValueUtil.refine(FieldValueUtil.java:512)
at org.jpmml.evaluator.FieldValueUtil.refine(FieldValueUtil.java:481)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:64)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:133)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:64)
at org.jpmml.evaluator.regression.RegressionModelEvaluator.evaluateRegressionTable(RegressionModelEvaluator.java:317)
at org.jpmml.evaluator.regression.RegressionModelEvaluator.evaluateRegression(RegressionModelEvaluator.java:128)
at org.jpmml.evaluator.regression.RegressionModelEvaluator.evaluate(RegressionModelEvaluator.java:99)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:384)
at org.apache.spark.ml.pmml.PmmlTries$.main(PmmlTries.scala:81)
at org.apache.spark.ml.pmml.PmmlTries.main(PmmlTries.scala)
Testing Code
import java.util
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.{OneHotEncoder, StringIndexer, VectorAssembler}
import org.apache.spark.ml.regression.LinearRegression
import org.apache.spark.sql.SQLContext
import org.dmg.pmml.{DataType, FieldName}
import org.jpmml.evaluator.{CategoricalValue, ContinuousValue, FieldValue, ModelEvaluatorFactory}
import org.jpmml.sparkml.ConverterUtil
val conf = new SparkConf().setAppName("OneHotEncoderExample").setMaster("local[*]")
val sc = new SparkContext(conf)
val sqlContext = new SQLContext(sc)
val df = sqlContext.createDataFrame(Seq(
  (1, 3, 1),
  (2, 3, 0),
  (3, 5, 1),
  (4, 5, 1),
  (5, 6, 0),
  (6, 6, 0),
  (7, 6, 0),
  (8, 999999, 1),
  (9, 8999, 0),
  (10, 89994343, 1)
)).toDF("id", "category", "label")
val indexer = new StringIndexer().setInputCol("category").setOutputCol("categoryIndex").setHandleInvalid("keep")
val encoder = new OneHotEncoder().setInputCol("categoryIndex").setOutputCol("categoryVec").setDropLast(false)
val assembler = new VectorAssembler().setInputCols(Array("categoryVec", "id")).setOutputCol("features")
val lr = new LinearRegression()
  .setMaxIter(10)
  .setRegParam(0.3)
  .setElasticNetParam(0.8)
val pipeline = new Pipeline().setStages(Array(indexer, encoder, assembler, lr))
val model = pipeline.fit(df)
val pmml = ConverterUtil.toPMML(df.schema, model)
val evaluator = ModelEvaluatorFactory.newInstance.newModelEvaluator(pmml)
println(new String(ConverterUtil.toPMMLByteArray(df.schema, model), "UTF-8"))
// "category" is an integer column in the DataFrame, so it is passed as an INTEGER value
val arguments = new util.LinkedHashMap[FieldName, FieldValue]()
arguments.put(new FieldName("id"), ContinuousValue.create(DataType.INTEGER, 14))
arguments.put(new FieldName("category"), CategoricalValue.create(DataType.INTEGER, 399))
val results = evaluator.evaluate(arguments)
println(results)
I think you need to add "__unknown" to the label list and handle it as another category level.
StringIndexerModel transformer = getTransformer();
Feature feature = encoder.getOnlyFeature(transformer.getInputCol());
List<String> values = new ArrayList<>(Arrays.asList(transformer.labels()));
// Append the "__unknown" pseudo-category only when handleInvalid is "keep"
if("keep".equals(transformer.getHandleInvalid())){
    values.add("__unknown");
}
DataField dataField = encoder.toCategorical(feature.getName(), values);
return Collections.<Feature>singletonList(new CategoricalFeature(encoder, dataField));
I assume your implementation of StringIndexerModelConverter might throw an IllegalArgumentException if setHandleInvalid("skip") is set.
PMML does not have "skip" functionality. If the PMML engine is asked to score a data record, then the scoring either 1) succeeds or 2) fails with some sort of exception. The "skip" option would mean that the PMML engine doesn't succeed or fail - just consumes a data record.
Unfortunately, I am getting below exception. Exception in thread "main" java.lang.NumberFormatException: For input string: "__unknown"
The constant __unknown is a string value. If you're going to use StringIndexer#setHandleInvalid("keep"), then you should make sure that your column is of string data type.
It's an Apache Spark design decision (see the source code of the StringIndexer transformer). A possible workaround would be to replace __unknown with some other constant for numeric values (eg. -999).
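The stack trace above ends in Long.parseLong, which is why a string sentinel cannot live in an integer-typed field. A minimal plain-Java sketch of the numeric-sentinel workaround: the -999 value comes from the suggestion above, while the class and method names are hypothetical. Whatever sentinel you pick must not collide with a real category value.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration of a numeric "catch-all" sentinel for integer-typed categories.
final class NumericSentinelSketch {

    // Sentinel suggested above; must not be a legitimate category value
    static final long UNKNOWN = -999L;

    // Known values pass through unchanged; everything else becomes the sentinel.
    static long handleInvalid(long value, Set<Long> knownValues) {
        return knownValues.contains(value) ? value : UNKNOWN;
    }

    // An integer-typed field must be parseable as a number; "__unknown" is not.
    static boolean parsesAsInteger(String value) {
        try {
            Long.parseLong(value);
            return true;
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        Set<Long> known = new HashSet<>(Arrays.asList(3L, 5L, 6L, 8999L, 999999L, 89994343L));
        System.out.println(parsesAsInteger("__unknown")); // false
        System.out.println(handleInvalid(399L, known));   // -999
        System.out.println(handleInvalid(5L, known));     // 5
    }
}
```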
I will need to transform all rare labels to 'RARE', popular labels stay as they are, if training has seen null values then they get translated into category level 'NULL' if not they go to __unknown.
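In the above DerivedField element, simply specify the mapMissingTo attribute on its top-level Apply element (DerivedField itself does not define such an attribute):
<DerivedField name="handleInvalid(category)" optype="categorical" dataType="string">
<Apply function="if" mapMissingTo="NULL">
<!-- the same isIn condition, FieldRef and Constant child elements as above -->
</Apply>
</DerivedField>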
Actually, I think that the RareMerger transformation can be represented using the standard StringIndexer transformer; there is no need to extend JPMML-SparkML in any way.
The idea is to manually truncate StringIndexerModel#getLabels() to the desired length (say, keep the first 30 elements of the array, which represent "popular" categories), and specify StringIndexerModel#setHandleInvalid("keep") (which then comes to represent all other "unpopular" categories).
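The truncation idea in plain Java: since StringIndexer orders its labels by descending frequency, the first n labels are the n most popular categories, and with "keep" every other value lands on index n. The class and method names are hypothetical; in Spark itself you would presumably build a new StringIndexerModel from the truncated label array instead.

```java
import java.util.Arrays;

// Hypothetical plain-Java illustration of truncating StringIndexerModel labels.
final class TruncatedLabelsSketch {

    // Labels are assumed to be sorted by descending frequency, as StringIndexer does,
    // so the first n entries are the n most popular categories.
    static String[] keepMostPopular(String[] labels, int n) {
        return Arrays.copyOf(labels, Math.min(n, labels.length));
    }

    // handleInvalid="keep" semantics: values outside the list share index labels.length.
    static int index(String[] labels, String value) {
        int i = Arrays.asList(labels).indexOf(value);
        return i >= 0 ? i : labels.length;
    }

    public static void main(String[] args) {
        String[] fitted = {"6", "5", "3", "8999", "89994343", "999999"};
        String[] popular = keepMostPopular(fitted, 3);
        System.out.println(Arrays.toString(popular)); // [6, 5, 3]
        System.out.println(index(popular, "5"));      // 1
        System.out.println(index(popular, "8999"));   // 3 (now "unpopular")
    }
}
```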
I have another issue: jpmml-evaluator uses com.google.guava:guava:20.0, while Spark uses 11.0.2. Whichever version I use, I get either
Exception in thread "main" java.lang.IllegalAccessError: tried to access method com.google.common.base.Stopwatch.<init>()V from class org.apache.hadoop.mapreduce.lib.input.FileInputFormat
or
Exception in thread "main" java.lang.NoClassDefFoundError: com/google/common/cache/CacheBuilderSpec
at org.jpmml.evaluator.CacheUtil.<clinit>(CacheUtil.java:112)
at org.jpmml.evaluator.ModelEvaluator.<clinit>(ModelEvaluator.java:671)
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:103)
at org.jpmml.evaluator.ModelEvaluatorFactory.newModelEvaluator(ModelEvaluatorFactory.java:66)
at org.apache.spark.ml.pmml.PmmlTries$.main(PmmlTries.scala:60)
at org.apache.spark.ml.pmml.PmmlTries.main(PmmlTries.scala)
Caused by: java.lang.ClassNotFoundException: com.google.common.cache.CacheBuilderSpec
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:335)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 6 more
I got it sorted out by adding the following to the build.sbt file, in case someone else also needs it:
assemblyShadeRules in assembly := Seq(
ShadeRule.rename("com.google.guava**" -> "shadeio.@1").inAll
)
When I use the RareMergerModelConverter below, I get an exception in StringIndexerModelConverter. Could you tell me what I am doing wrong?
Exception in thread "main" java.lang.IllegalArgumentException: categoryMerge
at org.jpmml.converter.PMMLEncoder.toCategorical(PMMLEncoder.java:145)
at org.jpmml.sparkml.feature.StringIndexerModelConverter.encodeFeatures(StringIndexerModelConverter.java:54)
at org.jpmml.sparkml.FeatureConverter.registerFeatures(FeatureConverter.java:47)
at org.jpmml.sparkml.ConverterUtil.toPMML(ConverterUtil.java:75)
Code
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.dmg.pmml.*;
import org.jpmml.converter.ContinuousFeature;
import org.jpmml.converter.DOMUtil;
import org.jpmml.converter.Feature;
import org.jpmml.sparkml.FeatureConverter;
import org.jpmml.sparkml.SparkMLEncoder;
import org.apache.spark.ml.pmml.RareMergerModel;
import javax.xml.parsers.DocumentBuilder;
public class RareMergerModelConverter extends FeatureConverter<RareMergerModel> {
public RareMergerModelConverter(RareMergerModel transformer){
super(transformer);
}
@Override
public List<Feature> encodeFeatures(SparkMLEncoder encoder){
RareMergerModel transformer = getTransformer();
Feature feature = encoder.getOnlyFeature(transformer.getInputCol());
List<String> columns = Arrays.asList("inputValue", "outputValue");
InlineTable inlineTable = new InlineTable();
DocumentBuilder documentBuilder = DOMUtil.createDocumentBuilder();
for(String popularLabel : transformer.popularLabels()){
Row row = DOMUtil.createRow(documentBuilder, columns, Arrays.asList(popularLabel, popularLabel));
inlineTable.addRows(row);
}
for(String rareLabel : transformer.rareLabels()){
Row row = DOMUtil.createRow(documentBuilder, columns, Arrays.asList(rareLabel, transformer.rareLabel()));
inlineTable.addRows(row);
}
MapValues mapValues = new MapValues()
.addFieldColumnPairs(new FieldColumnPair(feature.getName(), columns.get(0)))
.setOutputColumn(columns.get(1))
.setInlineTable(inlineTable)
.setDefaultValue("__unknown");
mapValues.setMapMissingTo("__unknown");
if(transformer.isNullLabelAdded()){
mapValues.setMapMissingTo("NULL");
}
DerivedField derivedField = encoder.createDerivedField(formatName(transformer), OpType.CONTINUOUS, DataType.STRING, mapValues);
return Collections.<Feature>singletonList(new ContinuousFeature(encoder, derivedField));
}
}
When I use below RareMergerModelConverter, ...
Your RareMergerModelConverter class is close to perfect. Looks like you've successfully managed to figure out a great deal of JPMML-SparkML API design and architecture principles on your own.
One issue that I'm seeing with your code is that the operational type of the derived field should probably be OpType.CATEGORICAL (not OpType.CONTINUOUS), because string values do not have comparison operations (eg. "<", "<=", ">=", ">") defined for them. Following this thought, the generated feature should be an instance of CategoricalFeature (not ContinuousFeature).
... I am getting exception at StringIndexerModelConverter.
The trouble is that JPMML-SparkML assumes that StringIndexerModel will always be the first transformer in the pipeline (for a particular column). This assumption is hard-coded in the form that the corresponding column name must resolve to a DataField element. In your case, StringIndexerModel is the second transformer in the pipeline (following the RareMergerModel transformer), and the corresponding column name resolves to a DerivedField element.
Class StringIndexerModelConverter should contain a special instanceof check to handle this situation. Essentially, if feature instanceof CategoricalFeature evaluates to true, then there is no need to invoke the #toCategorical(...) logic anymore.
So, what's the solution now?
Your RareMerger use case is almost completely handled by StringIndexer#setHandleInvalid("keep"); the only addition is that you want to be replacing missing values with the __unknown constant.
I would still advise you to stop pursuing this RareMerger path, because it adds unnecessary complexity to your project/application. In PMML, you're supposed to use the MiningField@missingValueReplacement attribute for missing value replacement functionality:
<RegressionModel>
<MiningSchema>
<MiningField name="category" missingValueReplacement="__unknown"/>
</MiningSchema>
</RegressionModel>
You can add/modify/remove PMML elements and attributes using JPMML-Model library (eg. using the very functional and powerful Visitors API). There is no need to change anything in the Apache Spark ML side.
Here's the pattern:
PipelineModel pipelineModel = pipeline.fit(df);
org.dmg.pmml.PMML pmml = ConverterUtil.toPMML(df.schema(), pipelineModel);
pmml = performApplicationSpecificCustomizations(pmml); // THIS!
JAXBUtil.marshalPMML(pmml, new javax.xml.transform.stream.StreamResult(System.out));
In this fictional performApplicationSpecificCustomizations utility method, you can modify the live org.dmg.pmml.PMML class model object in any way you want.
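As one possible shape for such a customization, the sketch below sets MiningField@missingValueReplacement on an already-serialized PMML document. Note that it swaps in the JDK's W3C DOM API instead of the JPMML-Model Visitors API recommended above, only so that it runs without extra dependencies; with JPMML-Model you would apply the same change to the live org.dmg.pmml.PMML object. The class and method names are hypothetical, and namespaces are ignored (the parser is not namespace-aware).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical post-processing step on PMML XML (W3C DOM, not JPMML-Model Visitors).
final class MissingValueReplacementSketch {

    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Sets MiningField@missingValueReplacement on every MiningField named fieldName;
    // returns how many elements were changed.
    static int setMissingValueReplacement(Document pmml, String fieldName, String replacement) {
        NodeList miningFields = pmml.getElementsByTagName("MiningField");
        int changed = 0;
        for (int i = 0; i < miningFields.getLength(); i++) {
            Element miningField = (Element) miningFields.item(i);
            if (fieldName.equals(miningField.getAttribute("name"))) {
                miningField.setAttribute("missingValueReplacement", replacement);
                changed++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        Document pmml = parse("<PMML><RegressionModel><MiningSchema>"
            + "<MiningField name=\"category\"/><MiningField name=\"label\" usageType=\"target\"/>"
            + "</MiningSchema></RegressionModel></PMML>");
        System.out.println(setMissingValueReplacement(pmml, "category", "__unknown")); // 1
    }
}
```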
You are absolutely right from the maintenance perspective. Unfortunately, I have to group the rare labels and the unknown labels separately, so just using StringIndexer with a truncated label list, as you suggested, won't be enough. Also, that approach would require writing yet another transformer, so I prefer to write another converter for now. The other thing RareMerger does, to decide whether a label is popular, is to check the ratio of the label among all non-null values (using a threshold variable).
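A minimal plain-Java sketch of the threshold rule described above: a label counts as popular when its share of all non-null values reaches the threshold, and everything else is rare. RareMerger's actual code was only attached as a file, so the method name, the >= comparison and the treatment of nulls here are assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of threshold-based popular/rare label selection.
final class PopularLabelsSketch {

    // Returns the labels whose frequency among non-null values is >= threshold.
    static List<String> popularLabels(List<String> values, double threshold) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        int nonNull = 0;
        for (String value : values) {
            if (value == null) {
                continue; // nulls are excluded from the ratio (assumed to get their own level)
            }
            counts.merge(value, 1, Integer::sum);
            nonNull++;
        }
        List<String> popular = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            if ((double) entry.getValue() / nonNull >= threshold) {
                popular.add(entry.getKey());
            }
        }
        return popular;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("red", "red", "red", "green", "green", "blue", null);
        System.out.println(popularLabels(values, 0.3)); // [red, green]
    }
}
```

The rare labels are simply the remaining keys of the count map; they are the ones that would get an InlineTable row mapping them to the rare label.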
Could you support the case in StringIndexerModelConverter where another transformer/estimator comes before it? Then I wouldn't need to maintain my own StringIndexerModelConverter, and it might also help others in the future who need to run other estimators before StringIndexer.
Currently, this is what I am doing in StringIndexerModelConverter:
DataField dataField;
if(feature instanceof CategoricalFeature){
dataField = new DataField(new FieldName(transformer.getInputCol()),
OpType.CATEGORICAL, DataType.STRING);
}else{
dataField = encoder.toCategorical(feature.getName(), labels);
}
This is the latest code of RareMergerModelConverter:
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import org.apache.spark.ml.pmml.RareMergerModel;
import org.dmg.pmml.*;
import org.jpmml.converter.CategoricalFeature;
import org.jpmml.converter.DOMUtil;
import org.jpmml.converter.Feature;
import org.jpmml.sparkml.FeatureConverter;
import org.jpmml.sparkml.SparkMLEncoder;
import javax.xml.parsers.DocumentBuilder;
public class RareMergerModelConverter extends FeatureConverter<RareMergerModel> {
    public RareMergerModelConverter(RareMergerModel transformer){
        super(transformer);
    }
    @Override
    public List<Feature> encodeFeatures(SparkMLEncoder encoder){
        RareMergerModel transformer = getTransformer();
        Feature feature = encoder.getOnlyFeature(transformer.getInputCol());
        // The categories of the derived field are the MapValues *output* values:
        // popular labels map to themselves, all rare labels collapse into the rare label
        List<String> dataFieldLabels = new ArrayList<>(Arrays.asList(transformer.popularLabels()));
        if(transformer.rareLabels().length > 0){
            dataFieldLabels.add(transformer.getRareLabel());
        }
        List<String> columns = Arrays.asList("inputValue", "outputValue");
        InlineTable inlineTable = new InlineTable();
        DocumentBuilder documentBuilder = DOMUtil.createDocumentBuilder();
        for(String popularLabel : transformer.popularLabels()){
            Row row = DOMUtil.createRow(documentBuilder, columns, Arrays.asList(popularLabel, popularLabel));
            inlineTable.addRows(row);
        }
        for(String rareLabel : transformer.rareLabels()){
            Row row = DOMUtil.createRow(documentBuilder, columns, Arrays.asList(rareLabel, transformer.getRareLabel()));
            inlineTable.addRows(row);
        }
        // Unseen values go to __unknown; missing values too, unless a null label was learned
        MapValues mapValues = new MapValues()
            .addFieldColumnPairs(new FieldColumnPair(feature.getName(), columns.get(0)))
            .setOutputColumn(columns.get(1))
            .setInlineTable(inlineTable)
            .setDefaultValue("__unknown")
            .setMapMissingTo("__unknown");
        dataFieldLabels.add("__unknown");
        if(transformer.getNullToString() && transformer.isNullLabelAdded()){
            mapValues.setMapMissingTo(transformer.getNullLabel());
            dataFieldLabels.add(transformer.getNullLabel());
        }
        DerivedField derivedField = encoder.createDerivedField(formatName(transformer), OpType.CATEGORICAL, DataType.STRING, mapValues);
        return Collections.<Feature>singletonList(new CategoricalFeature(encoder, derivedField, dataFieldLabels));
    }
}
Hi
I have a custom estimator that merges rare categorical values into a single 'RARE' value, so that all the rare labels are grouped together. I would like to know whether it is possible, and how, to add my custom model converter, as you did for the standard Spark ML features.
To give an example, my custom estimator handles rare values in categorical columns. So, if there are 1000 categories and only 30 of them are used most of the time, the remaining 970 categories will be marked as RARE. In my model, I only save the rare labels. If needed, I can paste the code itself as well.
Even if I manage it, I am not sure whether jpmml-evaluator will be able to run it.