Closed: rollingdeep closed this issue 5 years ago
> Second, I tried to transform libsvm string column to 2 columns: features, label.
You should split this single libsvm column into two columns, features and label, using regular Apache Spark APIs.
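As a rough illustration of that split, here is a minimal sketch in plain Python, assuming each libsvm string has the usual `label index:value index:value ...` layout with 1-based indices. The function names `parse_libsvm` and `to_dense` are hypothetical helpers, not part of Spark or JPMML-SparkML.

```python
from typing import List, Tuple


def parse_libsvm(line: str) -> Tuple[float, List[int], List[float]]:
    """Split one libsvm-formatted string into (label, indices, values).

    Assumes the layout "label index:value index:value ..." with
    1-based feature indices (hypothetical helper, not a Spark API).
    """
    tokens = line.strip().split()
    if not tokens:
        raise ValueError("empty libsvm line")

    # The first token is the label (dependent variable).
    label = float(tokens[0])

    indices: List[int] = []
    values: List[float] = []
    for token in tokens[1:]:
        index_str, value_str = token.split(":", 1)
        # Shift the 1-based libsvm index to a 0-based vector position.
        indices.append(int(index_str) - 1)
        values.append(float(value_str))
    return label, indices, values


def to_dense(indices: List[int], values: List[float], size: int) -> List[float]:
    """Expand the sparse (indices, values) pair into a dense list of length size."""
    dense = [0.0] * size
    for position, value in zip(indices, values):
        dense[position] = value
    return dense
```

For example, `parse_libsvm("1 1:0.5 3:2.0")` returns `(1.0, [0, 2], [0.5, 2.0])`. In PySpark, a function like this could probably be wrapped in a UDF, with the indices and values passed to `pyspark.ml.linalg.Vectors.sparse(size, indices, values)` to build the features column, but that wiring depends on your schema and is not shown here.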
A transformer should only act on the features column (independent variables). This is reflected in the JPMML-SparkML library, where the method org.jpmml.sparkml.FeatureConverter#encodeFeatures(SparkMLEncoder) only deals with the features part (the label part is handled by o.j.s.ModelConverter).
> But still I don't know how to write a converter to my transformer.
If you have a standalone features column, then you probably don't need your custom transformer anymore? Or if you think you do, then perhaps you could use standard tools like org.apache.spark.ml.feature.VectorSlicer or org.apache.spark.ml.feature.VectorAssembler instead.
Code-wise, the JPMML-SparkML implementation of your custom transformer should take inspiration from the o.j.s.feature.VectorSlicerConverter and/or o.j.s.feature.VectorAssemblerConverter classes.
Yeah, I took inspiration from the classes you pointed to. I have finished the converter and built a jar, plus Python transformers for use in PySpark. I don't know yet whether it is right or wrong, so next I will run some tests. Thank you! Your work is awesome!
I have also pushed a project for you.
This may be an unsolved problem in jpmml-sparkml. I know of a method that explodes the vector into pieces (f1, f2, ..., fn) to do it, but I think that is wasteful. I'm trying to write a transformer and a converter to fix it. First, I want to read libsvm format data from Hive. I have 3 columns: id, libsvm, dt. Second, I tried to transform the libsvm string column into 2 columns: features, label. Third, I trained a pipeline model with the transformer and a logistic regression model, and it succeeded! But exporting the PMML file failed. I have read your source code, and I know that I need to write a converter and add a config entry in sparkml2pmml.properties. But I still don't know how to write a converter for my transformer. If you have no time to fix this, can you give me some instructions for implementing it?
I hope to hear from you. Please contact me by email at rollingdeep@yeah.net if you would rather keep this private.