johnorillo closed 6 years ago
Hi, closing this. I was able to solve it by including the scaler right after the column transform:
pipeline = PMMLPipeline([
("mapper",column_preprocessor),
('scaler',RobustScaler()),
("classifier",clf)
])
was able to solve this by including the scaler right after the column transform
A scaler as a second step in the top-level pipeline would apply to all columns coming out of the first DataFrameMapper step.
You can scale the column in place. Simply append RobustScaler to the list of transformers for that column:
column_preprocessor = DataFrameMapper([
(["date1","date2"], [DateGap_Custom(function='DateDiff'), RobustScaler()])
])
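The difference between the two placements can be sketched with plain scikit-learn. This is only an illustration: it uses `ColumnTransformer` as a stand-in for `DataFrameMapper` (an assumption, not what the thread uses), and the toy array `X` is made up.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler

# Toy data (hypothetical): two numeric columns on different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]])

# Scaler as a second top-level step: every column leaving the first step is scaled.
scale_all = Pipeline([
    ("mapper", ColumnTransformer([("keep", "passthrough", [0, 1])])),
    ("scaler", RobustScaler()),
])
all_scaled = scale_all.fit_transform(X)

# Scaler attached to one column only: the other column passes through unchanged.
scale_one = ColumnTransformer([
    ("scaled", RobustScaler(), [0]),
    ("raw", "passthrough", [1]),
])
one_scaled = scale_one.fit_transform(X)
```

In the first pipeline both output columns are scaled; in the second, only column 0 is, which mirrors appending the scaler to a single column's transformer list.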
Hi, thanks for replying,
I tried appending RobustScaler just as you suggested. At first I got an error saying "Expected 2D array, got 1D array instead". The same thing happens if I append RobustScaler() to one of the built-in transformers, e.g.:
iris_pipeline = PMMLPipeline([
("mapper", DataFrameMapper([
(["SepalLengthCm", "PetalLengthCm"], [Aggregator(function = "mean"), RobustScaler()])
])),
("classifier", KNeighborsClassifier(n_neighbors=15))
])
I was able to fix the error for my custom function, though, by making sure that the returned value is reshaped to 2D (e.g. result.reshape(-1,1)). So far everything is working fine. Thanks!
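A minimal sketch of the reshape fix, for anyone hitting the same error. `MonthGap` is a hypothetical stand-in for the custom DateGap transformer (its real code is not shown in this thread), the sample dates are made up, and a plain scikit-learn pipeline is used in place of the DataFrameMapper transformer list.

```python
from datetime import datetime

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import RobustScaler


class MonthGap(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: month difference between two date columns."""

    def __init__(self, fmt1="%Y-%m-%d", fmt2="%Y/%m/%d"):
        self.fmt1 = fmt1
        self.fmt2 = fmt2

    def fit(self, X, y=None):
        # Stateless: nothing to learn from the data.
        return self

    def transform(self, X):
        gaps = []
        for first, second in np.asarray(X):
            d1 = datetime.strptime(first, self.fmt1)
            d2 = datetime.strptime(second, self.fmt2)
            gaps.append((d2.year - d1.year) * 12 + (d2.month - d1.month))
        result = np.asarray(gaps, dtype=float)
        # Return shape (n_samples, 1); a 1D array here is what makes
        # the following scaler raise "Expected 2D array".
        return result.reshape(-1, 1)


# Made-up sample rows: first column uses %Y-%m-%d, second uses %Y/%m/%d.
dates = np.array([
    ["2018-01-15", "2018/03/01"],
    ["2018-01-15", "2018/07/20"],
    ["2017-12-31", "2019/01/01"],
])

gaps = MonthGap().transform(dates)
gap_then_scale = make_pipeline(MonthGap(), RobustScaler())
scaled = gap_then_scale.fit_transform(dates)
```

Because `transform` returns a single column of shape `(n_samples, 1)`, the scaler accepts it directly.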
Hi,
I created a custom transformer which takes two input columns and outputs one column. The custom function simply takes the difference, in months, between two dates of different formats (%Y-%m-%d and %Y/%m/%d). My sample pipeline is shown below:
So far, I was able to get it working, and the correct values are reflected in the training instances in the exported PMML model. On the Python side, during my experiments, I got a higher accuracy when I applied scaling after the DateGap() transformer.
In this regard, is there a way to pass the output of DateGap_Custom() to RobustScaler() inside the DataFrameMapper, so that robust scaling will be included in the transformation dictionary? For example, as hypothetical PMML XML:
And in doing so, the KNN weights will instead include the scaled output of my custom transformer DateGap().
Thank you!