Closed liupei101 closed 5 years ago
Does sklearn2pmml support choosing a model when a condition is satisfied?
Is your workflow valid Python/Scikit-Learn syntax in the first place?
PMML can represent it using the model segmentation approach: http://dmg.org/pmml/v4-3/MultipleModels.html
In brief, there would be a top-level MiningModel element, which contains TreeModel and MiningModel (the latter for XGBoost) child elements. Each segment is associated with a predicate which determines whether it should be selected or not.
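To make the segmentation layout concrete, here is a minimal sketch that builds only the skeleton of such a document with Python's standard library. The field name "Widths", the threshold 20, and the choice of the "selectFirst" method are illustrative assumptions; a real PMML file would also need a MiningSchema, a DataDictionary, and fully populated child models.

```python
import xml.etree.ElementTree as ET

# Top-level MiningModel that holds the segmentation.
root = ET.Element("MiningModel", functionName="classification")
segmentation = ET.SubElement(root, "Segmentation", multipleModelMethod="selectFirst")

# Segment 1: chosen when Widths >= 20; would hold the decision tree.
seg1 = ET.SubElement(segmentation, "Segment", id="1")
ET.SubElement(seg1, "SimplePredicate", field="Widths", operator="greaterOrEqual", value="20")
ET.SubElement(seg1, "TreeModel", functionName="classification")

# Segment 2: chosen when Widths < 20; would hold the nested MiningModel for XGBoost.
seg2 = ET.SubElement(segmentation, "Segment", id="2")
ET.SubElement(seg2, "SimplePredicate", field="Widths", operator="lessThan", value="20")
ET.SubElement(seg2, "MiningModel", functionName="classification")

# Serialize the skeleton (element structure only, not a valid PMML document).
skeleton = ET.tostring(root, encoding="unicode")
print(skeleton)
```

With "selectFirst", a scoring engine evaluates the segment predicates in order and uses the first segment whose predicate holds.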
In JPMML-SkLearn/SkLearn2PMML this can be implemented by introducing a custom estimator class.
Pseudo-code showing how this custom estimator class would be used:
pipeline = PMMLPipeline([
    ("classifier", ModelSelector([
        ("X['Widths'] >= 20", DecisionTreeClassifier()),
        ("X['Widths'] < 20", XGBClassifier()),
    ]))
])
I wonder how you would fit such a workflow? Is the goal to split the training dataset between two child models already during the training?
Thank you very much! I am sorry for not explaining my problem clearly.
In fact, I want to build a web application that predicts risk for patients. The application should serve two independent populations (such as people with or without an X-ray inspection) using two corresponding predictive models.
So I should follow the logic below (pseudo-code):
if the patient with X-ray inspection:
    # trainset: (train_X_with_xray, train_y_with_xray)
    # base estimator: XGBoost Classifier
    # fitted by training data involving variables related to the result of X-ray inspection.
    Model1 = model(...)
    # predict
    risk = Model1.predict()
else if the patient without X-ray inspection:
    # trainset: (train_X_without_xray, train_y_without_xray)
    # base estimator: XGBoost Classifier
    # fitted by training data not involving variables related to the result of X-ray inspection.
    Model2 = model(...)
    # predict
    risk = Model2.predict()
Now my problem is that I need a single PMML file that returns the result once the patient's information is passed to it, rather than two PMML files (one for patients with an X-ray inspection, the other for patients without) combined with if-else logic in JavaScript on the web front end.
@vruusmann Thanks for your pseudo-code showing how this custom estimator class would be used. I will look further into ModelSelector; alternatively, could you give some suggestions about the problem I am facing, at your convenience?
Thank you very much!
This custom class should actually be named ModelChoice, because the suffix "Selector" already has a special meaning in Scikit-Learn (feature selectors).
So, class ModelChoice should implement both fit() and predict() functionality:
In fit(), every member model is trained using the subset of the training dataset for which its predicate evaluated to True.
In predict(), the prediction is made using the first model for which the predicate evaluated to True.
This solution wouldn't be too difficult to implement, because there is a reusable predicate translator component already available: https://github.com/jpmml/jpmml-sklearn/blob/master/src/main/javacc/predicate.jj
@liupei101 My schedule is pretty tight during the next week. If you want to speed things up, then you could prototype the Python side of the ModelChoice class yourself.
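As a starting point for such a prototype, the following is a minimal sketch of the fit/predict routing logic only. Everything in it is an assumption made for illustration: ModelChoice here is a hypothetical plain-Python class (not an existing sklearn2pmml API), predicates are Python callables rather than expression strings, rows are plain dicts rather than a DataFrame, and MajorityClassifier is a stand-in for a real estimator such as DecisionTreeClassifier or XGBClassifier. PMML conversion is not covered.

```python
from collections import Counter


class MajorityClassifier:
    """Stand-in for a real estimator: always predicts the most common training label."""

    def fit(self, rows, labels):
        self.label_ = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, rows):
        return [self.label_ for _ in rows]


class ModelChoice:
    """Hypothetical sketch: route each row to the first member model whose predicate holds."""

    def __init__(self, steps):
        # steps: list of (predicate, model) pairs; predicate is a callable row -> bool.
        self.steps = steps

    def fit(self, rows, labels):
        for predicate, model in self.steps:
            # Each member model is trained only on the rows its predicate accepts.
            subset = [(row, label) for row, label in zip(rows, labels) if predicate(row)]
            if not subset:
                raise ValueError("a predicate matched no training rows")
            sub_rows, sub_labels = zip(*subset)
            model.fit(list(sub_rows), list(sub_labels))
        return self

    def predict(self, rows):
        predictions = []
        for row in rows:
            for predicate, model in self.steps:
                if predicate(row):
                    # The first matching predicate wins.
                    predictions.append(model.predict([row])[0])
                    break
            else:
                raise ValueError("no predicate matched the row")
        return predictions


# Toy data: the "Widths" field and the threshold 20 mirror the pseudo-code earlier in the thread.
rows = [{"Widths": 25}, {"Widths": 30}, {"Widths": 10}, {"Widths": 5}]
labels = ["a", "a", "b", "b"]
choice = ModelChoice([
    (lambda row: row["Widths"] >= 20, MajorityClassifier()),
    (lambda row: row["Widths"] < 20, MajorityClassifier()),
])
choice.fit(rows, labels)
result = choice.predict([{"Widths": 22}, {"Widths": 3}])
print(result)
```

For the X-ray use case described above, the two predicates would instead test whether the X-ray-related fields are present, so that each population is trained on and scored by its own model.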
Reopening, because this is an interesting functionality that should be implemented.
Hey! I have the exact same issue. I tried to handle it through preprocessing and Ruleset but couldn't make it work. Any update on this?
Thanks a lot.
Hello, I would be very interested in this feature as well! Thanks and regards.
Hi, contributors! I have a workflow involving sklearn2pmml, which is listed below:
I looked up the basic usage of sklearn2pmml: it can convert a trained model to PMML. But I don't know how to implement my workflow!
Does sklearn2pmml support choosing a model when a condition is satisfied?
Thanks!