Closed. liuhuanshuo closed this issue 2 years ago.
This is a follow-up to a recent issue of mine (https://github.com/jpmml/sklearn2pmml/issues/357).
As you suggested in that post, I used pipeline_test._final_estimator.n_outputs_ = 1
instead of pipeline_test.target_fields = ["my_single_target"].
I then saved the PMML file again and used JPMML-Evaluator-Python to load the model for prediction.
Now, instead of the previous error, it reports another one:
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
128 try:
--> 129 result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
130 except Exception as e:
~/.local/lib/python3.7/site-packages/jpmml_evaluator/py4j.py in staticInvoke(self, className, methodName, *args)
24 javaMember = javaClass.__getattr__(methodName)
---> 25 return javaMember(*args)
26
~/.local/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
1322 return_value = get_return_value(
-> 1323 answer, self.gateway_client, self.target_id, self.name)
1324
~/.local/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
Py4JJavaError: An error occurred while calling z:org.jpmml.evaluator.python.PythonUtil.evaluateAll.
: java.lang.IllegalArgumentException: 2.02-20-92
at org.jpmml.model.temporals.DateTimeUtil.parseDate(DateTimeUtil.java:21)
at org.jpmml.evaluator.TypeUtil.parse(TypeUtil.java:90)
at org.jpmml.evaluator.TypeUtil.parseOrCast(TypeUtil.java:66)
at org.jpmml.evaluator.ScalarValue.<init>(ScalarValue.java:33)
at org.jpmml.evaluator.DiscreteValue.<init>(DiscreteValue.java:30)
at org.jpmml.evaluator.OrdinalValue.<init>(OrdinalValue.java:38)
at org.jpmml.evaluator.OrdinalValue.create(OrdinalValue.java:122)
at org.jpmml.evaluator.FieldValue.create(FieldValue.java:364)
at org.jpmml.evaluator.FieldValue.cast(FieldValue.java:109)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:72)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:345)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:142)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:101)
at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:73)
at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:63)
at org.jpmml.evaluator.PredicateUtil.evaluatePredicateContainer(PredicateUtil.java:53)
at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateTree(SimpleTreeModelEvaluator.java:122)
at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateAny(SimpleTreeModelEvaluator.java:90)
at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateRegression(SimpleTreeModelEvaluator.java:77)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:443)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:595)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateRegression(MiningModelEvaluator.java:231)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:443)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:224)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:595)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:303)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:446)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:224)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:300)
at org.jpmml.evaluator.python.PythonUtil.evaluate(PythonUtil.java:92)
at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:58)
at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.time.format.DateTimeParseException: Text '2.02-20-92' could not be parsed at index 0
at java.time.format.DateTimeFormatter.parseResolved0(DateTimeFormatter.java:1949)
at java.time.format.DateTimeFormatter.parse(DateTimeFormatter.java:1851)
at java.time.LocalDate.parse(LocalDate.java:400)
at java.time.LocalDate.parse(LocalDate.java:385)
at org.jpmml.model.temporals.Date.parse(Date.java:86)
at org.jpmml.model.temporals.DateTimeUtil.parseDate(DateTimeUtil.java:19)
... 70 more
During handling of the above exception, another exception occurred:
JavaError Traceback (most recent call last)
<ipython-input-201-5a1bd5bd787f> in <module>
----> 1 evaluator.evaluateAll(x_oot_1)
~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
129 result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
130 except Exception as e:
--> 131 raise self.backend.toJavaError(e)
132 result_records = self.backend.loads(result_records)
133 return DataFrame.from_records(result_records)
JavaError: java.lang.IllegalArgumentException: 2.02-20-92
I think I roughly understand what this error means; presumably there is a problem with the string conversion. Could something be wrong with the following code? The value 2.02-20-92 looks like it was derived from a date such as 2022092x.
def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 and X[0][0:8] < '20221230' else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_day_id_pipeline():
    return make_pipeline(ExpressionTransformer("X[1][:4] + '-' + X[1][4:6] + '-' + X[1][6:8]"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_feature_union():
    return FeatureUnion([
        ("modify_date", make_modify_date_pipeline()),
        ("day_id", make_day_id_pipeline())])
But I need to emphasize that the custom functions above work well inside the pipeline: my pipeline is completely correct and it predicts the right results.
It seems we are back to the earlier problem of "My pipeline works fine, I just converted the pipeline to a PMML file and now it doesn't work!"
So I don't know whether this is a problem in sklearn2pmml or in JPMML-Evaluator-Python. Could you please help me look into it?
Your PMML declares that all 60 input fields are of double data type. The problem is that there is no implicit (ie. automatic) conversion possible from the double value space to the date (or datetime) value space.
You have to re-declare the relevant input fields so that implicit value conversion would be possible. Alternatively, you may implement custom conversion using some DerivedField element-based business logic.
TLDR: You cannot represent (prospective) date (or datetime) values as Python's float or numpy.float64 values. You should convert them to int or numpy.int64 values!
JavaError: java.lang.IllegalArgumentException: 2.02-20-92
Your input values are something like 20221031. If you store this value as double, it becomes 2.0221031E7.
Do you now see where this 2.02 prefix came from?
This is a follow-up to a recent issue of mine (https://github.com/jpmml/sklearn2pmml/issues/357).
No, this issue is totally unrelated to that.
Your pipeline works in Python, because Python performs very liberal type casts. Your pipeline would not work in any strict and statically typed programming language (such as PMML), because the necessary type casts could possibly add or remove precision pretty much randomly.
In other words, this is legal in Python, but not in other languages:
# A float magically becomes a date, WTAF?
day_id = asdate(2.0221031E7)
The SkLearn2PMML package provides so-called domain decorator classes (inside the sklearn2pmml.decoration module) for pre-declaring input field type information.
The following might help:
The following might help:
mapper = DataFrameMapper([
    # THIS: First specify 'modify_date', then specify 'day_id'
    (['modify_date','day_id'], [MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain())]),
    (['modify_date','day_id'], [make_feature_union(), ExpressionTransformer("X[1] - X[0]")])
])
In other words, this is legal in Python, but not in other languages:
day_id = asdate(2.0221031E7)
In other words, Python is like Microsoft Excel, which auto-converts everything into a date/datetime.
The following might help:
You should actually combine these two lines into one:
mapper = DataFrameMapper([
    (['modify_date','day_id'], [MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain()), make_feature_union(), ExpressionTransformer("X[1] - X[0]")]),
])
Your pipeline works in Python, because Python performs very liberal type casts. Your pipeline would not work in any strict and statically typed programming language (such as PMML), because the necessary type casts could possibly add or remove precision pretty much randomly.
Thank you very much for your answer. I think I now understand the cause (although I am not quite clear on how to solve it).
As an algorithm engineer, I don't pay much attention to these underlying data-type issues; I learned a lot from your reply.
I hear a lot about Python's dynamic typing, and about how not specifying a type can be a disaster, and I think that might be the case here.
The following might help:
I will deal with this as you suggested. It seems that every column holding values like '20200909' needs an explicit type declaration?
Anyway, I'm going to try it for myself first!
(['modify_date','day_id'], [MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain())])
I modified the code as follows. Unfortunately, now even the pipeline doesn't work anymore:
(['modify_date','day_id'],[MultiDomain(ContinuousDomain(dtype = np.int64)), DateDomain(), make_feature_union(), ExpressionTransformer("X[1] - X[0] if X[1]>X[0] else -1")]),
def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 and X[0][0:8] < '20221230' else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_day_id_pipeline():
    return make_pipeline(ExpressionTransformer("X[1][:4] + '-' + X[1][4:6] + '-' + X[1][6:8]"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_feature_union():
    return FeatureUnion([
        ("modify_date", make_modify_date_pipeline()),
        ("day_id", make_day_id_pipeline())])
Here is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-238-2b9604b4aa7d> in <module>
----> 1 pipeline_test.predict_proba(x_oot_1)
~/.local/lib/python3.7/site-packages/sklearn2pmml/pipeline/__init__.py in predict_proba(self, X, **predict_proba_params)
82
83 def predict_proba(self, X, **predict_proba_params):
---> 84 Xt = self._transform(X)
85 return self.steps[-1][-1].predict_proba(Xt, **predict_proba_params)
86
~/.local/lib/python3.7/site-packages/sklearn2pmml/pipeline/__init__.py in _transform(self, X)
74 if hasattr(self, "_iter"):
75 for _, name, transform in self._iter(with_final = False):
---> 76 Xt = transform.transform(Xt)
77 else:
78 for name, transform in self.steps[:-1]:
~/.local/lib/python3.7/site-packages/sklearn_pandas/dataframe_mapper.py in transform(self, X)
217 Xt = self._get_col_subset(X, columns)
218 if transformers is not None:
--> 219 Xt = transformers.transform(Xt)
220 extracted.append(_handle_feature(Xt))
221
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in _transform(self, X)
553 Xt = X
554 for _, _, transform in self._iter():
--> 555 Xt = transform.transform(Xt)
556 return Xt
557
~/.local/lib/python3.7/site-packages/sklearn2pmml/decoration/__init__.py in transform(self, X)
299 def transform(self, X):
300 rows, columns = X.shape
--> 301 if len(self.domains) != columns:
302 raise ValueError("The number of columns {0} is not equal to the number of domain objects {1}".format(columns, len(self.domains)))
303 if isinstance(X, DataFrame):
TypeError: object of type 'ContinuousDomain' has no len()
The following might help:
You should actually combine these two lines into one:
mapper = DataFrameMapper([ (['modify_date','day_id'], [MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain()), make_feature_union(), ExpressionTransformer("X[1] - X[0]")]), ])
I tried to insert the following code in various places, but nothing worked:
MultiDomain(ContinuousDomain(dtype = np.int64)), DateDomain(),
The following error is always displayed:
TypeError: object of type 'ContinuousDomain' has no len()
I think I may have found the problem.
In my opinion, modify_date and day_id should not be converted to np.int, but to string format, because these two columns get sliced inside the function.
I don't know which transformer would convert these two columns to string format, though.
But I tried to declare these two columns in the PMML file in the same format as the other columns:
<DataField name="modify_date" optype="categorical" dataType="string"/>
<DataField name="day_id" optype="categorical" dataType="string"/>
Now, importing the PMML file for prediction raises yet another error!
---------------------------------------------------------------------------
Py4JJavaError Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
128 try:
--> 129 result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
130 except Exception as e:
~/.local/lib/python3.7/site-packages/jpmml_evaluator/py4j.py in staticInvoke(self, className, methodName, *args)
24 javaMember = javaClass.__getattr__(methodName)
---> 25 return javaMember(*args)
26
~/.local/lib/python3.7/site-packages/py4j/java_gateway.py in __call__(self, *args)
1322 return_value = get_return_value(
-> 1323 answer, self.gateway_client, self.target_id, self.name)
1324
~/.local/lib/python3.7/site-packages/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
327 "An error occurred while calling {0}{1}{2}.\n".
--> 328 format(target_id, ".", name), value)
329 else:
Py4JJavaError: An error occurred while calling z:org.jpmml.evaluator.python.PythonUtil.evaluateAll.
: org.jpmml.evaluator.EvaluationException: Categorical value cannot be used in comparison operations
at org.jpmml.evaluator.CategoricalValue.compareToValue(CategoricalValue.java:47)
at org.jpmml.evaluator.functions.ComparisonFunction.evaluate(ComparisonFunction.java:37)
at org.jpmml.evaluator.functions.BinaryFunction.evaluate(BinaryFunction.java:43)
at org.jpmml.evaluator.ExpressionUtil.evaluateFunction(ExpressionUtil.java:463)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:426)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:345)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ExpressionUtil.evaluateFieldRef(ExpressionUtil.java:226)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:143)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:405)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateApply(ExpressionUtil.java:345)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpression(ExpressionUtil.java:167)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:129)
at org.jpmml.evaluator.ExpressionUtil.evaluateExpressionContainer(ExpressionUtil.java:61)
at org.jpmml.evaluator.ExpressionUtil.evaluateTypedExpressionContainer(ExpressionUtil.java:66)
at org.jpmml.evaluator.ExpressionUtil.evaluate(ExpressionUtil.java:86)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:100)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.ModelEvaluationContext.resolve(ModelEvaluationContext.java:142)
at org.jpmml.evaluator.EvaluationContext.evaluate(EvaluationContext.java:94)
at org.jpmml.evaluator.PredicateUtil.evaluateSimplePredicate(PredicateUtil.java:101)
at org.jpmml.evaluator.PredicateUtil.evaluatePredicate(PredicateUtil.java:73)
at org.jpmml.evaluator.PredicateUtil.evaluate(PredicateUtil.java:63)
at org.jpmml.evaluator.PredicateUtil.evaluatePredicateContainer(PredicateUtil.java:53)
at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateTree(SimpleTreeModelEvaluator.java:122)
at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateAny(SimpleTreeModelEvaluator.java:90)
at org.jpmml.evaluator.tree.SimpleTreeModelEvaluator.evaluateRegression(SimpleTreeModelEvaluator.java:77)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:443)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:595)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateRegression(MiningModelEvaluator.java:231)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:443)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:224)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateSegmentation(MiningModelEvaluator.java:595)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateClassification(MiningModelEvaluator.java:303)
at org.jpmml.evaluator.ModelEvaluator.evaluateInternal(ModelEvaluator.java:446)
at org.jpmml.evaluator.mining.MiningModelEvaluator.evaluateInternal(MiningModelEvaluator.java:224)
at org.jpmml.evaluator.ModelEvaluator.evaluate(ModelEvaluator.java:300)
at org.jpmml.evaluator.python.PythonUtil.evaluate(PythonUtil.java:92)
at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:58)
at org.jpmml.evaluator.python.PythonUtil.evaluateAll(PythonUtil.java:48)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:374)
at py4j.Gateway.invoke(Gateway.java:282)
at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
at py4j.commands.CallCommand.execute(CallCommand.java:79)
at py4j.GatewayConnection.run(GatewayConnection.java:238)
at java.lang.Thread.run(Thread.java:748)
During handling of the above exception, another exception occurred:
JavaError Traceback (most recent call last)
<ipython-input-292-5a1bd5bd787f> in <module>
----> 1 evaluator.evaluateAll(x_oot_1)
~/.local/lib/python3.7/site-packages/jpmml_evaluator/__init__.py in evaluateAll(self, arguments_df, nan_as_missing)
129 result_records = self.backend.staticInvoke("org.jpmml.evaluator.python.PythonUtil", "evaluateAll", self.javaEvaluator, argument_records)
130 except Exception as e:
--> 131 raise self.backend.toJavaError(e)
132 result_records = self.backend.loads(result_records)
133 return DataFrame.from_records(result_records)
JavaError: org.jpmml.evaluator.EvaluationException: Categorical value cannot be used in comparison operations
Could you tell me how to do it?
Let me repeat my requirements once more.
The modify_date and day_id inputs are both strings like '20220909' and '20220101'.
I just need to calculate their time difference (modify_date - day_id).
Of course, there are some additional restrictions on modify_date, such as that it cannot be empty and cannot be greater than 20221231, which is why the following if clause exists:
if len(X[0]) > 0 and X[0][0:8] < '20221230' else '2022-12-30'
I really need your help!
The following error is always displayed:
TypeError: object of type 'ContinuousDomain' has no len()
The MultiDomain constructor expects a Python list of child decorators:
https://github.com/jpmml/sklearn2pmml/blob/0.87.0/sklearn2pmml/decoration/__init__.py#L288-L289
So, the correct syntax would be like this (one child decorator per column - one for modify_date and another for day_id):
decorator = MultiDomain([ContinuousDomain(), DateDomain()])
org.jpmml.evaluator.EvaluationException: Categorical value cannot be used in comparison operations
We've discussed this situation before - comparing one string with another using comparison operators like <, <=, >= and > does not make sense:
my_date = "20221031"
if my_date < "20221101":
    print("Date is earlier than 1st of November, 2022")
I remember commenting that I would expect to see a type check error being thrown... I can't find my comment, but this is exactly the kind of exception that I was hoping to see.
The modify_date and day_id inputs are both like '20220909' and '20220101'
They are both strings that match the pattern "YYYYMMDD". You need to re-format them to the ISO 8601 date format pattern, which is YYYY-MM-DD.
We can use ExpressionTransformer for this:
string_reformatter = ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8]")
However, it is possible that modify_date is either an empty string, or a date string that is greater than some "upper limit" date.
When working with strings, you can only implement the first part of the above clause (ie. string is empty/not empty). You cannot do the second part, because the comparison operator <= does not work with strings.
modify_date_reformatter = ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 else '2022-12-30'")
day_id_reformatter = ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8]")
After reformatting, you can cast them to the date data type using CastTransformer(dtype = "datetime64[D]"), and then transform them to a numeric "number of days since some reference date" value using DaysSinceYearTransformer(year = 2022).
The final exercise is about sanitizing modify_date values that are "in the future". This is very simple, because your threshold date is 2022-12-30, which is known to be 365 days since 2022-01-01. In other words, any pre-processed modify_date value that is greater than 365 should be capped down to 365.
Doing the final arithmetic:
days_difference = ExpressionTransformer("(X[1] - X[0]) if X[0] <= 365 else (X[1] - 365)")
Can probably be rearranged into:
days_difference = ExpressionTransformer("X[1] - numpy.min(X[0], 365)")
I remember commenting that I would expect to see a type check error being thrown... I can't find my comment, but this is exactly the kind of exception that I was hoping to see.
Thank you very much. I think I understand exactly what you mean.
I used the code you provided and it works very well on part of the dataset, thank you very much.
However, it still reports an error when a date lies too far in the future.
Let me get straight to the point and demonstrate with the following data.
def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_day_id_pipeline():
    return make_pipeline(ExpressionTransformer("X[1][:4] + '-' + X[1][4:6] + '-' + X[1][6:8]"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_feature_union():
    return FeatureUnion([
        ("modify_date", make_modify_date_pipeline()),
        ("day_id", make_day_id_pipeline())])
mapper_encode = [(['modify_date','day_id'], [make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")], {'alias': 'modify_days'})]
mapper = DataFrameMapper(mapper_encode, input_df = True, df_out = True)
data_test = pd.DataFrame({
    'modify_date': ['20220626223702','20220629204300','20220602000000'],
    'day_id': ['20220714','20220715','20220914']
})
Now, running the mapper on data_test works fine:
mapper.fit_transform(data_test)
   modify_days
0           18
1           16
2          104
However, if one of the day_id values is changed to a year-2999 date, there will be an error:
data_test_new = pd.DataFrame({
    'modify_date': ['20220626223702','20220629204300','20220602000000'],
    'day_id': ['20220714','29991231','20221231']
})
mapper.fit_transform(data_test_new)
Here is the error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
/data1/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
1978 try:
-> 1979 values, tz_parsed = conversion.datetime_to_datetime64(data)
1980 # If tzaware, these values represent unix timestamps, so we
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
During handling of the above exception, another exception occurred:
OutOfBoundsDatetime Traceback (most recent call last)
<ipython-input-386-02535a430e61> in <module>
----> 1 mapper.fit_transform(data_test_new)
~/.local/lib/python3.7/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
569 if y is None:
570 # fit method of arity 1 (unsupervised transformation)
--> 571 return self.fit(X, **fit_params).transform(X)
572 else:
573 # fit method of arity 2 (supervised transformation)
~/.local/lib/python3.7/site-packages/sklearn_pandas/dataframe_mapper.py in fit(self, X, y)
167 if transformers is not None:
168 _call_fit(transformers.fit,
--> 169 self._get_col_subset(X, columns), y)
170
171 # handle features not explicitly selected
~/.local/lib/python3.7/site-packages/sklearn_pandas/pipeline.py in _call_fit(fit_method, X, y, **kwargs)
22 """
23 try:
---> 24 return fit_method(X, y, **kwargs)
25 except TypeError:
26 # fit takes only one argument
~/.local/lib/python3.7/site-packages/sklearn_pandas/pipeline.py in fit(self, X, y, **fit_params)
74
75 def fit(self, X, y=None, **fit_params):
---> 76 Xt, fit_params = self._pre_transform(X, y, **fit_params)
77 _call_fit(self.steps[-1][-1].fit, Xt, y, **fit_params)
78 return self
~/.local/lib/python3.7/site-packages/sklearn_pandas/pipeline.py in _pre_transform(self, X, y, **fit_params)
67 if hasattr(transform, "fit_transform"):
68 Xt = _call_fit(transform.fit_transform,
---> 69 Xt, y, **fit_params_steps[name])
70 else:
71 Xt = _call_fit(transform.fit,
~/.local/lib/python3.7/site-packages/sklearn_pandas/pipeline.py in _call_fit(fit_method, X, y, **kwargs)
22 """
23 try:
---> 24 return fit_method(X, y, **kwargs)
25 except TypeError:
26 # fit takes only one argument
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
932 sum of n_components (output dimension) over transformers.
933 """
--> 934 results = self._parallel_func(X, y, fit_params, _fit_transform_one)
935 if not results:
936 # All transformers are None
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in _parallel_func(self, X, y, fit_params, func)
962 message=self._log_message(name, idx, len(transformers)),
963 **fit_params) for idx, (name, transformer,
--> 964 weight) in enumerate(transformers, 1))
965
966 def transform(self, X):
/data1/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
922 self._iterating = self._original_iterator is not None
923
--> 924 while self.dispatch_one_batch(iterator):
925 pass
926
/data1/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in dispatch_one_batch(self, iterator)
757 return False
758 else:
--> 759 self._dispatch(tasks)
760 return True
761
/data1/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in _dispatch(self, batch)
714 with self._lock:
715 job_idx = len(self._jobs)
--> 716 job = self._backend.apply_async(batch, callback=cb)
717 # A job can complete so quickly than its callback is
718 # called before we get here, causing self._jobs to
/data1/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in apply_async(self, func, callback)
180 def apply_async(self, func, callback=None):
181 """Schedule a func to be run"""
--> 182 result = ImmediateResult(func)
183 if callback:
184 callback(result)
/data1/anaconda3/lib/python3.7/site-packages/joblib/_parallel_backends.py in __init__(self, batch)
547 # Don't delay the application, to avoid keeping the input
548 # arguments in memory
--> 549 self.results = batch()
550
551 def get(self):
/data1/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
/data1/anaconda3/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
223 with parallel_backend(self._backend, n_jobs=self._n_jobs):
224 return [func(*args, **kwargs)
--> 225 for func, args, kwargs in self.items]
226
227 def __len__(self):
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
724 with _print_elapsed_time(message_clsname, message):
725 if hasattr(transformer, 'fit_transform'):
--> 726 res = transformer.fit_transform(X, y, **fit_params)
727 else:
728 res = transformer.fit(X, y, **fit_params).transform(X)
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in fit_transform(self, X, y, **fit_params)
381 """
382 last_step = self._final_estimator
--> 383 Xt, fit_params = self._fit(X, y, **fit_params)
384 with _print_elapsed_time('Pipeline',
385 self._log_message(len(self.steps) - 1)):
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in _fit(self, X, y, **fit_params)
311 message_clsname='Pipeline',
312 message=self._log_message(step_idx),
--> 313 **fit_params_steps[name])
314 # Replace the transformer of the step with the fitted
315 # transformer. This is necessary when loading the transformer
/data1/anaconda3/lib/python3.7/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
353
354 def __call__(self, *args, **kwargs):
--> 355 return self.func(*args, **kwargs)
356
357 def call_and_shelve(self, *args, **kwargs):
~/.local/lib/python3.7/site-packages/sklearn/pipeline.py in _fit_transform_one(transformer, X, y, weight, message_clsname, message, **fit_params)
724 with _print_elapsed_time(message_clsname, message):
725 if hasattr(transformer, 'fit_transform'):
--> 726 res = transformer.fit_transform(X, y, **fit_params)
727 else:
728 res = transformer.fit(X, y, **fit_params).transform(X)
~/.local/lib/python3.7/site-packages/sklearn/base.py in fit_transform(self, X, y, **fit_params)
569 if y is None:
570 # fit method of arity 1 (unsupervised transformation)
--> 571 return self.fit(X, **fit_params).transform(X)
572 else:
573 # fit method of arity 2 (supervised transformation)
~/.local/lib/python3.7/site-packages/sklearn2pmml/preprocessing/__init__.py in transform(self, X)
95
96 def transform(self, X):
---> 97 return cast(X, self.dtype)
98
99 class CutTransformer(BaseEstimator, TransformerMixin):
~/.local/lib/python3.7/site-packages/sklearn2pmml/util/__init__.py in cast(X, dtype)
8 if isinstance(dtype, str) and dtype.startswith("datetime64"):
9 func = lambda x: to_pydatetime(x, dtype)
---> 10 return dt_transform(X, func)
11 else:
12 if not hasattr(X, "astype"):
~/.local/lib/python3.7/site-packages/sklearn2pmml/util/__init__.py in dt_transform(X, func)
58 if len(shape) > 1:
59 X = X.ravel()
---> 60 Xt = func(X)
61 if isinstance(Xt, Index):
62 Xt = Xt.values
~/.local/lib/python3.7/site-packages/sklearn2pmml/util/__init__.py in <lambda>(x)
7 def cast(X, dtype):
8 if isinstance(dtype, str) and dtype.startswith("datetime64"):
----> 9 func = lambda x: to_pydatetime(x, dtype)
10 return dt_transform(X, func)
11 else:
~/.local/lib/python3.7/site-packages/sklearn2pmml/util/__init__.py in to_pydatetime(X, dtype)
66
67 def to_pydatetime(X, dtype):
---> 68 Xt = pandas.to_datetime(X, yearfirst = True, origin = "unix")
69 if hasattr(Xt, "dt"):
70 Xt = Xt.dt
/data1/anaconda3/lib/python3.7/site-packages/pandas/util/_decorators.py in wrapper(*args, **kwargs)
206 else:
207 kwargs[new_arg_name] = new_arg_value
--> 208 return func(*args, **kwargs)
209
210 return wrapper
/data1/anaconda3/lib/python3.7/site-packages/pandas/core/tools/datetimes.py in to_datetime(arg, errors, dayfirst, yearfirst, utc, box, format, exact, unit, infer_datetime_format, origin, cache)
792 result = _convert_and_box_cache(arg, cache_array, box)
793 else:
--> 794 result = convert_listlike(arg, box, format)
795 else:
796 result = convert_listlike(np.array([arg]), box, format)[0]
/data1/anaconda3/lib/python3.7/site-packages/pandas/core/tools/datetimes.py in _convert_listlike_datetimes(arg, box, format, name, tz, unit, errors, infer_datetime_format, dayfirst, yearfirst, exact)
461 errors=errors,
462 require_iso8601=require_iso8601,
--> 463 allow_object=True,
464 )
465
/data1/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
1982 return values.view("i8"), tz_parsed
1983 except (ValueError, TypeError):
-> 1984 raise e
1985
1986 if tz_parsed is not None:
/data1/anaconda3/lib/python3.7/site-packages/pandas/core/arrays/datetimes.py in objects_to_datetime64ns(data, dayfirst, yearfirst, utc, errors, require_iso8601, allow_object)
1973 dayfirst=dayfirst,
1974 yearfirst=yearfirst,
-> 1975 require_iso8601=require_iso8601,
1976 )
1977 except ValueError as e:
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslib.pyx in pandas._libs.tslib.array_to_datetime()
pandas/_libs/tslibs/np_datetime.pyx in pandas._libs.tslibs.np_datetime.check_dts_bounds()
OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2999-12-31 00:00:00
I know for certain that the error is caused by this 2999 value, but I don't know how to deal with it.
I can understand the error, and when I searched for it I found many solutions, but they are all based on pandas functions; based on my previous experience, I don't know whether those approaches are supported here or not.
Since no related posts describe this problem when using sklearn2pmml, I need your help.
I wonder if CastTransformer caused the problem and if CastTransformer has a parameter that can change a value like 2099 to a specified value.
pandas/_libs/tslibs/conversion.pyx in pandas._libs.tslibs.conversion.datetime_to_datetime64()
TypeError: Unrecognized value type: <class 'str'>
This error happens on the Python side, inside the Pandas library. It refuses to accept string values as datetime_to_datetime64(..) arguments.
29991231
Does the Pandas parse succeed when you omit this obviously incorrect value element?
Perhaps Pandas also contains some data sanitization code that accepts 20220714 (looks like a reasonable date) but rejects 29991231 (doesn't look like a reasonable date).
Perhaps Pandas would try harder if it were given an ISO 8601-like date string such as 2999-12-31.
I wonder if CastTransformer caused the problem and if CastTransformer has a parameter that can change a value like 2099 to a specified value.
Sanitize both your modify_date and day_id values into ISO 8601 date strings (YYYY-MM-DD) before feeding them to CastTransformer(dtype = "datetime64[D]").
If the Pandas library refuses to parse 2999-12-31, you should look into the Pandas source code and possibly open a new issue with the Pandas project.
Write a unit test for all possible combinations that you have tried. Right now you seem to be struggling with code pieces that were working OK before.
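A minimal pytest-style sketch of such a test (it assumes the mapper definition from the previous message; the expected value 18 matches the output shown there, and the empty modify_date is expected to fall back to -1 because the substituted default date 2022-12-30 lies after the day_id):

import pandas

def test_modify_days():
    df = pandas.DataFrame({
        'modify_date': ['20220626223702', ''],
        'day_id': ['20220714', '20220715']
    })
    result = mapper.fit_transform(df)
    # Row 0: 2022-07-14 minus 2022-06-26 is 18 days; row 1: the fallback branch yields -1
    assert list(result['modify_days']) == [18, -1]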
Does the Pandas parse succeed when you omit this obviously incorrect value element?
It looks like pandas can do the conversion, because the following code executes correctly:
pd.to_datetime(pd.DataFrame(['20991231'])[0], errors = 'coerce')
--------------
0   2099-12-31
Name: 0, dtype: datetime64[ns]
I think this goes back to the fact that int can't be used.
The code below works fine, because I used numpy.array(X[0]).astype('int') < 20221230
to convert all dates like 20991231 to 20221230.
data_test_new = pd.DataFrame({
    'modify_date': ['20220626','29991231','20220602'],
    'day_id': ['20220714','20220715','20220914']
})

def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if (len(X[0]) > 0 and numpy.array(X[0]).astype('int') < 20221230) else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_day_id_pipeline():
    return make_pipeline(ExpressionTransformer("X[1][:4] + '-' + X[1][4:6] + '-' + X[1][6:8]"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))

def make_feature_union():
    return FeatureUnion([
        ("modify_date", make_modify_date_pipeline()),
        ("day_id", make_day_id_pipeline())])

mapper_encode = [(['modify_date','day_id'], [make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")], {'alias': 'modify_days'})]
mapper = DataFrameMapper(mapper_encode, input_df = True, df_out = True)
mapper.fit_transform(data_test_new)
Unfortunately, an error occurred while converting to PMML, prompting:
Exception in thread "main" java.lang.IllegalArgumentException: Function 'numpy.array' is not supported
I am about to collapse; I thought this was a very simple task, yet I really have not been able to complete it!
Actually, my idea is simple.
All I need to do is add a condition somewhere in the code below (which should be the original location) to change 29991231 to 20221230. But no matter what I tried, I couldn't succeed. And even when it works in Python, it cannot be converted to PMML.
I am in the process of converting my company's algorithm models to PMML, and I have almost crashed on this small detail!
def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))
I'm using a very crude method now,
which is to add ExpressionTransformer("X[0] if X[0] != '2999-12-31' else '2022-12-30'") as an extra step:
def make_modify_date_pipeline():
    return make_pipeline(ExpressionTransformer("X[0][:4] + '-' + X[0][4:6] + '-' + X[0][6:8] if len(X[0]) > 0 else '2022-12-30'"), ExpressionTransformer("X[0] if X[0] != '2999-12-31' else '2022-12-30'"), CastTransformer(dtype = "datetime64[D]"), DaysSinceYearTransformer(year = 2022))
Now it's finally working:
fit_transform works without problems!
Converting to PMML works without problems!
However, when invoked, the following error is still displayed:
JavaError: java.lang.IllegalArgumentException: 2.02-20-40
I have clearly followed your instructions to solve the problem, so why is it still like this!
I'm falling apart!
The following might help:
You should actually combine these two lines into one:
mapper = DataFrameMapper([ (['modify_date','day_id'], [MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain()), make_feature_union(), ExpressionTransformer("X[1] - X[0]")]), ])
This causes an error (I have upgraded to the latest version):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Input In [10], in <cell line: 14>()
      8 def make_feature_union():
      9     return FeatureUnion([
     10         ("modify_date", make_modify_date_pipeline()),
     11         ("day_id", make_day_id_pipeline())])
---> 14 mapper_encode = [(['modify_date','day_id'],[MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain()),make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")],{'alias':'modify_days'})]
     16 mapper = DataFrameMapper(mapper_encode, input_df=True,df_out=True)

TypeError: __init__() takes 2 positional arguments but 3 were given
I've tried every transformer in sklearn2pmml.decoration.
Anyway, I finally found a method that lets me convert the PMML file successfully, and it also works when called from Java.
Just add the following code at the beginning:
StringNormalizer(function = None)
So that's it:
(['modify_date','day_id'], [StringNormalizer(function = None), make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")], {'alias': 'modify_days'}),
I don't know why it works. I've spent so much time on it that I don't have the energy to figure out why.
But with the addition of this one piece of code, my system works.
Anyway, I want to thank you! Thank you for developing such a great package!
It looks like pandas can do the conversion, because the following code executes correctly
PMML operates similarly to pandas.to_datetime(.., errors = "raise"). Therefore, it doesn't matter if Pandas is able to do some clever heuristics in errors = "coerce" mode, because that mode is inaccessible.
Here's my unit test:
# Fails with pandas._libs.tslibs.np_datetime.OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 2999-12-31 00:00:00 present at position 0
pandas.to_datetime("29991231", errors = "raise")
# Succeeds, kind of. The result is NaT
pandas.to_datetime("29991231", errors = "coerce")
Unfortunately, an error occurred while converting to PMML, prompting:
Exception in thread "main" java.lang.IllegalArgumentException: Function 'numpy.array' is not supported
It's impossible to use inline cast functions such as builtins.int(..) and builtins.str(..) inside ExpressionTransformer expressions due to https://github.com/jpmml/jpmml-python/issues/20.
That's a clever "hack", trying to replace int(..) with numpy.array(..).astype(int), but it runs into exactly the same technical limitation - this function cannot be expressed without creating a standalone DerivedField element (which isn't currently supported).
The inline cast is blocked because of this: http://mantis.dmg.org/view.php?id=169
This causes an error (I have upgraded to the latest version):
mapper_encode = [(['modify_date','day_id'],[MultiDomain(ContinuousDomain(dtype = numpy.int64), DateDomain()),make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")],{'alias':'modify_days'})]
Did you see https://github.com/jpmml/jpmml-evaluator-python/issues/16#issuecomment-1297504622?
I told you that the MultiDomain constructor takes a single argument, which is a list of child decorators.
You're passing two child decorators without wrapping them into a list. Of course it won't work.
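Spelled out against the quoted line, the corrected version would differ only by the list brackets around the two child decorators (a sketch):

mapper_encode = [(['modify_date','day_id'], [MultiDomain([ContinuousDomain(dtype = numpy.int64), DateDomain()]), make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")], {'alias': 'modify_days'})]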
Just add the following code at the beginning:
StringNormalizer(function = None)
You could use CastTransformer(dtype = str) with exactly the same effect (convert any value to string, aka "format as string").
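Applied to the workaround row quoted above, that substitution would look like this (a sketch, assuming the same helper functions and imports as in the earlier messages):

mapper_encode = [(['modify_date','day_id'], [CastTransformer(dtype = str), make_feature_union(), ExpressionTransformer("(X[1] - X[0]) if (X[0] <= 365 and X[1]>X[0]) else -1")], {'alias': 'modify_days'})]
mapper = DataFrameMapper(mapper_encode, input_df = True, df_out = True)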
However, when invoked, the following error is still displayed:
JavaError: java.lang.IllegalArgumentException: 2.02-20-40
Did you see https://github.com/jpmml/jpmml-evaluator-python/issues/16#issuecomment-1296589624?
If you format float(20220714) as a string, you get 2.0220714E7 (a floating-point value, in scientific notation). And the [0:4] substring of this value is 2.02. Everything works just as expected.
Now, if you format int(20220714) as a string, you get 20220714 (an integer value). The [0:4] substring of it is 2022.
Marking as "resolved".
The troubled user still doesn't appear to grasp the functional difference between integer and floating-point value spaces (one of them is suitable for emulating dates/datetimes, the other is not), but it's beyond my capacity to provide the necessary education here.
I'm sure life will teach him well!
Hello Villu,
I'm sorry, but I still need your help to troubleshoot a problem with a PMML prediction.
Last week, I successfully converted my Python model to PMML.
When I used pypmml to run the prediction, I found that the predicted values were inaccurate. So, following your instructions, I installed JPMML-Evaluator-Python.
However, when I used JPMML-Evaluator-Python, it didn't work properly and simply reported an error.
Here is my code, written according to the README:
Here is the error:
I tried to analyze the problem by myself, and it seems that a data format is wrong.
However, none of the columns in my input require a date format, and the model itself does not need one either. I used the pipeline before to predict with the same data and it was OK (I don't know if you still remember; I described the detailed requirements in https://github.com/jpmml/sklearn2pmml/issues/356).
I also checked my PMML file and it looks correct as well; none of the 60 required features are date columns.
So I can't tell what the problem is.
The only thing I can think of is that maybe the problem is not in the input but in the output?
Because, in order to avoid an error (similar to ), I added that one line of code.
I don't know whether this is the cause of the problem. In a word, could you help me make a simple analysis?