You need to share with me your PMML file, plus an input CSV file and an expected output CSV file.
These two CSV files only need to be five to ten data records in size - something that qualifies as a reproducible test case.
At the moment, without having seen any data, I'm pretty confident that the problem lies with KNIME - most probably it's simply producing incorrect PMML markup.
Yeah, of course =)
The first one contains the train and test data.
The second one contains the PMML model created by the KNIME workflow shown before.
Thanks for helping me =)
First, I split your tokensTagsTest3.csv file into input.csv (columns Col0, Col1 and Col2) and expected-output.csv (column Col3) files. Then, I tested them against each other using the org.jpmml.evaluator.TestingExample example application:
$ java -cp ~/Workspace/jpmml-evaluator/pmml-evaluator-example/target/example-1.3-SNAPSHOT.jar org.jpmml.evaluator.TestingExample --model decisionTree3.model --input input.csv --expected-output expected-output.csv --separator ";" > diff.txt 2>&1
This testing reveals 60 conflicts.
The first conflict is on the third input line:
Conflict{id=2, arguments={Col0=conj-s, Col1=v-fin, Col2=art}, difference=not equal: value differences={Col3=(false, NodeScoreDistribution{result=true, probability_entries=[false=0.25, true=0.75], entityId=350, confidence_entries=[]})}}
Now, if you open your decision tree PMML file in a text editor and execute its algorithm manually, then the winning Node elements are selected in this order: 261 -> 332 -> 350. And the Node@id=350 element predicts "true", with the associated probability distribution {"true" = 0.75, "false" = 0.25}.
The conclusion is that JPMML-Evaluator is carrying out the evaluation exactly as specified in the PMML file. If you're not happy with these predictions, then you need to look into KNIME. Most likely, KNIME is making some sort of error during PMML generation.
Out of curiosity, I took your training dataset tokensTags3.csv and built a decision tree using R's "rpart" function:
library("rpart")
library("pmml")
tt = read.csv("tokensTags3.csv", sep = ";")
tt.rpart = rpart(Col3 ~ ., data = tt, method = "class", control = rpart.control(maxcompete = 0, maxsurrogate = 0))
saveXML(pmml(tt.rpart, dataset = tt.rpart), "tokensTags3.pmml")
classes = predict(tt.rpart, type = "class")
probabilities = predict(tt.rpart, type = "prob")
result = data.frame("Col3" = classes, "Predicted_Col3" = classes, "Probability_false" = probabilities[, 1], "Probability_true" = probabilities[, 2])
write.csv(result, "expected-output.csv", quote = FALSE, row.names = FALSE)
The testing now passes cleanly (note that expected-output.csv now contains rpart's own predictions, so this checks that JPMML-Evaluator reproduces the R model exactly):
$ java -cp ~/Workspace/jpmml-evaluator/pmml-evaluator-example/target/example-1.3-SNAPSHOT.jar org.jpmml.evaluator.TestingExample --model tokensTags3.pmml --input tokensTags3.csv --expected-output expected-output.csv
I really appreciate your help =)
I'm going to try another framework here.
Thank you very much, my friend.
Hello,
I trained two models in KNIME: a Neural Network and a Decision Tree.
I'm comparing the results in KNIME and in Java.
When looking at the Neural Network, I'm getting the same results.
With the Decision Tree model, all observations are going to false.
I tried to read the PMML model back inside KNIME and the results are still not right.
Can you help me?
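For reference, one way to see this from the Java side is to tally the decision tree's predictions over the whole input file. The sketch below assumes the classic JPMML-Evaluator API, a semicolon-separated input file, and the file/field names that come up elsewhere in this thread (decisionTree3.model, input.csv, target field Col3); the class name and the counts map are just for illustration:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.InputStream;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.dmg.pmml.FieldName;
import org.dmg.pmml.PMML;
import org.jpmml.evaluator.Evaluator;
import org.jpmml.evaluator.EvaluatorUtil;
import org.jpmml.evaluator.InputField;
import org.jpmml.evaluator.ModelEvaluatorFactory;
import org.jpmml.model.PMMLUtil;

public class TallyPredictions {

    public static void main(String[] args) throws Exception {
        // Load the decision tree PMML file
        PMML pmml;
        try (InputStream is = new FileInputStream("decisionTree3.model")) {
            pmml = PMMLUtil.unmarshal(is);
        }
        Evaluator evaluator = ModelEvaluatorFactory.newInstance().newModelEvaluator(pmml);

        // Count how many records are predicted as each class label
        Map<Object, Integer> counts = new LinkedHashMap<>();

        try (BufferedReader reader = new BufferedReader(new FileReader("input.csv"))) {
            List<String> header = Arrays.asList(reader.readLine().split(";"));
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cells = line.split(";");
                Map<FieldName, Object> arguments = new LinkedHashMap<>();
                for (InputField inputField : evaluator.getInputFields()) {
                    FieldName name = inputField.getName();
                    String cell = cells[header.indexOf(name.getValue())];
                    arguments.put(name, inputField.prepare(cell));
                }
                Map<FieldName, ?> results = evaluator.evaluate(arguments);
                // Decode the result object to the plain predicted label
                Object label = EvaluatorUtil.decode(results.get(FieldName.create("Col3")));
                counts.merge(label, 1, Integer::sum);
            }
        }
        System.out.println(counts);
    }
}

If the printed tally is all "false", then the PMML file itself encodes those predictions, which points back at the PMML generation step rather than at the evaluator.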