francoisWeber closed this issue 2 years ago
@francoisWeber thanks for the report. This does look like a bug.
I tried to run the python pseudo-training script, but could not manage to get the right versions of some dependencies.
Can you please provide me the .onnx model plus the input and expected output of the model as a .npz file, as shown in https://github.com/sonos/tract/blob/main/doc/cli-recipe.md#running-a-test-case ? You need to make the input and output names in the network match the tensor names in the io.npz, and you can "test" the test with the cargo command line.
Hi @kali
Here is the dummy_tree.onnx.zip file (sorry I had to zip it to make Git happy ...). I also generated an io.npz.zip as described in your tutorial. With these two assets, the output of the command tract -v --input-bundle onnx_runner/assets/io.npz onnx_runner/assets/dummy_tree.onnx -O run --assert-output-bundle onnx_runner/assets/io.npz is the following:
[2022-06-06T12:01:17.271420671Z INFO tract] Resource usage init: vsz:26910720 rsz:6307840 rszmax:6307840
[2022-06-06T12:01:17.272891005Z INFO tract] Resource usage loaded framework (onnx): vsz:26910720 rsz:6307840 rszmax:6307840
[2022-06-06T12:01:17.280994755Z INFO tract] Resource usage proto model loaded: vsz:26910720 rsz:6307840 rszmax:6307840
[2022-06-06T12:01:17.281151880Z WARN tract_onnx::model] ONNX operator for your model is 15, tract is tested against operator set 9, 10, 11 and 12 only. Your model may still work so this is not a hard fail.
[2022-06-06T12:01:17.283553796Z INFO tract::params] Model Fs("onnx_runner/assets/dummy_tree.onnx") loaded
[2022-06-06T12:01:17.283756838Z INFO tract] Resource usage model loaded: vsz:26910720 rsz:6307840 rszmax:6307840
[2022-06-06T12:01:17.295279796Z INFO tract::params] Will stop at optimize
[2022-06-06T12:01:17.295314755Z INFO tract::params] Running 'analyse'
[2022-06-06T12:01:17.296104046Z INFO tract] Resource usage after analyse: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.296119963Z INFO tract::params] Running 'incorporate'
[2022-06-06T12:01:17.296178171Z INFO tract] Resource usage after incorporate: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.296192713Z INFO tract::params] Running 'type'
[2022-06-06T12:01:17.296785005Z INFO tract] Resource usage after type: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.296802255Z INFO tract::params] Running 'declutter'
[2022-06-06T12:01:17.297420338Z INFO tract] Resource usage after declutter: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.297449463Z INFO tract::params] Running 'before-optimize'
[2022-06-06T12:01:17.297471546Z INFO tract] Resource usage after before-optimize: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.297474713Z INFO tract::params] Running 'optimize'
[2022-06-06T12:01:17.297513255Z INFO tract] Resource usage after optimize: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.297518505Z INFO tract::params] Model ready
[2022-06-06T12:01:17.297547880Z INFO tract] Resource usage model ready: vsz:26910720 rsz:10252288 rszmax:10252288
[2022-06-06T12:01:17.297652588Z INFO tract::tensor] Using fixed input for input called X (1 turn(s))
[2022-06-06T12:01:17.298345046Z INFO tract::utils] Checked output #0, ok.
[2022-06-06T12:01:17.298428796Z ERROR tract] Checking output 1 (expected 1,2,F32 0.78571427, 0.21428572, got 1,2,F32 0.21428572, 0
Caused by:
Mismatch at [0, 0] 0.78571427 != 0.21428572
Notice that $0.21428572 = 1 - 0.78571427$
Hope this will help you understand the bug :)
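For readers following the log above, the failing check amounts to an element-wise comparison between the expected tensor and the one the runtime produced. The snippet below is only a minimal sketch of that idea, not tract's actual implementation: the function name `first_mismatch`, the flat-slice representation and the tolerance parameter are all assumptions made for illustration.

```rust
/// Hypothetical helper (not part of tract): returns the index of the first
/// element where `expected` and `got` differ by more than `tolerance`.
/// Only the overlapping prefix is compared if the slices differ in length.
fn first_mismatch(expected: &[f32], got: &[f32], tolerance: f32) -> Option<usize> {
    expected
        .iter()
        .zip(got.iter())
        .position(|(e, g)| (e - g).abs() > tolerance)
}
```

With the expected probabilities from the log and a result that starts with 0.21428572, such a check would stop at index 0, which matches the position reported in the error message.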
EDIT: I just tried to mitigate the warning about the ONNX's OpSet by setting my target opset to 12 and the problem remains.
Thanks for taking the time! Having a look at the test case right now.
Thanks. I took a dive inside the ONNX Runtime code to see what they are doing, and... wow, that's not nice. They are doing a lot of post-processing, specifically tailored for the "binary" case (when there are two categories), while the cases with more categories are more or less left alone. The ONNX documentation does not describe this, so I don't know what tract is supposed to do. I could try to mimic ONNX Runtime's behaviour, but I would prefer to understand where this is coming from, because the ONNX Runtime code is pretty complicated. Do you know where it comes from? SciKit maybe?
> Do you know where it comes from ? SciKit maybe ?
If you are talking about my .onnx, then yes: it comes from a scikit-learn sklearn.tree.DecisionTreeClassifier converted to ONNX through sklearn-onnx, which is part of the official ONNX framework: https://onnx.ai/sklearn-onnx/ .
Does the binary-tailored post-processing you're talking about refer to the computation of the complementary probability in the binary case? That computation is only possible in the binary case, so maybe that's a clue?
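To make the "complementary probability" idea concrete, here is a minimal sketch of what such a binary-only step could look like. It assumes the tree yields a single score for the positive class and that the two-class vector is ordered as [class 0, class 1]; the function name `binary_probabilities` is hypothetical, and this is neither ONNX Runtime's nor tract's actual code.

```rust
/// Hypothetical helper: expands the score of the positive class (class 1)
/// into a two-element probability vector [P(class 0), P(class 1)].
/// This only makes sense with exactly two classes, where
/// P(class 0) = 1 - P(class 1).
fn binary_probabilities(positive: f32) -> [f32; 2] {
    [1.0 - positive, positive]
}
```

Called with 0.21428572, this would give a first component of roughly 0.78571427, which is the pair of values the Python side reports. With three or more classes there is no single complement to compute, which may be why only the binary case gets special treatment.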
Hey @francoisWeber, I did... something on #734, branch name is fix-732. Do you want to check that it gives the results you are expecting?
Using your fix-732 branch, it outputs:
[onnx_runner/src/lib.rs:62] &result = [
1,I64 0,
1,2,F32 0.78571427, 0.21428572,
]
[onnx_runner/src/lib.rs:66] &proba = [
0.78571427,
0.21428572,
]
Well done, @kali 😎
@francoisWeber thanks for checking it out. I will have to be cautious in merging, because it's a breaking change... I need to look around/think what other breaking changes I need to pass.
@francoisWeber FYI, the fix was released as part of 0.17.0
Hi there,
TL;DR: I noticed a wrong computation when outputting the prediction probability vector for a classification tree.
Disclaimer: I'm new to Rust. I tried to re-use a snippet of Tract to run inference from an ONNX file that embeds a simple binary classification tree. The output of the model is:
The model comes from sklearn-onnx. Here is a snippet of code to reproduce such a model:
So as you can see, the prediction inference_result has a 2nd dimension containing the prediction probabilities [0.78571427, 0.21428572].
Now to load it back into a Rust program, I used the following Rust x Tract x ONNX helper:
And I call this ONNXRunnerRaw in the following main.rs file:
And it outputs the following:
So we retrieved the Python-side probability of 0.214 of being in class 1, but the complementary probability in the Rust version is 0.0!
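A quick way to catch this kind of output on the Rust side is to check that the probabilities form a valid distribution. The snippet below is only a sketch under assumptions: the function name `is_probability_vector` and the tolerance argument are hypothetical, and it works on a plain slice of f32 rather than on a tract tensor.

```rust
/// Hypothetical sanity check: true when every value lies in [0, 1]
/// and the values sum to 1 within `tolerance`.
fn is_probability_vector(values: &[f32], tolerance: f32) -> bool {
    let in_range = values.iter().all(|v| (0.0..=1.0).contains(v));
    let sum: f32 = values.iter().sum();
    in_range && (sum - 1.0).abs() <= tolerance
}
```

The Python-side vector [0.78571427, 0.21428572] would pass this check, while a vector holding only 0.214 and 0.0 would fail because it sums to about 0.214.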
I think it's a bug in the way Tract handles the output of the ONNX model. I hope this feedback is helpful.
François Weber