Closed banitalebi closed 1 week ago
We can't track a bug statistically. The way to move this forward is to pick one specific input example, run it through the reference implementation, and export the input and output of the exact counterpart of tract in the reference implementation (the input taken between pre-processing and the NN model, the output between the NN model and post-processing). Then we can check where tract disagrees with the npz.
Look at https://github.com/sonos/tract/blob/main/doc/cli-recipe.md#running-a-test-case for how the npz should be generated (be careful that the tensor names match the node names and that the right item type is used).
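If it helps, here is a minimal sketch of what that dump could look like on the Python side. It assumes onnxruntime is used as the reference runtime and that a hypothetical `preprocessed_input.npy` already contains the tensor exactly as it enters the network; the node names are read from the model rather than hard-coded:

```python
import numpy as np
import onnxruntime as ort

# Load the reference model and one preprocessed example
# (the tensor exactly as it enters the network, after preprocessing).
session = ort.InferenceSession("MobileNetV3.onnx")
input_name = session.get_inputs()[0].name     # must match the ONNX node name
output_name = session.get_outputs()[0].name   # must match the ONNX node name

x = np.load("preprocessed_input.npy").astype(np.float32)  # e.g. shape (1, 3, 224, 224)
y = session.run([output_name], {input_name: x})[0]

# Save both tensors under the node names and with the exact item type,
# so the npz can be checked against what tract computes.
np.savez("io.npz", **{input_name: x, output_name: y.astype(np.float32)})
```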
Before going through these loops, you may want to check your preprocessing. Some image networks depend on an extra normalisation step on the input, which is sometimes folded into the network and sometimes not (in which case it has to be done as preprocessing). That alone can be enough to produce this kind of result drift.
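For example, a typical ImageNet-style normalisation that is sometimes baked into the graph and sometimes expected as preprocessing looks like the sketch below (the mean/std values are the common ImageNet ones, not necessarily the ones your model expects):

```python
import numpy as np

# Common ImageNet normalisation: scale to [0, 1], then per-channel mean/std.
# If the exported graph already contains these ops, applying them again in
# preprocessing shifts every activation and degrades accuracy.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def normalize(rgb_hwc_uint8: np.ndarray) -> np.ndarray:
    x = rgb_hwc_uint8.astype(np.float32) / 255.0
    x = (x - MEAN) / STD
    return x.transpose(2, 0, 1)[None, ...]  # HWC -> NCHW, add batch dimension
```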
@kali is https://github.com/MobilenetModel/troubleshoot-tract helpful?
The Rust implementation matches test_image.jpg most closely to an all-black image, instead of its actual closest match in our dataset. The Python implementation works as intended, matching test_image.jpg with the appropriate image file rather than an all-black image.
Thank you, Kali, for your comment. I reviewed the entire process again and found that the difference in the results is due to slight variations in how images are handled in the Python OpenCV library and the Rust image crate, as well as a bug in the previous version of the MobileNetV3-Inference, which has now been fixed.
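In case it helps anyone hitting the same issue, here is a sketch of the kind of check that can expose this drift, assuming the Python side uses OpenCV. The usual suspects are channel order (OpenCV loads BGR, the Rust image crate gives RGB), resize interpolation, and rounding; dumping the preprocessed tensor on both sides and diffing them makes the mismatch obvious. The rust_preprocessed.npy file is a hypothetical dump produced by the Rust pipeline:

```python
import cv2
import numpy as np

# Python/OpenCV side: dump the tensor exactly as it is fed to the network.
img = cv2.imread("test_image.jpg")                  # OpenCV loads BGR
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)          # match the RGB order of the Rust image crate
img = cv2.resize(img, (224, 224), interpolation=cv2.INTER_LINEAR)  # must match the Rust resize filter
img = img.astype(np.float32)
np.save("py_preprocessed.npy", img)

# After saving the equivalent tensor from the Rust side, compare directly:
rust = np.load("rust_preprocessed.npy")
print("max abs diff:", np.abs(img - rust).max())
```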
Preprocessing can be deceptively tricky... and my experience with NNs is that when model inference breaks, it is usually not subtle.
Glad you could finally make it work.
Different performance when using tract as a Rust crate versus tract as a Python API. Using the same test dataset and the same MobileNetV3.onnx file, inference was run and evaluated in both scenarios, and the results are reported in these repositories. The two scenarios yield different outcomes: with the Python tract library the accuracy is 82.78%, while on the Rust side the accuracy is 77.28%.