pykeio / ort

Fast ML inference & training for Rust with ONNX Runtime
https://ort.pyke.io/
Apache License 2.0
786 stars 91 forks

Got different inference results when using ort #219

Closed AspadaX closed 2 months ago

AspadaX commented 2 months ago

Hi team,

Really nice crate! It has been amazing to use.

I am currently rewriting a Python service that takes a user input, uses an embedding model to vectorize it, and then returns the vector.

The model I am using is bge-m3-onnx. The Python service returns good results, and vector search works well. However, the results turn out to be different when using ort, and vector search becomes less accurate.

After a quick investigation, I found that the onnxruntime version used by ort is older than the Python package, which is 1.17.3. I tried disabling graph optimizations, but to no avail. One other difference is that the Python code uses ndarray to manipulate data, whereas the Rust version uses Vec. I am not sure whether this could affect the results. Perplexing.

Any ideas on this? Really great project, though!

Best,

decahedron1 commented 2 months ago

How different are the results? $\pm 1 \times 10^{-4}$ is typical, but if it's higher that's definitely a cause for concern. Double-check to make sure your preprocessing in Rust is the same as in Python; breaking down the preprocessing into steps and comparing the results at each step can help to identify issues.
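To put a number on "how different", one simple check is the largest element-wise gap between the two embedding vectors. A minimal sketch (the helper name and the sample values are illustrative, not from the actual models):

```rust
/// Largest element-wise absolute difference between two embedding
/// vectors, suitable for comparing against the ~1e-4 tolerance
/// mentioned above. Illustrative helper, not part of ort.
fn max_abs_diff(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same length");
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).abs())
        .fold(0.0, f32::max)
}

fn main() {
    // Placeholder outputs standing in for the Python and Rust runs.
    let python_out = [0.12345_f32, -0.67890, 0.24680];
    let rust_out = [0.12349_f32, -0.67885, 0.24681];

    let diff = max_abs_diff(&python_out, &rust_out);
    println!("max abs diff = {diff}");

    // Differences at or below ~1e-4 are expected numerical noise;
    // anything much larger points to a preprocessing mismatch.
    assert!(diff <= 1e-4);
}
```

Running the same comparison on the intermediate tensors after each preprocessing step narrows down where the two pipelines diverge.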

AspadaX commented 2 months ago

> How different are the results? $\pm 1 \times 10^{-4}$ is typical, but if it's higher that's definitely a cause for concern. Double-check to make sure your preprocessing in Rust is the same as in Python; breaking down the preprocessing into steps and comparing the results at each step can help to identify issues.

Many thanks for the information you provided.

I tracked it down to the Tokenizers crate provided by Hugging Face. It turns out that the Rust Tokenizers crate needs the "add_special_tokens" option enabled for tokenization to behave the same as in Python; otherwise it outputs incorrect tokens and, therefore, wrong embeddings.
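For reference, in the tokenizers crate this option is the second argument to `Tokenizer::encode`. A plain-Rust sketch of the effect (the token ids below are placeholders, not the real bge-m3 vocabulary):

```rust
// Why `add_special_tokens` matters: many BERT/XLM-R style models expect
// the input wrapped in BOS/EOS markers, and embeddings computed without
// them differ from what the model was trained on.
// The ids below are illustrative placeholders.
const BOS_ID: u32 = 0; // e.g. "<s>"
const EOS_ID: u32 = 2; // e.g. "</s>"

fn encode(body_ids: &[u32], add_special_tokens: bool) -> Vec<u32> {
    if add_special_tokens {
        let mut ids = Vec::with_capacity(body_ids.len() + 2);
        ids.push(BOS_ID);
        ids.extend_from_slice(body_ids);
        ids.push(EOS_ID);
        ids
    } else {
        body_ids.to_vec()
    }
}

fn main() {
    let body = [101, 202, 303];
    // Roughly what `tokenizer.encode(text, true)` vs
    // `tokenizer.encode(text, false)` produce with the real crate.
    assert_eq!(encode(&body, true), vec![0, 101, 202, 303, 2]);
    assert_eq!(encode(&body, false), vec![101, 202, 303]);
    println!("the two sequences differ, so the model sees different inputs");
}
```

Since the model never sees the expected BOS/EOS tokens when the flag is off, the resulting embeddings shift, which matches the degraded vector-search accuracy described above.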

Again, many thanks for your crate. Really amazing!