pax-k opened 9 months ago
Just chiming in to say I can replicate this on my machine. I hope @skottmckay or someone else from the team can give a few hints as to why this might be happening!
My initial thought, based on looking at the JSON output, is that this is pretty typical and expected when lots of floating-point operations are executed on different platforms.
If you look at the model in Netron there are many MatMul operations, and the order of the individual operations affects the exact value each node produces. A single MatMul involves many multiplications and additions: the order of each set of multiplications matters, the order those products are added together matters, and there's no rule about that order. Mathematically a × b × c == c × b × a, but you can get two different results due to how floating-point numbers work.
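You can see the underlying effect with nothing ONNX-specific, just plain JavaScript:

```js
// Floating-point addition is not associative: the grouping changes the result.
const a = 0.1, b = 0.2, c = 0.3;

console.log((a + b) + c); // 0.6000000000000001
console.log(a + (b + c)); // 0.6

// Each element of a MatMul output is a long sum of products; different
// kernels (AVX vs. NEON, vectorized vs. scalar) accumulate those products
// in different orders, so the last few bits can legitimately differ.
```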
The low-level instructions used to execute the operations differ by platform/architecture (e.g. the various AVX instruction sets on Intel/AMD, NEON on Arm, etc.). These differences accumulate with each node and are magnified by nodes that do a lot of calculation (e.g. MatMul/Conv/Gemm).
When you say the NodeJS output is 'good' but the React Native ones aren't, how is that assessed? By the output from using the embeddings in a downstream model? Or is it that the floating-point values differ beyond some expected tolerance?
@skottmckay
We’ve been comparing the vectors produced by the model directly, loading it with sentence_transformers in Python vs. ONNX in Node, and there we see exactly the same results. That’s why we say they look “correct”.
We then compared ONNX in Node vs. React Native and saw widely different results for some inputs.
We understand that floating point calculations can vary slightly on different architectures, but we didn’t expect to see discrepancies this big and seemingly random.
In the comparison chart, we show the Manhattan distance between vectors produced by ONNX in Node vs. ONNX in React Native. For most, you can see that the difference is really small (< 0.001) and expected due to differences in floating-point math.
However, for a few vectors you will see a huge difference, e.g.:
What could be the contributing factor to these differences? The inconsistency is a blocker for putting this into production.
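For reference, the comparison itself is nothing fancy, essentially this (a minimal sketch, assuming the dumped JSON contains a plain array of numbers per sentence):

```js
// Manhattan (L1) distance between two embedding vectors of equal length.
function manhattanDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    sum += Math.abs(a[i] - b[i]);
  }
  return sum;
}

// Illustrative values; in practice these come from the NodeJS and
// React Native JSON dumps. Anything well above ~0.001 is what we
// consider anomalous.
const nodeEmbedding = [0.0123, -0.0456, 0.0789];
const rnEmbedding   = [0.0124, -0.0455, 0.0788];
console.log(manhattanDistance(nodeEmbedding, rnEmbedding)); // ~0.0003
```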
@skottmckay
> When you say the NodeJS output is 'good' but the React Native ones aren't, how is that assessed? By the output from using the embeddings in a downstream model? Or is it that the floating-point values differ beyond some expected tolerance?
You are correct, we mean that the floating-point values differ beyond some expected tolerance.
In this table we compare ONNX embeddings in NodeJS (on the left) with ONNX embeddings in RN (on the right). Given a query and a list of sentences, we calculate the cosine similarity and sort the sentences to show the most similar first. The ranked NodeJS sentences on the left feel right, while the RN ones are different. We expected them to be identical, and the table shows how the results are affected in RN.
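The ranking logic itself is roughly the following (a sketch with illustrative names, not our exact code):

```js
// Cosine similarity between two vectors of equal length.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank sentences by similarity to the query embedding, most similar first.
function rank(queryEmbedding, sentenceEmbeddings) {
  return sentenceEmbeddings
    .map((emb, index) => ({ index, score: cosineSimilarity(queryEmbedding, emb) }))
    .sort((x, y) => y.score - x.score);
}
```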
I would suspect that Python and NodeJS are hitting the same low-level code if the results are the same.
As an additional data point, can you run your evaluation on another platform like an x64 desktop or an actual iOS device instead of the simulator? Alternatively, you could enable the XNNPACK EP as an alternative implementation of MatMul on CPU (vs. ORT's MLAS library, which is used by default).
@skottmckay Thanks for the tip! 🙌
Good news: if I use coreml (and not cpu) as the execution provider for ONNX in React Native on iOS, then the ranked sentences match the NodeJS ones! I also tested nnapi on Android, but it's no better than cpu.
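For anyone who lands here later, the change is just the executionProviders entry in the session options, roughly like this (a sketch; createSession and modelPath are illustrative names, and the provider names supported by onnxruntime-react-native should be double-checked against the docs for your version):

```js
import { InferenceSession } from 'onnxruntime-react-native';

// 'coreml' on iOS, 'nnapi' on Android; omitting the option falls back to
// the default CPU EP.
async function createSession(modelPath) {
  return InferenceSession.create(modelPath, {
    executionProviders: ['coreml'],
  });
}
```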
Hopefully we can roll a text embedding model into production soon without issues.
Thank you for your patience! 🙏🏻
Excellent.
If you ever want to dig really deeply into it, you can do a custom build with a flag that outputs the results of individual nodes, so you can compare platforms and see how things change throughout the model.
https://onnxruntime.ai/docs/build/inferencing.html#debugnodeinputsoutputs
You could set ORT_DEBUG_NODE_IO_OP_TYPE_FILTER to limit to just the MatMul nodes.
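With such a debug-enabled build, that filter is just an environment variable read by the runtime, so in a NodeJS run it can be set from the script itself before creating the session (a sketch; it has no effect on a stock released package):

```js
// Dump only MatMul nodes. Requires a custom ONNX Runtime build with node
// input/output dumping enabled (see the linked build docs); a released
// package simply ignores this variable.
process.env.ORT_DEBUG_NODE_IO_OP_TYPE_FILTER = 'MatMul';

// ...create and run the session as usual, then diff the dumped MatMul
// outputs between platforms to see where the values start to drift.
```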
@skottmckay Could a custom build improve performance on iOS?
We did some benchmarks and it seems that CoreML is twice as slow as CPU on iOS (but CPU is not precise, so we can't use it).
This is a benchmark of the average on-device inference time (iPhone 12 Pro) with CoreML, using the Jina embeddings model:
LOG Model loaded successfully: jina-embeddings-v2-small-en
LOG Runs: 100
LOG Text size (chars): 1099
LOG Download time (s): 4.02
LOG Load time (s): 0.756
LOG Output dims: 512
LOG Average times (s): {
"tokenize": 0.0025299999999999997,
"inference": 0.21671
}
We get ~216ms inference time on iOS + CoreML, compared to:
We also tested other models, like bge-small-en-v1.5, gte-small, and e5-small-v2, but jina-embeddings-v2-small-en was the fastest.
Are you aware of any tweaks we could try to improve inference times when using CoreML?
Thanks!
It depends on the operations in the model. It will be slower if there are unsupported operators breaking up partitions between the CoreML and CPU EPs. Run with the log severity level set to 'VERBOSE' (0) in the session options and look for 'Node placements' in the output.
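In the JS API that looks roughly like this (a sketch; createVerboseSession and modelPath are illustrative names, and 0 is the verbose severity level):

```js
import { InferenceSession } from 'onnxruntime-react-native';

// Verbose runtime logging; search the log output for 'Node placements' to
// see which nodes run on the CoreML EP and which fall back to the CPU EP.
async function createVerboseSession(modelPath) {
  return InferenceSession.create(modelPath, {
    executionProviders: ['coreml'],
    logSeverityLevel: 0,
  });
}
```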
FWIW I'm not convinced any particular EP is more 'precise'. I think they're all producing valid output, and saying one is more precise feels a little arbitrary, based on whether you liked the results for a specific query more or less on that EP. I.e. appointing the NodeJS output as the precision baseline may be flawed: if you run a wide range of queries, the 'best' set of results for each query may cycle between all the EPs you test with.
Describe the issue
I'm using Xenova/all-MiniLM-L6-v2 to extract embeddings from sentences. Given this inference code, I execute it as is in both NodeJS and React Native (in RN with a slight difference in how the model is loaded).
The NodeJS outputs are good. The problem is that I get slightly different vector embeddings in React Native, using the same code.
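Roughly, the NodeJS side looks like the following (a simplified sketch, not the exact code; tokenization is omitted, and the input/output names, file path, and mean pooling are assumptions based on the usual export of this model):

```js
const ort = require('onnxruntime-node');

// Runs the exported all-MiniLM-L6-v2 ONNX model on already-tokenized input
// and mean-pools the last hidden state into a single sentence embedding.
async function embed(inputIds, attentionMask) {
  const session = await ort.InferenceSession.create('./all-MiniLM-L6-v2.onnx');

  const seqLen = inputIds.length;
  const feeds = {
    input_ids: new ort.Tensor('int64', BigInt64Array.from(inputIds.map(BigInt)), [1, seqLen]),
    attention_mask: new ort.Tensor('int64', BigInt64Array.from(attentionMask.map(BigInt)), [1, seqLen]),
    token_type_ids: new ort.Tensor('int64', new BigInt64Array(seqLen), [1, seqLen]),
  };

  const outputs = await session.run(feeds);
  const hidden = outputs.last_hidden_state; // assumed output name; shape [1, seqLen, 384]

  // Mean-pool over the token axis (real sentence-transformers pooling also
  // weights by the attention mask; skipped here for brevity).
  const [, tokens, dims] = hidden.dims;
  const embedding = new Array(dims).fill(0);
  for (let t = 0; t < tokens; t++) {
    for (let d = 0; d < dims; d++) {
      embedding[d] += hidden.data[t * dims + d] / tokens;
    }
  }
  return embedding;
}
```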
Things to note from the inference code:
To reproduce
Setup iOS:
Setup NodeJS:
I combined and analyzed the 2 JSONs in this project:
a is from NodeJS
b is from React Native
Observations:
Urgency
It's pretty urgent
Platform
React Native
OS Version
iOS 17.0.1, iPhone 15 Pro Simulator
ONNX Runtime Installation
Released Package
Compiler Version (if 'Built from Source')
No response
Package Name (if 'Released Package')
onnxruntime-react-native
ONNX Runtime Version or Commit ID
1.16.3
ONNX Runtime API
JavaScript
Architecture
ARM64
Execution Provider
Default CPU
Execution Provider Library Version
No response