Closed dev0x13 closed 4 years ago
Sorry I didn't see this earlier. Here are benchmarks comparing to gemmlowp and Eigen (see the different sheet tabs). https://docs.google.com/spreadsheets/d/1CB4gsI7pujNRAf5Iz5vuD783QQqO2zOu8up9IpTKdlU/edit#gid=692328710
Thank you, this is very nice of you! These benchmarks look very promising. The reason I asked for them is that I tried to speed up TF Lite (I believe it was version 1.15.0) by enabling ruy at compile time, but for some reason it had almost no effect. Well, now I am going to make one more attempt at this.
Ruy is the default in TFLite on ARM64 and ARM32. You can still disable it with --define=tflite_with_ruy=false to compare.
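For reference, a comparison build might look like the following. This is a sketch, assuming the standard `benchmark_model` target in the TensorFlow repository and an Android ARM64 Bazel configuration; adjust the config and target to your setup.

```shell
# Build the TFLite benchmark tool with ruy (the default on ARM):
bazel build -c opt --config=android_arm64 \
  //tensorflow/lite/tools/benchmark:benchmark_model

# Build the same target with ruy disabled, to compare against
# the older eigen/gemmlowp code paths:
bazel build -c opt --config=android_arm64 \
  --define=tflite_with_ruy=false \
  //tensorflow/lite/tools/benchmark:benchmark_model
```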
Thank you for the clarification, but I build static TF Lite using the Makefile instead of the Bazel build configs. It seems that because of this, ruy is disabled by default except in the generic aarch64 build (which covers neither Android nor iOS): https://github.com/tensorflow/tensorflow/blob/47afb26c2d28991eeebea9c404da499c2a9c8148/tensorflow/lite/tools/make/Makefile#L186
I see. I'm unfamiliar with the Makefile build of TFLite, but from your description it seems like it's not closely tracking the evolution of TFLite. If you need help with that, please consider filing an issue against TFLite, and let me know if you need help finding some relevant TFLite folks to look at it. Alternatively, consider using Bazel. There are docs about that here: https://www.tensorflow.org/lite/guide/build_android
I don't need help with that, but thank you for caring anyway!
Hi.
The performance comparison above was carried out almost three years ago. A lot has changed in ruy, TFLite, and Eigen since then.
Is it possible to publish the benchmarking code somewhere to allow carrying out new performance comparisons?
Not much has changed in ruy over the last 3 years. The core ruy authors have moved to other teams within Google, and most of the momentum in TFLite CPU inference has gone into a move towards https://github.com/google/XNNPACK as the execution engine. I and @silvasean have both joined https://github.com/openxla/iree . Anyway, that's why no one has bothered to rerun the ruy benchmarks.
TFLite CPU inference still supports ruy (as of 2.12.0), and disabling it adds a considerable amount of code bloat to the target binary being built.
We will carry out some benchmarks internally, though. I assume I should address further questions to the TFLite authors then.
Right, best to discuss with TFLite. No one is saying that TFLite's internal kernels would disable ruy and go back to what they were using before (Eigen and gemmlowp). What I'm saying is that TFLite is relying less and less on those internal kernels and delegating more and more to XNNPACK.
Are there any reliable benchmark results comparing ruy with other GEMM libraries such as gemmlowp and Eigen? I am really interested in this (in the context of TensorFlow Lite performance), but the only small piece of information I have found so far is a post on the TensorFlow blog mentioning that TF Lite with ruy enabled outperforms regular TF Lite when inferring on a single CPU core (see the "Better CPU performance" section).