Open andresovela opened 1 month ago
Hi Andres, thank you. Let me take a look. Bazel is unfortunately not that straightforward, but the project is tied to it due to the Tensorflow dependency. I didn't quite understand the 8-bit bias. Why would the bias be 8-bit?
I wrote you an email about the 8-bit bias issue a few days ago. However, this is not relevant to this issue. I was just trying to get the model to move past that problem to see what else needs patching.
Btw, this is not a suspicion. I confirmed that the symbols being linked come from the tensorflow code:
I compiled the project with debug symbols and I ran the following:
```
$ objdump -t xcore-opt | rg MultiplyByQuantizedMultiplier
00000000014c25d0 w F .text 0000000000000a1c _ZN6tflite34MultiplyByQuantizedMultiplier4RowsE11int32x4x4_tii
00000000014c9a08 g F .text 0000000000000120 _ZN6tflite29MultiplyByQuantizedMultiplierElii
000000000053fc0b w F .text 0000000000000029 _ZN6tflite43MultiplyByQuantizedMultiplierGreaterThanOneEiii
000000000053fbd4 w F .text 0000000000000037 _ZN6tflite46MultiplyByQuantizedMultiplierSmallerThanOneExpEiii
00000000014c99af g F .text 0000000000000059 _ZN6tflite29MultiplyByQuantizedMultiplierEiii

$ addr2line -e xcore-opt 0x14c99af
/proc/self/cwd/external/org_tensorflow/tensorflow/lite/kernels/internal/common.cc:21
```
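As a side note, the mangled names in the `objdump` output can be decoded with `c++filt` (part of GNU binutils, assumed to be available on the build host) to confirm they live in the plain `tflite` namespace:

```shell
# Demangle the symbol that addr2line resolved to the org_tensorflow copy.
c++filt _ZN6tflite29MultiplyByQuantizedMultiplierEiii
# → tflite::MultiplyByQuantizedMultiplier(int, int, int)
```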
I figured this one out.
I had to add `"@tflite_micro//tensorflow/lite/kernels/internal:common.cc"` here:
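For context, the fix amounts to listing the tflite-micro copy of the file explicitly in the `srcs` of the relevant target. A hypothetical sketch (target and attribute names are illustrative, not taken from the actual `lib_tflmc.BUILD`):

```python
cc_library(
    name = "lib_tflmc",  # hypothetical target name
    srcs = [
        # ...other sources elided...
        # Without this entry the linker silently falls back to the
        # identically-named file vendored via @org_tensorflow.
        "@tflite_micro//tensorflow/lite/kernels/internal:common.cc",
    ],
)
```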
While it makes sense from a build system point of view, I think this is a massive footgun. In my case, I was able to notice that the wrong file was being compiled in because I was trying to add a new function. However, if I had tried to just change an existing function, I'd have wasted even more time chasing myself around the room trying to figure out why my changes weren't working.
Right now, it's even hard to tell whether there currently are patches that are not being included in xformer due to their source file not being listed in `lib_tflmc.BUILD`, because the program would still compile using the original tensorflow sources as a fallback.
Thank you @andresovela for investigating this. Tensorflow and tflite-micro used to be the same repo, but since they split into two repos, it's difficult to manage this in a clean manner. They both use the `tflite` namespace and have functions with the same names but subtly different functionality.
One potential solution is to rename all `tflite` namespaces in the tflite-micro repo to `tflite_micro`, or to enclose the current `tflite` namespace in the tflite-micro repo inside a new `tflite_micro` namespace. But that would be a change that touches almost all files in the tflite-micro repo, and I doubt we could upstream it.
Thoughts?
That's not a bad idea. I'll propose it upstream and see if I can get a discussion going.
https://github.com/xmos/ai_tools/pull/928 adds the namespace changes
I've spent the last few days trying to integrate some TFLM patches into xformer, but I failed to add some simple changes due to what I now know are duplicated dependencies.
I'll describe how I found out that the xformer build pulls in files from the wrong repository as best as I can:
I wanted to patch in int8 bias support for CONV_2D in TFLM. So I added the following code here (`ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/micro/kernels/conv.cc`):

That seemed to work, but it complained about a missing int8_t implementation for `MultiplyByQuantizedMultiplier()` defined here (`ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.h`).

So I proceeded to add the missing function to both `common.h` and `common.cc` located in `ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/`, but the linker somehow does not see the implementation I just added in `ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.cc`.
After some experimentation, I found out that if I removed my changes to conv.cc (the diff shown above is no longer relevant from this point on), I could actually comment out the int32/int64 implementations of `MultiplyByQuantizedMultiplier()` in `common.cc`, but leave the declarations in `common.h` untouched, and the program would still build. The linker is somehow able to find the implementations for these functions elsewhere. Note that if I remove the function prototypes from `common.h`, the program does not compile anymore.

So the conclusion is that the build system is using the function prototypes from `ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.h`, but it is not using the implementations from `ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.cc`.

After two days of investigation, I found out that the implementations being used actually come from `ai_tools/xformer/bazel-xformer/external/org_tensorflow/tensorflow/lite/kernels/internal/common.cc`. It looks like this is a copy of the tensorflow repo that gets downloaded by Bazel when you build xformer. I found this comment in the BUILD file:
https://github.com/xmos/ai_tools/blob/9b32da35f5fddc1074332d200161874a7ac36b01/xformer/BUILD#L258-L264
All of this is to say that I find this duplication rather unintuitive, and it results in xformer unintentionally being built not with the source code from https://github.com/xmos/lib_tflite_micro, but rather with code from https://github.com/tensorflow/tensorflow.
I am not familiar with the Bazel build system, so I haven't been able to find a solution for this yet.