xmos / ai_tools

AI applications and tools
Other
25 stars 10 forks source link

xformer is built with files from tensorflow instead of lib_tflite_micro #924

Open andresovela opened 1 month ago

andresovela commented 1 month ago

I've spent the last few days trying to integrate some TFLM patches into xformer, but I failed to add some simple changes due to what I now know is duplicated dependencies.

I'll describe how I found out that the xformer build pulls in files from the wrong repository as best as I can:

I wanted to patch in int8 bias support for CONV_2D in TFLM. So I added the following code here (ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/micro/kernels/conv.cc):

+      } else if (bias->type == kTfLiteInt8) {
+        reference_integer_ops::ConvPerChannel(
+            ConvParamsQuantized(params, data),
+            data.per_channel_output_multiplier, data.per_channel_output_shift,
+            tflite::micro::GetTensorShape(input),
+            tflite::micro::GetTensorData<int16_t>(input),
+            tflite::micro::GetTensorShape(filter),
+            tflite::micro::GetTensorData<int8_t>(filter),
+            tflite::micro::GetTensorShape(bias),
+            tflite::micro::GetOptionalTensorData<std::int8_t>(bias),
+            tflite::micro::GetTensorShape(output),
+            tflite::micro::GetTensorData<int16_t>(output));

That seemed to work, but it complained about a missing int8_t implementation for MultiplyByQuantizedMultiplier() defined here (ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.h).

So I proceeded to add the missing function both to common.h and common.cc located in ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/, but the linker somehow does not see the implementation I just added in ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.cc.

error: undefined reference to 'tflite::MultiplyByQuantizedMultiplier(signed char, int, int)'

After some experimentation, I found out that if I removed my changes to conv.cc (the diff shown above is no longer relevant from now on), I could actually comment out the int32/int64 impls for MultiplyByQuantizedMultiplier() in common.cc, but leave the declarations in common.h untouched, and the program would still build. The linker somehow is able to find the implementations for these functions elsewhere. Note that if I remove the function prototypes from common.h, the program does not compile anymore:

external/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/reference/integer_ops/transpose_conv.h:209:61: error: 'MultiplyByQuantizedMultiplier' was not declared in this scope

So the conclusion is that the build system is using the function prototypes from ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.h, but it is not using the implementations from ai_tools/third_party/lib_tflite_micro/lib_tflite_micro/submodules/tflite-micro/tensorflow/lite/kernels/internal/common.cc.

After two days of investigation, I found out that the implementations being used are actually from ai_tools/xformer/bazel-xformer/external/org_tensorflow/tensorflow/lite/kernels/internal/common.cc. It looks like this is a copy of the tensorflow repo that gets downloaded by Bazel when you build xformer.

I found this comment in the BUILD file:

https://github.com/xmos/ai_tools/blob/9b32da35f5fddc1074332d200161874a7ac36b01/xformer/BUILD#L258-L264

All of this is to say that I find this duplication rather unintuitive, and it results in xformer unintentionally being built not with the source code from https://github.com/xmos/lib_tflite_micro, but rather code from https://github.com/tensorflow/tensorflow

I am not familiar with the Bazel build system, so I haven't been able to find a solution for this yet.

panickal-xmos commented 1 month ago

Hi Andres, thank you. Let me take a look. Bazel is unfortunately not that straightforward, but the project is tied to it due to the Tensorflow dependency. I didn't quite understand the 8-bit bias. Why would be bias be 8-bit?

andresovela commented 1 month ago

I wrote you an email about the 8-bit bias issue a few days ago. However, this is not relevant to this issue. I was just trying to get the model to move past that problem to see what else needs patching.

andresovela commented 1 month ago

Btw, this is not a suspicion. I confirmed that the symbols being linked come from the tensorflow code:

I compiled the project with debug symbols and I ran the following:

$ objdump -t xcore-opt | rg MultiplyByQuantizedMultiplier
00000000014c25d0  w    F .text  0000000000000a1c              _ZN6tflite34MultiplyByQuantizedMultiplier4RowsE11int32x4x4_tii
00000000014c9a08 g     F .text  0000000000000120              _ZN6tflite29MultiplyByQuantizedMultiplierElii
000000000053fc0b  w    F .text  0000000000000029              _ZN6tflite43MultiplyByQuantizedMultiplierGreaterThanOneEiii
000000000053fbd4  w    F .text  0000000000000037              _ZN6tflite46MultiplyByQuantizedMultiplierSmallerThanOneExpEiii
00000000014c99af g     F .text  0000000000000059              _ZN6tflite29MultiplyByQuantizedMultiplierEiii

$ addr2line -e xcore-opt 0x14c99af
/proc/self/cwd/external/org_tensorflow/tensorflow/lite/kernels/internal/common.cc:21
andresovela commented 1 month ago

I figured this one out.

I had to add "@tflite_micro//tensorflow/lite/kernels/internal:common.cc", here:

https://github.com/xmos/ai_tools/blob/9b32da35f5fddc1074332d200161874a7ac36b01/xformer/lib_tflmc.BUILD#L5-L19

While it makes sense from a build system point-of-view, I think this is a massive footgun. In my case, I was able to notice that the wrong file was being compiled in because I was trying to add a new function. However, if I had tried to just change an existing function, I'd have wasted even more time chasing myself around the room trying to figure out why my changes are not working.

Right now, it's even hard to tell whether there currently are patches that are not being included in xformer due to the source file not being listed in lib_tflmc.BUILD, because the program would still compile using the original tensorflow sources as a fallback.

panickal-xmos commented 1 month ago

Thank you @andresovela for investigating this. Tensorflow and tflite-micro used to be the same repo, but since they split into two repos, it's difficult to manage this in a clean manner. They both use tflite namespace and functions with the same name with subtly different functionality. One potential solution is to rename all tflite namespaces in the tflite-micro repo to tflite_micro namespace or enclose the current tflite namespace in the tflite-micro repo inside a new tflite_micro namespace. But that would be a change that would touch almost all files in the tflite-micro repo, and I doubt if we can upstream that. Thoughts?

andresovela commented 1 month ago

That's not a bad idea. I'll propose it upstream and see if I can get a discussion going.

panickal-xmos commented 1 month ago

https://github.com/xmos/ai_tools/pull/928 adds in the namespace changes