larq / compute-engine

Highly optimized inference engine for Binarized Neural Networks
https://docs.larq.dev/compute-engine
Apache License 2.0

Select indirect BGEMM kernels - Benchmarking grouped binary convolutions #711

Closed: simonmaurer closed this issue 2 years ago

simonmaurer commented 2 years ago

Given the commits #549, #550, and #551, LCE supports grouped binary convolutions. This is great work, as standard TFLite still does not support the groups argument for inference: https://github.com/tensorflow/tensorflow/issues/40044. I've successfully created models with appropriate channel dimensions, in which the grouped binary convolutions are correctly identified by the LCE Converter.

How can I benchmark this with the lce_benchmark_model binary? In other words, how can we select the indirect_bgemm kernels, given that the regular bgemm kernels don't support grouped convolutions?

Additionally, there is a use_reference_bconv flag in the LCE Interpreter, but I do not know what it actually means. My assumption was that if it is set to True, the bgemm kernels from https://github.com/larq/compute-engine/tree/main/larq_compute_engine/core/bgemm are selected, and otherwise the indirect_bgemm kernels from https://github.com/larq/compute-engine/tree/main/larq_compute_engine/core/indirect_bgemm.

Update: this assumption is not correct, since use_reference_bconv is False by default, so use_reference_bconv must mean something else.

Tombana commented 2 years ago

We currently don't have a CLI flag in lce_benchmark_model to choose between these. For internal benchmarks we simply replaced the registration on the following line:

https://github.com/larq/compute-engine/blob/a2611f8e33f5cb9b4d09b7c9aff7053620a24305/larq_compute_engine/tflite/kernels/lce_ops_register.h#L31-L32

with Register_BCONV_2D_OPT_INDIRECT_BGEMM.
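
For illustration, a minimal sketch of the kind of change this amounts to; only the Register_BCONV_2D_OPT_INDIRECT_BGEMM registration is taken from the comment above, while the "LceBconv2d" custom-op name, include paths, namespace, and helper name are assumptions based on the LCE sources:

```cpp
#include "larq_compute_engine/tflite/kernels/lce_ops_register.h"
#include "tensorflow/lite/mutable_op_resolver.h"

// Sketch only: register the indirect-BGEMM BConv2D kernel for the
// "LceBconv2d" custom op instead of the default optimized kernel,
// mirroring the one-line replacement described above.
inline void RegisterLCECustomOpsIndirectBgemm(
    ::tflite::MutableOpResolver* resolver) {
  resolver->AddCustom(
      "LceBconv2d",
      compute_engine::tflite::Register_BCONV_2D_OPT_INDIRECT_BGEMM());
}
```

Any interpreter (or the benchmark binary) built with a resolver set up this way would then run the indirect_bgemm implementation for binary convolutions.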

I'd welcome a PR to make this into a command-line flag; my suggestion would be:

Note that use_reference_bconv uses core/bconv2d/reference.h, which supports 'everything', such as zero-padding, one-padding, and groups. The optimized implementations, however, don't support all of those.

simonmaurer commented 2 years ago

@Tombana thanks a lot for pointing me in the right direction. I can do a PR that includes filtering of the arguments, so we can parse the flag (as you suggested) and remove it from argv before passing it on to BenchmarkTfLiteModel, since I assume (still need to verify) that an unrecognized argument would otherwise throw an error.
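
A rough sketch of that argument filtering, using a hypothetical --use_indirect_bgemm flag name (the real flag name and how the result feeds into the kernel registration would be decided in the PR):

```cpp
#include <cstring>
#include <vector>

// Sketch only: strip a hypothetical --use_indirect_bgemm flag out of argv so
// the remaining arguments can be forwarded to BenchmarkTfLiteModel without
// triggering an unrecognized-argument error.
bool ExtractIndirectBgemmFlag(int& argc, char** argv) {
  bool use_indirect_bgemm = false;
  std::vector<char*> kept;
  for (int i = 0; i < argc; ++i) {
    if (std::strcmp(argv[i], "--use_indirect_bgemm") == 0) {
      use_indirect_bgemm = true;  // remember the choice, drop the argument
    } else {
      kept.push_back(argv[i]);
    }
  }
  // Compact argv in place and shrink argc to the remaining arguments.
  for (int i = 0; i < static_cast<int>(kept.size()); ++i) argv[i] = kept[i];
  argc = static_cast<int>(kept.size());
  return use_indirect_bgemm;
}
```

The returned boolean could then decide which BConv2D registration is added to the op resolver (as in the sketch above) before the remaining argc/argv are handed to the benchmark.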

simonmaurer commented 2 years ago

Closing the issue, as it has been solved by #717.