Time Delay Neural Network support

keithm-xmos commented 3 years ago

Add support for TDNNs. This could save a significant amount of RAM for models with time-based inputs, such as speech recognition and other audio models.

keithm-xmos commented 3 years ago

Investigate using https://github.com/tensorflow/tensorflow/blob/master/tensorflow/lite/micro/kernels/circular_buffer.cc

scott-xmos commented 3 years ago

https://github.com/google-research/google-research/tree/master/kws_streaming

After reviewing the above link, these are the main takeaways:

TensorFlow's streaming plan focuses mainly on letting a model be trained in non-streaming mode and then easily switched to streaming mode.

TensorFlow makes the user responsible for marking the correct layers as streaming. This step could be automated, since the types of layers that should be streamed are well defined.

Streaming is implemented by adding a ring buffer to each streamable operator via the add_weight method.

Below are the streamable operators and how their ring_buffer_size_in_time_dim is calculated:

Conv1D, Conv2D, DepthwiseConv2D, SeparableConv2D, SeparableConv1D

    For stride=1: ring_buffer_size_in_time_dim = dilation_rate[0] * (kernel_size[0] - 1) + 1

    Else: ring_buffer_size_in_time_dim = max(0, dilation_rate[0] * (kernel_size[0] - 1) - (strides[0] - 1))

AveragePooling2D

    ring_buffer_size_in_time_dim = pool_size[0]

Flatten, GlobalMaxPooling2D, GlobalAveragePooling2D

    ring_buffer_size_in_time_dim = self.state_shape[1]

    The state shape is derived from input_shape during training.
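
As a quick sanity check of the formulas above, here is a minimal Python sketch. The function names are hypothetical (they are not part of kws_streaming or xformer), and the arguments are plain integers standing in for dilation_rate[0], kernel_size[0], strides[0] and pool_size[0].

```python
def conv_ring_buffer_size(kernel_size: int, dilation_rate: int = 1, stride: int = 1) -> int:
    """Ring buffer size in the time dimension for the convolution-type layers.

    Hypothetical helper that only restates the two formulas listed above.
    """
    if stride == 1:
        return dilation_rate * (kernel_size - 1) + 1
    return max(0, dilation_rate * (kernel_size - 1) - (stride - 1))


def pool_ring_buffer_size(pool_size: int) -> int:
    """Ring buffer size in the time dimension for AveragePooling2D."""
    return pool_size


if __name__ == "__main__":
    # A few illustrative configurations (time-dimension values only).
    print(conv_ring_buffer_size(kernel_size=3))
    print(conv_ring_buffer_size(kernel_size=3, dilation_rate=2))
    print(conv_ring_buffer_size(kernel_size=3, stride=2))
    print(pool_ring_buffer_size(pool_size=4))
```

For example, a kernel of 3 frames with dilation 2 and stride 1 needs 2 * (3 - 1) + 1 = 5 buffered frames.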

To prepare XMOS for streaming networks, we could prototype a ring_buffer operator. We could then create an xformer pass that inserts the ring_buffer operator before a Conv2D layer, and test it with a modified version of the unit test we already use for Conv2D layers.
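
To make the idea concrete, here is a rough pure-Python sketch of what a ring buffer in front of a convolution does. It is an illustration only: RingBuffer, conv1d_valid and conv1d_streaming are hypothetical names, not the TFLite Micro circular_buffer kernel or an xformer pass, and it assumes a 1-D kernel with stride 1, dilation 1 and no padding.

```python
from collections import deque


class RingBuffer:
    """Keeps the most recent `size` frames along the time dimension."""

    def __init__(self, size, fill=0.0):
        self.frames = deque([fill] * size, maxlen=size)

    def push(self, frame):
        # Appending to a full deque drops the oldest frame automatically.
        self.frames.append(frame)
        return list(self.frames)


def conv1d_valid(signal, kernel):
    """Non-streaming reference: needs the whole signal in memory."""
    k = len(kernel)
    return [
        sum(signal[t + i] * kernel[i] for i in range(k))
        for t in range(len(signal) - k + 1)
    ]


def conv1d_streaming(signal, kernel):
    """Streaming version: sees one frame at a time, stores only len(kernel) frames."""
    k = len(kernel)
    buf = RingBuffer(k)
    outputs = []
    for n, frame in enumerate(signal, start=1):
        window = buf.push(frame)
        if n >= k:  # skip the warm-up frames until the buffer holds real data
            outputs.append(sum(w * c for w, c in zip(window, kernel)))
    return outputs


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
kernel = [0.5, 0.0, -0.5]
# Both paths should produce the same outputs for the same input.
assert conv1d_streaming(signal, kernel) == conv1d_valid(signal, kernel)
```

The streaming path holds only len(kernel) frames at any time, which is where the RAM saving for long audio inputs would come from.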

I had already done some of this work before TensorFlow made their streaming plans public.

xmos / ai_tools

Time Delay Neural Network support #322