jatinchowdhury18 / RTNeural

Real-time neural network inferencing
BSD 3-Clause "New" or "Revised" License

Strides argument in RTNeural's Conv1D layers #144

Open ABSounds opened 1 month ago

ABSounds commented 1 month ago

Hi and thanks for the amazing library!

I'm trying to implement a model in RTNeural from a TensorFlow model containing a Conv1D layer defined as follows:

Conv1D(filters=32, kernel_size=12, strides=12, padding='same')

I'm struggling to find a way to pass the strides argument to the RTNeural Conv1D layer. Is there a way to do this?

As it is right now, the .json file doesn't specify it, and the inferencing is very slow. I suspect this is due to the missing strides parameter.

Thanks!

jatinchowdhury18 commented 1 month ago

Hello!

At the moment RTNeural does not support strided 1D convolutions. The Conv1D layer in RTNeural was designed to work with dilated convolutions which are a little different, and (at least in TensorFlow) are not always compatible with strided convolutions.
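To illustrate the distinction Jatin mentions, here is a small NumPy sketch (illustrative only, not RTNeural or TensorFlow internals) of "valid" strided vs. dilated 1D convolution. A strided convolution advances the kernel several samples per output and shrinks the output length; a dilated convolution spreads the kernel taps apart but still advances one sample at a time:

```python
import numpy as np

def conv1d_strided(x, kernel, stride):
    """'valid' strided 1D convolution (cross-correlation, Keras-style):
    the window advances `stride` samples between outputs."""
    k = len(kernel)
    out_len = (len(x) - k) // stride + 1
    return np.array([np.dot(x[i * stride:i * stride + k], kernel)
                     for i in range(out_len)])

def conv1d_dilated(x, kernel, dilation):
    """'valid' dilated 1D convolution: the kernel taps are spaced
    `dilation` samples apart, but the window still advances by 1."""
    k = len(kernel)
    span = dilation * (k - 1) + 1
    return np.array([np.dot(x[i:i + span:dilation], kernel)
                     for i in range(len(x) - span + 1)])

x = np.arange(10, dtype=float)
kernel = np.ones(3)
print(conv1d_strided(x, kernel, 2).shape)  # (4,) -- striding shrinks the output
print(conv1d_dilated(x, kernel, 2).shape)  # (6,) -- dilation only trims the edges
```

The output-length formulas differ (`(n - k) // stride + 1` vs. `n - dilation * (k - 1)`), which is one reason a layer designed around dilation can't simply reuse its indexing for strides.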

This is something we should add support for... hopefully I'll have some time this weekend to have a look at it.

ABSounds commented 1 month ago

Hi Jatin!

Yeah, it would be great to have the added support for strides. Thanks for the effort and for maintaining the library.

jatinchowdhury18 commented 1 month ago

Hello! I had a look at implementing strided convolutions today... it seems a bit more complicated than I had originally anticipated. Would it be possible to describe the specifics of your model in a bit more detail, or even provide an example of your full model in Python?

Part of the reason I'm asking is that RTNeural does not support 1D convolution layers with "same" or "valid" padding. At the moment, RTNeural only supports "causal" 1D convolution, since this typically best fits the types of models that are used in real-time audio processing. I'm worried that there could be a disconnect between how RTNeural is expecting a 1D convolution layer to be used, and how it's being used in your use-case, so I just want to make sure we're on the same page before I go further.
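A minimal NumPy sketch of the padding distinction (illustrative, not RTNeural's actual implementation): "causal" padding zero-pads only on the left, so each output sample depends solely on current and past inputs, which is what a real-time, sample-by-sample processor needs; "same" padding pads on both sides, so outputs can depend on future samples.

```python
import numpy as np

def conv1d_padded(x, kernel, padding):
    """1D convolution (cross-correlation) with Keras-style padding.
    'causal' pads k-1 zeros on the left only, so y[t] depends
    only on x[0..t]; 'same' splits the padding across both ends,
    so y[t] can look at future samples."""
    k = len(kernel)
    if padding == "causal":
        xp = np.concatenate([np.zeros(k - 1), x])
    elif padding == "same":
        left = (k - 1) // 2  # TensorFlow puts the extra pad on the right
        xp = np.concatenate([np.zeros(left), x, np.zeros(k - 1 - left)])
    return np.array([np.dot(xp[i:i + k], kernel) for i in range(len(x))])

x = np.array([1.0, 2.0, 3.0, 4.0])
kernel = np.ones(3)
print(conv1d_padded(x, kernel, "causal"))  # [1. 3. 6. 9.]
print(conv1d_padded(x, kernel, "same"))    # [3. 6. 9. 7.]
```

Note that with "same" padding the first output already mixes in x[1], i.e. a sample from the future relative to t = 0, which is why a causal formulation is the natural fit for streaming audio.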

Thanks!

ABSounds commented 1 month ago

Hi Jatin!

Sure, I'll paste the code I was using in TensorFlow for the model. It was inspired by GuitarML's GuitarLSTM repo. It's supposed to take a buffer of previous input samples and generate a single output sample.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, Conv1D, GRU, Dense

model = Sequential()
model.add(Input((input_buffer_size, 1)))
model.add(Conv1D(conv1d_filters, 12, strides=conv1d_strides, activation=None, padding='same'))
model.add(GRU(hidden_units))
model.add(Dense(1, activation=None))

It does indeed use "same" padding. Is it not possible to implement it in RTNeural then? Thanks again! I'm just starting with NNs, so please let me know if there's a better way to achieve this with RTNeural or if I'm missing something.

jatinchowdhury18 commented 1 month ago

Hi!

Sorry for the slow replies... got a lot on my plate at the moment.

The data flow you've described should definitely be possible in RTNeural; we'd just need to implement the strided convolutions.

Just to make sure we're on the same page, this is the little test script I've been working on. Is this roughly the same as how you would pass data through the model in training?

import numpy as np
import matplotlib.pyplot as plt
from tensorflow import keras

# construct TensorFlow model
N = 100
input_buffer_size = 4
conv1d_filters = 3
conv1d_strides = 1
hidden_units = 8

model = keras.Sequential()
model.add(keras.Input((input_buffer_size, 1)))
model.add(keras.layers.Conv1D(conv1d_filters, 12, strides=conv1d_strides, activation=None, padding='same'))
model.add(keras.layers.GRU(hidden_units))
model.add(keras.layers.Dense(1, activation=None))

# construct signals: each row of x_buffer holds the last
# input_buffer_size samples, zero-padded at the start
x = 10 * np.sin(np.arange(N) * np.pi * 0.1)
x_buffer = np.array([np.zeros(input_buffer_size)])
print(x_buffer.shape)
x_buffer[-1] = x[0]
for i in range(1,N):
    x_step = np.array([np.zeros(input_buffer_size)])
    print(i)
    print(x[max(i - input_buffer_size + 1,0):i+1].shape)
    print(x_step[0,max(input_buffer_size - i - 1,0):].shape)
    x_step[0,max(input_buffer_size - i - 1,0):] = x[max(i - input_buffer_size + 1,0):i+1]
    x_buffer = np.concatenate((x_buffer, x_step), axis=0)
print(x_buffer.shape)
# print(x_buffer)

y = model.predict((x_buffer.reshape((-1, input_buffer_size, 1))))
print(y.shape)
y = y.flatten()

# plot signals
plt.figure()
plt.plot(x)
plt.plot(y, '--')

Thanks, Jatin

ABSounds commented 1 month ago

Hey Jatin,

Thanks for your response!

Is this roughly the same as how you would pass data through the model in training?

I was a bit confused about how you're constructing the input signal, but yes, that's equivalent to how I'm generating mine and feeding it to the model during training.

Sorry for the delay, I've been out for a few days.
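For reference, the buffer-construction loop in the test script above can be written more compactly; a sketch (assuming the same convention of zero-padding at the start so row t ends at sample x[t]):

```python
import numpy as np

def make_buffers(x, buffer_size):
    # Left-pad with zeros so that row t holds the buffer_size
    # samples ending at x[t], matching the loop in the test script.
    xp = np.concatenate([np.zeros(buffer_size - 1), x])
    return np.stack([xp[i:i + buffer_size] for i in range(len(x))])

x = 10 * np.sin(np.arange(100) * np.pi * 0.1)
x_buffer = make_buffers(x, 4)
print(x_buffer.shape)  # (100, 4)
```

Each row is one model input, so `x_buffer.reshape((-1, input_buffer_size, 1))` can be passed to `model.predict` exactly as in the script.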