Open xd009642 opened 4 years ago
No support right now, and no drive to do them from our side, but we could discuss how to add them as an optional extension to the current TF operator set (in a separate crate, I think).
Support for LSTM is relatively good once tract-tensorflow manages to translate them into the preferred tract-core form (based on a Scan operator with a subnetwork). On the TF side this has been done for the BlockLSTM cell only; on the Onnx side, the three (LSTM, RNN, GRU) have been translated, with Bidi support coming soon. I haven't looked at BLSTM cases in TF, but it should be possible to translate them too.
Makes sense. For the MEL stuff I think the only complicated part is the STFT, which relies on a compliant real-valued Fourier transform operation; if that were present it might be enough for it to just work, because the rest of the operations are mathematically just frequency binning and a dot product.
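To make that concrete, here is a rough NumPy sketch (illustrative only, not tract or TF code; the frame length, hop and Hamming window are assumptions, roughly 25 ms / 10 ms frames at 16 kHz) of the STFT step, with `np.fft.rfft` standing in for the real-valued Fourier transform op the graph would need:

```python
import numpy as np

def stft_power(signal, frame_len=400, hop=160):
    """Power spectrogram of shape (frames, frame_len // 2 + 1)."""
    signal = np.asarray(signal, dtype=np.float64)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] for i in range(n_frames)])
    frames *= np.hamming(frame_len)        # per-frame window
    spec = np.fft.rfft(frames, axis=-1)    # the real-valued FFT op
    return np.abs(spec) ** 2 / frame_len   # everything after this is binning + a matmul
```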
If I have time I'll try to have a look at whether I can prototype a rough proof of concept.
Mmmm... you do need an op for the STFT, indeed. But how do you foresee doing the binning? Either...

0/ I'm missing something obvious
1/ you need a "general purpose" binning operator that I don't know about and that we don't have yet, and that maybe we may want to add to tract-core
2/ you need a dedicated MEL-oriented binning operator
3/ you're going to generate horribly complex graphs that are likely to underperform by spending a lot of time taking a crazy way to do something that should be trivial
I'm relatively familiar with MEL and MFCC, but I've never done them in a NN.
I was thinking of 1 or 3, but I was going to start by seeing what tensorflow does and go from there. I did just look up the binning algorithm and it seems the bit that takes effort is:
```python
# `bin` holds the FFT bin indices of the mel points (nfilt + 2 of them), and
# `fbank` is a zero-initialised (nfilt, n_fft_bins) matrix that ends up
# holding one triangular filter per row.
for m in range(1, nfilt + 1):
    f_m_minus = int(bin[m - 1])  # left edge
    f_m = int(bin[m])            # center
    f_m_plus = int(bin[m + 1])   # right edge
    # rising slope of the triangle
    for k in range(f_m_minus, f_m):
        fbank[m - 1, k] = (k - bin[m - 1]) / (bin[m] - bin[m - 1])
    # falling slope of the triangle
    for k in range(f_m, f_m_plus):
        fbank[m - 1, k] = (bin[m + 1] - k) / (bin[m + 1] - bin[m])
```
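For completeness, a sketch of where the `bin` array above typically comes from (standard mel-scale math rather than any framework's API; the `nfft` and `sample_rate` parameters are assumptions for illustration):

```python
import numpy as np

def mel_bin_points(nfilt, nfft, sample_rate, low_hz=0.0, high_hz=None):
    """FFT bin indices of nfilt + 2 points equally spaced on the mel scale."""
    high_hz = high_hz if high_hz is not None else sample_rate / 2
    low_mel = 2595.0 * np.log10(1.0 + low_hz / 700.0)
    high_mel = 2595.0 * np.log10(1.0 + high_hz / 700.0)
    mel_points = np.linspace(low_mel, high_mel, nfilt + 2)     # evenly spaced in mel
    hz_points = 700.0 * (10.0 ** (mel_points / 2595.0) - 1.0)  # back to Hz
    return np.floor((nfft + 1) * hz_points / sample_rate)      # Hz -> FFT bin index
```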
The rest can just be done with matrix operations, so it should be "fast enough".
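To spell out the "matrix operations" bit, assuming the power spectrum and `fbank` from the snippets above: the MEL step is one matmul plus a log, and the MFCC step is one more matrix product against a DCT-II basis (the function name, the `1e-10` epsilon and the unnormalised DCT are illustrative choices, not anything a framework mandates):

```python
import numpy as np

def mel_and_mfcc(power_spec, fbank, n_ceps=13):
    # power_spec: (frames, n_fft_bins), fbank: (nfilt, n_fft_bins)
    mel = np.log(power_spec @ fbank.T + 1e-10)   # the binning is a single matmul
    nfilt = fbank.shape[0]
    n = np.arange(nfilt)
    # DCT-II basis, so the cepstral step is just another dot product
    basis = np.cos(np.pi * np.outer(np.arange(n_ceps), n + 0.5) / nfilt)
    return mel, mel @ basis.T
```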
All right, let's see how the POC is going, I guess :)
So I was looking at potentially moving from the tensorflow bindings to tract for some audio-based neural networks I have. The preprocessing in the graph extracts the MEL features from the audio, so it needs the tensorflow short-time Fourier transform block.
Just wondering if you had any support for these and if not whether you'd consider adding them (or accepting a PR :smile:)?
Also, how is the support for common recurrent networks like LSTM and BLSTM cells?