onnx / tensorflow-onnx

Convert TensorFlow, Keras, Tensorflow.js and Tflite models to ONNX
Apache License 2.0

Add support for tf.keras.layers.CuDNNLSTM #2290

Open hashJoe opened 8 months ago

hashJoe commented 8 months ago

New Operator

Describe the operator

Fast LSTM implementation backed by cuDNN

It seems that CuDNNLSTM is still not supported by tf2onnx, as I am getting the following error message while converting from TensorFlow to ONNX:

File "tf2onnx/utils.py", line 303, in make_sure
    raise ValueError("make_sure failure: " + error_msg % args)
ValueError: make_sure failure: rnn mode other than gru are not supported yet

Is support for this operator going to be added in an upcoming release?

A speedup of up to 10x can be achieved by using this operator.
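
For context, here is a rough timing sketch (hypothetical shapes and random data, just for timing; requires a GPU, since CuDNNLSTM has no CPU kernel) comparing the cuDNN-backed layer against the generic implementation:

import time

import numpy as np
import tensorflow as tf

# random data purely for timing purposes
x = np.random.rand(1024, 40, 57).astype("float32")
y = np.random.rand(1024, 57).astype("float32")

def build(layer):
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(40, 57)),
        layer,
        tf.keras.layers.Dense(57, activation="softmax"),
    ])
    m.compile(loss="categorical_crossentropy", optimizer="rmsprop")
    return m

for name, layer in [
    ("cudnn", tf.compat.v1.keras.layers.CuDNNLSTM(128)),
    # recurrent_dropout > 0 disqualifies the cuDNN kernel, forcing the generic implementation
    ("generic", tf.keras.layers.LSTM(128, recurrent_dropout=0.1)),
]:
    model = build(layer)
    start = time.time()
    model.fit(x, y, batch_size=128, epochs=1, verbose=0)
    print(name, "epoch time:", time.time() - start)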

fatcat-z commented 5 months ago

Currently, there is no plan to support this op. From the error message you provided, it's not clear what the real problem is.

If you can share more code or a code snippet, we can see whether there is an alternative that meets your requirement.

hashJoe commented 5 months ago

@fatcat-z I have created a minimal reproducible example by taking the following Keras code and adding CuDNNLSTM to it.

The code is given below:

import io
import numpy as np
import random
import tensorflow as tf

from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

"""
## Prepare the data
"""

path = tf.keras.utils.get_file(
    "nietzsche.txt",
    origin="https://s3.amazonaws.com/text-datasets/nietzsche.txt",
)
with io.open(path, encoding="utf-8") as f:
    text = f.read().lower()
text = text.replace("\n", " ")  # remove newline chars for nicer display
print("Corpus length:", len(text))

chars = sorted(list(set(text)))
print("Total chars:", len(chars))
char_indices = dict((c, i) for i, c in enumerate(chars))
indices_char = dict((i, c) for i, c in enumerate(chars))

# cut the text in semi-redundant sequences of maxlen characters
maxlen = 40
step = 3
sentences = []
next_chars = []
for i in range(0, len(text) - maxlen, step):
    sentences.append(text[i : i + maxlen])
    next_chars.append(text[i + maxlen])
print("Number of sequences:", len(sentences))

x = np.zeros((len(sentences), maxlen, len(chars)), dtype="bool")
y = np.zeros((len(sentences), len(chars)), dtype="bool")
for i, sentence in enumerate(sentences):
    for t, char in enumerate(sentence):
        x[i, t, char_indices[char]] = 1
    y[i, char_indices[next_chars[i]]] = 1

"""
## Build the model: a single LSTM layer
"""

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(maxlen, len(chars))),
        # In TF2, tf.keras.layers.LSTM automatically falls back to the cuDNN
        # kernel when the conditions below are met:
        # tf.keras.layers.LSTM(
        #     128,
        #     activation='tanh',
        #     recurrent_activation='sigmoid',
        #     recurrent_dropout=0,
        #     unroll=False,
        #     use_bias=True
        # ),
        # TF1-style layer that explicitly selects the cuDNN implementation
        tf.compat.v1.keras.layers.CuDNNLSTM(128),
        tf.keras.layers.Dense(len(chars), activation="softmax"),
    ]
)
optimizer = tf.keras.optimizers.RMSprop(learning_rate=0.01)
model.compile(loss="categorical_crossentropy", optimizer=optimizer)

"""
## Prepare the text sampling function
"""

def sample(preds, temperature=1.0):
    # helper function to sample an index from a probability array
    preds = np.asarray(preds).astype("float64")
    preds = np.log(preds) / temperature
    exp_preds = np.exp(preds)
    preds = exp_preds / np.sum(exp_preds)
    probas = np.random.multinomial(1, preds, 1)
    return np.argmax(probas)

"""
## Train the model
"""

epochs = 1
batch_size = 128

for epoch in range(epochs):
    model.fit(x, y, batch_size=batch_size, epochs=1)
    print()
    print("Generating text after epoch: %d" % epoch)

    start_index = random.randint(0, len(text) - maxlen - 1)
    for diversity in [0.2, 0.5, 1.0, 1.2]:
        print("...Diversity:", diversity)

        generated = ""
        sentence = text[start_index : start_index + maxlen]
        print('...Generating with seed: "' + sentence + '"')

        for i in range(400):
            x_pred = np.zeros((1, maxlen, len(chars)))
            for t, char in enumerate(sentence):
                x_pred[0, t, char_indices[char]] = 1.0
            preds = model.predict(x_pred, verbose=0)[0]
            next_index = sample(preds, diversity)
            next_char = indices_char[next_index]
            sentence = sentence[1:] + next_char
            generated += next_char

        print("...Generated: ", generated)
        print("-")
# tf.saved_model.save(model, "output/cudnnlstm")
# model.save(os.path.join("/tmp", model.name))

# freeze the model: inline variables as constants so the CudnnRNNV2 op stays in the graph
frozen_model = tf.function(lambda x: model(x)).get_concrete_function(tf.TensorSpec(
    model.inputs[0].shape, model.inputs[0].dtype
))
frozen_model = convert_variables_to_constants_v2(frozen_model)
frozen_model.graph.as_graph_def()  # materialize the GraphDef (return value unused here)
tf.io.write_graph(
    graph_or_graph_def=frozen_model.graph,
    logdir="output/",
    name="cudnnlstm_frozen.pb",
    as_text=False
)
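
A quick sanity check (op names are an assumption and may differ across TF versions) confirms that the cuDNN op survived freezing:

# list the op types present in the frozen graph; expect 'CudnnRNNV2' to appear
print(sorted({node.op for node in frozen_model.graph.as_graph_def().node}))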

Then, after freezing and saving the model, I simply tried converting it to ONNX with the following command:

python -m tf2onnx.convert --graphdef output/cudnnlstm_frozen.pb --output cudnnlstm_frozen.onnx --inputs x:0 --outputs Identity:0

and I get the following output:

2024-03-20 11:12:46,197 - INFO - Using tensorflow=2.11.0, onnx=1.15.0, tf2onnx=1.16.1/15c810
2024-03-20 11:12:46,197 - INFO - Using opset <onnx, 15>
2024-03-20 11:12:46,205 - WARNING - Cannot infer shape for sequential/cu_dnnlstm/CudnnRNNV2: sequential/cu_dnnlstm/CudnnRNNV2:3,sequential/cu_dnnlstm/CudnnRNNV2:4
2024-03-20 11:12:46,206 - INFO - Computed 0 values for constant folding
2024-03-20 11:12:46,278 - ERROR - Tensorflow op [sequential/cu_dnnlstm/CudnnRNNV2: CudnnRNNV2] is not supported
2024-03-20 11:12:46,279 - ERROR - Unsupported ops: Counter({'CudnnRNNV2': 1})
2024-03-20 11:12:46,280 - INFO - Optimizing ONNX model
2024-03-20 11:12:46,318 - INFO - After optimization: Cast -3 (5->2), Concat -1 (2->1), Const -21 (30->9), Expand -1 (2->1), Identity -2 (2->0), Reshape -2 (2->0), Squeeze -1 (2->1), Unsqueeze -5 (6->1)
2024-03-20 11:12:46,321 - INFO - 
2024-03-20 11:12:46,321 - INFO - Successfully converted TensorFlow model output/cudnnlstm_frozen.pb to ONNX
2024-03-20 11:12:46,321 - INFO - Model inputs: ['x:0']
2024-03-20 11:12:46,321 - INFO - Model outputs: ['Identity:0']
2024-03-20 11:12:46,321 - INFO - ONNX model is saved at cudnnlstm_frozen.onnx
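
For scripted pipelines, the same conversion can also be driven from Python (a sketch assuming tf2onnx's from_graph_def API; the arguments mirror the CLI flags above):

import tensorflow as tf
import tf2onnx

# load the frozen GraphDef produced by the script above
graph_def = tf.compat.v1.GraphDef()
with open("output/cudnnlstm_frozen.pb", "rb") as f:
    graph_def.ParseFromString(f.read())

model_proto, _ = tf2onnx.convert.from_graph_def(
    graph_def,
    input_names=["x:0"],
    output_names=["Identity:0"],
    output_path="cudnnlstm_frozen.onnx",
)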

Freezing the model ensures that the graph explicitly contains the CudnnRNNV2 op rather than the generic LSTM, so the converter is forced to handle it. Otherwise, tf2onnx would simply convert the standard LSTM operator.
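
As a possible workaround (a sketch based on the TF2 fallback noted in the commented-out block above, not a supported path for CudnnRNNV2 itself): a tf.keras.layers.LSTM with cuDNN-eligible settings runs on the cuDNN kernel at training time yet exports as a standard LSTM, which tf2onnx maps to the ONNX LSTM op:

import tensorflow as tf
import tf2onnx

maxlen = 40
num_chars = 57  # illustrative vocabulary size; use len(chars) from the script above

model = tf.keras.Sequential([
    tf.keras.Input(shape=(maxlen, num_chars)),
    # cuDNN-eligible configuration: default activations, no recurrent dropout, not unrolled
    tf.keras.layers.LSTM(
        128,
        activation="tanh",
        recurrent_activation="sigmoid",
        recurrent_dropout=0,
        unroll=False,
        use_bias=True,
    ),
    tf.keras.layers.Dense(num_chars, activation="softmax"),
])

spec = (tf.TensorSpec((None, maxlen, num_chars), tf.float32, name="x"),)
model_proto, _ = tf2onnx.convert.from_keras(
    model, input_signature=spec, output_path="lstm_fallback.onnx"
)

The exported graph then contains a plain LSTM node, so the "rnn mode other than gru" guard shown in the traceback above is never hit.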