Colab: Error when training new model

siflueckiger commented 2 years ago

Hello. When I try to train a new model on Google Colab, I run into the following error:

Training new model w/ 3-layer, 128-cell LSTMs
Training on 125,286 character sequences.
Epoch 1/20

---------------------------------------------------------------------------

UnknownError                              Traceback (most recent call last)

<ipython-input-8-766a9f967633> in <module>()
     18     max_length=model_cfg['max_length'],
     19     dim_embeddings=100,
---> 20     word_level=model_cfg['word_level'])

4 frames

/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
     57     ctx.ensure_initialized()
     58     tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
---> 59                                         inputs, attrs, num_outputs)
     60   except core._NotOkStatusException as e:
     61     if name is not None:

UnknownError:    Fail to find the dnn implementation.
     [[{{node CudnnRNN}}]]
     [[model_2/rnn_1/PartitionedCall]] [Op:__inference_train_function_10520]

Function call stack:
train_function -> train_function -> train_function

I already tried the command !kill -9 -1 from another issue. It didn't work. Can anybody help me? Thanks.

Fqlox commented 2 years ago

Hi, did you try skipping the %tensorflow_version 1.x block? The project now uses TensorFlow 2.1.

mocallito commented 2 years ago

Dude, it worked, thx.

ghost commented 2 years ago

Note: for me, skipping the %tensorflow_version 1.x block alone didn't work; I also had to do a factory reset of the runtime.
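
The fix discussed in this thread (not running %tensorflow_version 1.x, so that TensorFlow 2.x is the version loaded) can be checked before training with a small sketch like the one below. The helper name is_tf2 and the printed messages are made up for illustration and are not part of textgenrnn or Colab; the only TensorFlow attribute used is tf.__version__.

```python
def is_tf2(version: str) -> bool:
    """Return True when a version string such as '2.1.0' is TensorFlow 2.x or newer."""
    return int(version.split(".")[0]) >= 2


try:
    import tensorflow as tf
    loaded_version = tf.__version__
except ImportError:
    loaded_version = None  # TensorFlow is not installed in this environment

if loaded_version is None:
    print("TensorFlow is not installed here.")
elif is_tf2(loaded_version):
    print(f"TensorFlow {loaded_version} is loaded; the training cell can be run.")
else:
    # Hypothetical hint, based only on the advice given in this thread.
    print(f"TensorFlow {loaded_version} is loaded; reset the runtime and skip %tensorflow_version 1.x.")
```

If the check still reports a 1.x version after skipping the magic, the already-running runtime probably still has the old version loaded, which would match the report above that a factory reset of the runtime was also needed.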

minimaxir / textgenrnn

Colab: Error when training new model #247