xiph / LPCNet

Efficient neural speech synthesis
BSD 3-Clause "New" or "Revised" License

Setting batch size to a number other than 128 throws error #166

Closed: roshkins closed this issue 2 years ago

roshkins commented 2 years ago

Hi LPCNet,

I've spent the better part of a day debugging why I get this error after running train_lpcnet.py with a smaller batch size of 64.

The only explanation I can find is that the model's input layer has a fixed batch size of 128, and I can't figure out how to change it.

Has training with other batch sizes been tested? I'm running into GPU out-of-memory errors with 16 GB.

2 root error(s) found.
  (0) Invalid argument:  Invalid input_h shape: [1,128,384] [1,64,384]
     [[node model/gru_a/CudnnRNNV2 (defined at /anaconda3/envs/tf-gpu/lib/python3.9/threading.py:973) ]]
     [[div_no_nan/Identity_1/ReadVariableOp/_104]]
  (1) Invalid argument:  Invalid input_h shape: [1,128,384] [1,64,384]
     [[node model/gru_a/CudnnRNNV2 (defined at /anaconda3/envs/tf-gpu/lib/python3.9/threading.py:973) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_train_function_6366]

Function call stack:
train_function -> train_function
  File "/home/rashi/LPCNet/training_tf2/train_lpcnet.py", line 189, in <module> (Current frame)
    model.fit(loader, epochs=nb_epochs, validation_split=0.0, callbacks=[checkpoint, sparsify, grub_sparsify, csv_logger])
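
For context, here is a minimal standalone sketch (not LPCNet's actual model, just an illustration) of how a batch size baked into the Input layer leads to this kind of recurrent-state shape mismatch:

```python
# Standalone sketch (not LPCNet itself) of this class of failure: with the
# batch size baked into the Input layer, the GRU's recurrent state is created
# for 128 sequences, so feeding batches of 64 should trip a shape check at
# run time. 384 matches gru_a in the error above; every other size is made up.
# On a GPU the CuDNN kernel reports it as "Invalid input_h shape"; on CPU it
# fails with a similar incompatible-shapes error.
import numpy as np
from tensorflow.keras.layers import Input, GRU
from tensorflow.keras.models import Model

inp = Input(shape=(None, 10), batch_size=128)               # batch size baked in
out = GRU(384, return_sequences=True, stateful=True)(inp)   # state built for 128 sequences
model = Model(inp, out)
model.compile(optimizer='adam', loss='mse')

x = np.zeros((64, 5, 10), dtype='float32')                  # only 64 sequences per batch
y = np.zeros((64, 5, 384), dtype='float32')
model.fit(x, y, batch_size=64)                              # -> shape mismatch error
```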
jmvalin commented 2 years ago

I think I accidentally hardcoded the 128 in the model. As a short-term work-around, try changing the batch_size=128 arg in the Input layers to batch_size=64. That should solve the problem.
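
Roughly the idea, as a sketch rather than the actual code in training_tf2/lpcnet.py (build_model, feature_dim and the sizes other than 384 are illustrative names/values only): pass the batch size into the model builder and use it for every Input layer instead of the literal 128.

```python
# Sketch of the workaround, not the real lpcnet.py: thread the batch size
# through to every Input layer instead of hardcoding 128, so gru_a's state
# matches whatever batch size you actually train with.
from tensorflow.keras.layers import Input, GRU, Dense, Concatenate
from tensorflow.keras.models import Model

def build_model(batch_size=64, feature_dim=20, rnn_units=384):
    pcm  = Input(shape=(None, 3), batch_size=batch_size)            # was batch_size=128
    feat = Input(shape=(None, feature_dim), batch_size=batch_size)  # was batch_size=128
    x = Concatenate()([pcm, feat])
    gru_a = GRU(rnn_units, return_sequences=True, name='gru_a')
    out = Dense(256, activation='softmax')(gru_a(x))
    return Model([pcm, feat], out)

model = build_model(batch_size=64)   # now consistent with batch_size=64 in fit()
```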

jmvalin commented 2 years ago

Actually, see if the latest commit fixes the problem.

xiaochunxin commented 2 years ago

> Actually, see if the latest commit fixes the problem.

Hi, can you share the exact version of TF? I have tried various TF versions, but I still can't run the program properly. Thanks!

roshkins commented 2 years ago

It's working! The latest commit fixes it. Thanks! My TensorFlow version is 2.4.1.