kensho-technologies / pyctcdecode

A fast and lightweight python-based CTC beam search decoder for speech recognition.
Apache License 2.0

TypeError: max() received an invalid combination of arguments #12

Closed by rbracco 3 years ago

rbracco commented 3 years ago

I am following the basic tutorial, and when decoding probabilities with no language model I run into an error during the decode step.

Env info

pyctcdecode 0.1.0
numpy 1.21.0

Code

# Change from (1 x classes x length) to (length x classes)
probabilities = probabilities.transpose(1, 2).squeeze(0)
print(probabilities.shape) # torch.Size([276, 36]) - shape appears to be correct
decoder = build_ctcdecoder(labels)
text = decoder.decode(probabilities)

Stack Trace

Traceback (most recent call last):
  File "train/train_nemo.py", line 79, in <module>
    text = decoder.decode(probabilities)
  File "/home/rob/code/thunder_speech_latest/.venv/lib/python3.8/site-packages/pyctcdecode/decoder.py", line 611, in decode
    decoded_beams = self.decode_beams(
  File "/home/rob/code/thunder_speech_latest/.venv/lib/python3.8/site-packages/pyctcdecode/decoder.py", line 513, in decode_beams
    logits = np.clip(_log_softmax(logits, axis=1), np.log(MIN_TOKEN_CLIP_P), 0)
  File "/home/rob/code/thunder_speech_latest/.venv/lib/python3.8/site-packages/pyctcdecode/decoder.py", line 85, in _log_softmax
    x_max = np.amax(x, axis=axis, keepdims=True)
  File "<__array_function__ internals>", line 5, in amax
  File "/home/rob/code/thunder_speech_latest/.venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2754, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/rob/code/thunder_speech_latest/.venv/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 84, in _wrapreduction
    return reduction(axis=axis, out=out, **passkwargs)
TypeError: max() received an invalid combination of arguments - got (keepdims=bool, out=NoneType, axis=int, ), but expected one of:
 * ()
 * (Tensor other)
 * (int dim, bool keepdim)
 * (name dim, bool keepdim)

gkucsko commented 3 years ago

Thanks for the snippet. Does the problem persist if you convert your torch tensor to a numpy array before decoding, e.g. probabilities.numpy()?

rbracco commented 3 years ago

Wow, thank you for the lightning fast reply.

Changing decoder.decode(probabilities) to decoder.decode(probabilities.detach().numpy()) fixed the issue! Cheers.
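
For anyone landing here later, some context on why the conversion helps: the traceback shows NumPy's `_wrapreduction` calling `reduction(axis=axis, out=out, **passkwargs)`, which is the input object's own `max` method rather than NumPy's. A `torch.Tensor` has a `max` method, but it expects `dim`/`keepdim` instead of `axis`/`out`/`keepdims`, hence the TypeError. Below is a minimal sketch of a conversion step. `to_numpy` is a hypothetical helper name, not part of pyctcdecode, and the random array only stands in for real model output.

```python
import numpy as np


def to_numpy(probabilities):
    """Return `probabilities` as a NumPy array.

    Hypothetical helper for illustration; not part of pyctcdecode.
    """
    # torch.Tensor exposes detach(); duck typing avoids importing torch here.
    if hasattr(probabilities, "detach"):
        # detach() drops the autograd graph, cpu() moves the data to host
        # memory, and numpy() returns an ndarray view of it.
        return probabilities.detach().cpu().numpy()
    return np.asarray(probabilities)


# Stand-in data with the shape from the report: 276 frames x 36 classes.
frames = np.random.rand(276, 36).astype(np.float32)
logits = to_numpy(frames)

# The reduction that failed in the traceback now runs on a real ndarray.
x_max = np.amax(logits, axis=1, keepdims=True)
print(type(logits).__name__, x_max.shape)
```

With the decoder from the report, the call would then look like `text = decoder.decode(to_numpy(probabilities))`. The extra `.cpu()` is an assumption that the tensor may live on a GPU; for a CPU tensor it does nothing.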

gkucsko commented 3 years ago

great! :)