louiskirsch / speechT

An open-source speech-to-text software written in TensorFlow
Apache License 2.0

Error while using record with --mfcc #21

Open MakarTroyan opened 6 years ago

MakarTroyan commented 6 years ago

Hi @timediv. In recording.py, I changed

print('Recording audio')
raw_audio, sample_width = recorder.record()
raw_audio = np.array(raw_audio)

to

import soundfile as sf
raw_audio, sample_rate = sf.read(path_wav_file)
raw_audio = np.array(raw_audio)

and then tried to run sudo python3 speecht-cli record --mfcc --train-dir train --run-name best_run --language-model kenlm-english/, but got:

Generate MFCCs or power spectrogram
Running speech recognition
Traceback (most recent call last):
  File "speecht-cli", line 221, in <module>
    cli.run()
  File "speecht-cli", line 207, in run
    self.command_executor.run()
  File "/home/ubuntu/s2t/speecht/recording.py", line 71, in run
    [decoded] = model.step(sess, loss=False, update=False, decode=True)
  File "/home/ubuntu/s2t/speecht/speech_model.py", line 231, in step
    input_feed_dict = self.input_loader.get_feed_dict() or {}
  File "/home/ubuntu/s2t/speecht/speech_input.py", line 108, in get_feed_dict
    input_tensor, sequence_lengths, max_time = self._get_inputs_feed_item([self.speech_input])
  File "/home/ubuntu/s2t/speecht/speech_input.py", line 43, in _get_inputs_feed_item
    input_tensor[idx, :inp.shape[0], :] = inp
ValueError: could not broadcast input array from shape (493,39) into shape (493,128)

Why does this happen, and how can I fix it? P.S. It works fine with --power.

louiskirsch commented 6 years ago

Have you trained your model using mfccs?
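
For context, the ValueError in the traceback is a plain NumPy shape mismatch: the padded input tensor has 128 features per frame, while the generated features have 39. A likely reading, consistent with --power working, is that the loaded model was built for 128-feature power spectrogram input and --mfcc produces 39-feature frames, but that is an inference from the traceback, not something confirmed in this thread. The sketch below reproduces the failure mode with dummy arrays. The function name fill_batch and its num_features parameter are hypothetical and are not part of speechT.

```python
import numpy as np


def fill_batch(inputs, num_features):
    """Pad variable-length feature matrices into one (batch, time, features) array.

    Hypothetical helper that mirrors the padding assignment seen in the
    traceback; it is not the speechT implementation.
    """
    max_time = max(inp.shape[0] for inp in inputs)
    batch = np.zeros((len(inputs), max_time, num_features), dtype=np.float32)
    for idx, inp in enumerate(inputs):
        # Fail early with a readable message instead of a broadcast error.
        if inp.shape[1] != num_features:
            raise ValueError(
                "expected {} features per frame, got {}".format(
                    num_features, inp.shape[1]
                )
            )
        batch[idx, :inp.shape[0], :] = inp
    return batch


# Dummy data with the shapes from the traceback (493 frames).
mfcc_like = np.zeros((493, 39), dtype=np.float32)    # shape produced with --mfcc
power_like = np.zeros((493, 128), dtype=np.float32)  # shape the model input expects

# Matching feature count: padding succeeds.
print(fill_batch([power_like], num_features=128).shape)

# Mismatched feature count: rejected before any assignment happens.
try:
    fill_batch([mfcc_like], num_features=128)
except ValueError as err:
    print("Rejected:", err)
```

If this reading is right, the feature type used at recognition time has to match the one the model was trained with, so a model trained on power spectrograms would need to be run with --power, or a separate model trained on MFCCs.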

MakarTroyan commented 6 years ago

Oh! I hadn't thought about that. I used your pretrained model from here