xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
GNU General Public License v3.0
531 stars, 85 forks

Cannot open 32-bit floating-point audio file #33

Open freddy5566 opened 2 years ago

freddy5566 commented 2 years ago

Hi,

It seems the wave module does not support 32-bit floating-point encoding. Here is the error message:

Traceback (most recent call last):
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/site-packages/allosaurus/run.py", line 71, in <module>
    phones = recognizer.recognize(args.input, args.lang, args.topk, args.emit, args.timestamp)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/site-packages/allosaurus/app.py", line 63, in recognize
    audio = read_audio(filename)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/site-packages/allosaurus/audio.py", line 17, in read_audio
    wf = wave.open(filename)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/wave.py", line 510, in open
    return Wave_read(f)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/wave.py", line 164, in __init__
    self.initfp(f)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/wave.py", line 144, in initfp
    self._read_fmt_chunk(chunk)
  File "/home/jamfly/miniconda2/envs/sb/lib/python3.8/wave.py", line 269, in _read_fmt_chunk
    raise Error('unknown format: %r' % (wFormatTag,))
wave.Error: unknown format: 3
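The last line of the traceback names the cause. The fmt chunk of a WAV file begins with a format tag: 1 means integer PCM and 3 means IEEE floating point. Python 3.8's wave module accepts only PCM. The following is a rough sketch that uses only the standard library. It is not Allosaurus code, and read_wav_samples and make_float_wav are hypothetical names. It parses the header with struct and decodes float samples directly. It handles only 16-bit PCM and 32-bit float, and it ignores WAVE_FORMAT_EXTENSIBLE headers and compressed formats.

```python
import io
import struct

# Format tags stored in the first field of the RIFF/WAVE fmt chunk
WAVE_FORMAT_PCM = 1
WAVE_FORMAT_IEEE_FLOAT = 3


def read_wav_samples(fileobj):
    """Parse a RIFF/WAVE stream; return (format_tag, rate, channels, samples).

    Handles only 16-bit PCM and 32-bit IEEE float. Samples come back
    interleaved in one flat list: ints for PCM, floats for IEEE float.
    """
    riff, _riff_size, wave_id = struct.unpack("<4sI4s", fileobj.read(12))
    if riff != b"RIFF" or wave_id != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")

    fmt = None
    while True:
        header = fileobj.read(8)
        if len(header) < 8:
            raise ValueError("no data chunk found")
        chunk_id, chunk_size = struct.unpack("<4sI", header)
        body = fileobj.read(chunk_size)
        if chunk_size % 2:  # RIFF chunks are padded to an even length
            fileobj.read(1)
        if chunk_id == b"fmt ":
            # wFormatTag, nChannels, nSamplesPerSec, nAvgBytesPerSec,
            # nBlockAlign, wBitsPerSample
            fmt = struct.unpack("<HHIIHH", body[:16])
        elif chunk_id == b"data":
            if fmt is None:
                raise ValueError("data chunk appears before fmt chunk")
            break

    tag, channels, rate, _byte_rate, _block_align, bits = fmt
    if tag == WAVE_FORMAT_PCM and bits == 16:
        code = "h"  # signed 16-bit little-endian integers
    elif tag == WAVE_FORMAT_IEEE_FLOAT and bits == 32:
        code = "f"  # 32-bit little-endian floats
    else:
        raise ValueError(f"unsupported format tag {tag} at {bits} bits")

    size = struct.calcsize("<" + code)
    count = len(body) // size
    samples = list(struct.unpack(f"<{count}{code}", body[: count * size]))
    return tag, rate, channels, samples


def make_float_wav(samples, rate=16000):
    """Build a minimal mono 32-bit float WAV file in memory (for testing)."""
    data = struct.pack(f"<{len(samples)}f", *samples)
    fmt = struct.pack("<HHIIHH", WAVE_FORMAT_IEEE_FLOAT, 1, rate, rate * 4, 4, 32)
    chunks = (b"WAVE"
              + b"fmt " + struct.pack("<I", len(fmt)) + fmt
              + b"data" + struct.pack("<I", len(data)) + data)
    return b"RIFF" + struct.pack("<I", len(chunks)) + chunks


if __name__ == "__main__":
    wav_bytes = make_float_wav([0.0, 0.5, -0.25, 1.0])
    tag, rate, channels, samples = read_wav_samples(io.BytesIO(wav_bytes))
    print(tag, rate, channels, samples)  # prints: 3 16000 1 [0.0, 0.5, -0.25, 1.0]
```

Samples decoded this way are floats, nominally in [-1.0, 1.0]. A model that expects int16 input would still need a scaling step, not a plain cast.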

Could we use torchaudio instead of wave to open files?

Thank you

xinjli commented 2 years ago

Hi, thanks for the suggestion. This makes a lot of sense; we will update the code to fix it.

freddy5566 commented 2 years ago

Sorry for the late reply, and thank you for your work. Just out of curiosity, when will the fix be released?

xinjli commented 2 years ago

Hmm, it looks like torchaudio has a bug when loading 16-bit / 32-bit audio.

The current model depends on NumPy int16 arrays for feature extraction, but torchaudio loads audio as float by default. For some reason it also fails to load int16 even when I set its normalization option: it loads the samples as int32, which overflows int16, so they cannot be cast, and forcing the cast corrupts the results.
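The overflow problem comes down to scaling versus casting. This is a minimal sketch. It assumes samples arrive as plain Python numbers, not NumPy arrays or torch tensors, and the function names are hypothetical (not part of Allosaurus or torchaudio). Float samples in [-1.0, 1.0] are scaled by 32767 and clipped. For 32-bit integer samples, the sketch keeps the 16 most significant bits. For comparison, it also shows a plain integer cast, which typically keeps only the low 16 bits and wraps around.

```python
from array import array

INT16_MIN, INT16_MAX = -32768, 32767


def float_to_int16(samples):
    """Scale float samples in [-1.0, 1.0] to int16, clipping anything outside."""
    out = array("h")
    for x in samples:
        value = round(x * INT16_MAX)
        out.append(max(INT16_MIN, min(INT16_MAX, value)))
    return out


def int32_to_int16(samples):
    """Keep the 16 most significant bits of each 32-bit PCM sample."""
    # Python's >> on negative ints is an arithmetic (sign-preserving) shift,
    # so every result lands in [-32768, 32767].
    return array("h", (s >> 16 for s in samples))


def naive_cast_int16(sample):
    """Mimic a plain int32 -> int16 cast: keep only the low 16 bits."""
    return ((sample + 0x8000) & 0xFFFF) - 0x8000


if __name__ == "__main__":
    print(float_to_int16([0.0, 0.5, -0.25, 1.0, 1.5]))
    print(int32_to_int16([2**30, -2**31]))
    print(naive_cast_int16(2**30))  # a half-scale sample collapses to 0
```

With NumPy arrays, the same idea is roughly (x >> 16).astype(np.int16) for int32 input and np.clip(np.round(x * 32767), -32768, 32767).astype(np.int16) for float input.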

I am currently working on a new version that removes most of the NumPy dependencies and uses torch throughout, including torchaudio, so I expect I can only fix this when the new model is released.

freddy5566 commented 2 years ago

Okay, thanks for your help.