lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

Misleading feature type #586

Open csukuangfj opened 2 years ago

csukuangfj commented 2 years ago

In https://github.com/lhotse-speech/lhotse/blob/d9c4141319adb39f64684c762aa541467d25f7fc/lhotse/kaldi.py#L144-L145 the feature type is set to kaldiio.

However, https://github.com/lhotse-speech/lhotse/blob/d9c4141319adb39f64684c762aa541467d25f7fc/lhotse/features/base.py#L407-L408 says the possible types are fbank, mfcc, etc., so kaldiio does not describe the actual kind of feature.

pzelasko commented 2 years ago

Good catch! I think I hard-coded kaldiio in there because it might be tricky to determine whether the features imported from Kaldi are fbanks or mfccs (and we'd want to define computing energies and mixing differently for them). Any ideas?

EDIT: Technically they can also be fbank+pitch, mfcc+pitch, etc., so this seems like a rabbit hole.

csukuangfj commented 2 years ago

How about making users provide it?
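
A minimal sketch of that idea, not lhotse's actual code: ImportedFeatures, describe_kaldi_features and the feature_type parameter are hypothetical names used only for illustration. The caller states what the features are, and the current kaldiio value is kept as the default so existing behaviour would not change.

```python
from dataclasses import dataclass


@dataclass
class ImportedFeatures:
    """Hypothetical stand-in for a feature manifest entry (not lhotse's real class)."""

    storage_key: str
    num_features: int
    type: str


def describe_kaldi_features(
    storage_key: str,
    num_features: int,
    feature_type: str = "kaldiio",  # assumed default: the currently hard-coded value
) -> ImportedFeatures:
    """Build a manifest entry, letting the caller say what the features are."""
    if not feature_type:
        raise ValueError("feature_type must be a non-empty string")
    return ImportedFeatures(
        storage_key=storage_key,
        num_features=num_features,
        type=feature_type,
    )


# The user knows the Kaldi recipe produced fbanks, so they say so explicitly.
entry = describe_kaldi_features("utt1", 80, feature_type="fbank")
print(entry.type)
```

Accepting a free-form string rather than a fixed set would also cover combinations such as fbank+pitch without having to enumerate them.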

pzelasko commented 2 years ago

+1