Open mthrok opened 4 years ago
I would like to work on this.
Hi @engineerchuan
Thanks. Do you know what good parameters for `mfcc` are? I am not an expert, but we can consult with our collaborators.
Not off the top of my head. Let me study it for a day first and come up with a proposal.
Hi @mthrok,
I would like to follow this approach with some questions:
For `compute-fbank-feats` and `compute-mfcc-feats`, first extract the default argument values.
Question 1: How should we store and keep the default values for fbank and mfcc up to date?
Recommendation - cache the default fbank values and the override values in JSON. In the future, revise them manually if the Kaldi default argument values or the example datasets' default argument values change.
Question 2: How should we handle when some datasets don't have fbank config or don't have mfcc config?
Recommendation - we should only use configs from datasets for testing fbank or mfcc if they have the respective config.
Example: Switchboard has both fbank and mfcc config, thus we will use both for testing.
Example: librispeech only stores mfcc config, thus we will not use librispeech for testing fbank
Question 3: What should we do with `generate_fbank_data.py`?
Currently `generate_fbank_data.py` generates random parameters, which may be invalid. We could have it make network `wget` calls to the relevant repositories, if possible, to retrieve and parse the values. It could inspect the Kaldi source code directory or execute the executable path with `--help` to parse out default values. This sounds hacky and maybe we should skip it for now.
> Question 1: How should we store and keep the default values for fbank and mfcc up to date?
> Recommendation - cache the default fbank values and the override values in JSON. In the future, revise them manually if the Kaldi default argument values or the example datasets' default argument values change.
I am not quite sure what you mean by cache, but in terms of JSON data, I think providing empty arguments, `{}`, would result in default parameters in both the Kaldi CLI and `torchaudio`'s implementation. That way, if Kaldi changes its default values, I think we can notice. Then we can add arguments with the current default values, `{"allow_downsample": false, "allow_upsample": false, ... }`. I think the latter is what you mean by caching.
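A minimal sketch of how such JSON parameter sets could feed both sides of a test. The helper name `to_kaldi_flags` and the sample entries are hypothetical. It assumes each Kaldi command-line option is the `torchaudio` keyword with underscores replaced by hyphens, and that booleans are spelled `true`/`false`.

```python
import json

# Hypothetical parameter sets. The empty object means
# "use the defaults on both the Kaldi and torchaudio side".
PARAM_SETS_JSON = """
[
  {},
  {"allow_downsample": false, "allow_upsample": false},
  {"frame_length": 25.0, "use_energy": true}
]
"""


def to_kaldi_flags(params):
    """Convert torchaudio-style keyword arguments to Kaldi-style CLI flags."""
    flags = []
    for name, value in sorted(params.items()):
        if isinstance(value, bool):
            # Assumption: Kaldi spells booleans in lowercase.
            value = "true" if value else "false"
        flags.append("--{}={}".format(name.replace("_", "-"), value))
    return flags


param_sets = json.loads(PARAM_SETS_JSON)
for params in param_sets:
    # The same dict could be passed as **params to the torchaudio function.
    print(to_kaldi_flags(params))
```

With `{}` no flags are emitted, so a change in Kaldi's defaults would surface as a test failure rather than being masked by pinned values.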
BTW: Currently the Kaldi used in test CI is updated manually; I do it from time to time by building a new Docker file and pushing it. Although we plan to update it automatically, we do not know when that will happen.
Also, note that there are some parameter discrepancies due to inconsistent design. Kaldi expects a full-range waveform, whereas a typical `torchaudio` functional expects a normalized waveform, yet the `torchaudio.compliance.kaldi` module expects full-range values, which confuses users (see https://github.com/pytorch/audio/issues/371#issuecomment-625613872 and https://github.com/pytorch/audio/issues/328). I think for this test case we use `load_wav` with `normalize=False`, but you might still run into this. We have an idea of making the `kaldi` module consistent with the rest of the code base, but we have not planned work items yet.
> Question 2: How should we handle when some datasets don't have fbank config or don't have mfcc config?
> Recommendation - we should only use configs from datasets for testing fbank or mfcc if they have the respective config.
> Example: Switchboard has both fbank and mfcc config, thus we will use both for testing.
> Example: librispeech only stores mfcc config, thus we will not use librispeech for testing fbank
Yes, that makes sense.
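The selection rule could be sketched as follows. The registry layout and the function name `configs_for` are hypothetical, and the parameter values are placeholders for illustration, not values copied from the Kaldi recipes.

```python
# Hypothetical registry of per-dataset configs.
# The values are placeholders, NOT the real recipe settings.
DATASET_CONFIGS = {
    "swbd": {
        "fbank": {"num_mel_bins": 40},
        "mfcc": {"use_energy": False},
    },
    "librispeech": {
        "mfcc": {"use_energy": False},
    },
}


def configs_for(feature):
    """Return only the datasets that ship a config for the given feature."""
    return {
        name: configs[feature]
        for name, configs in DATASET_CONFIGS.items()
        if feature in configs
    }


print(sorted(configs_for("fbank")))  # ['swbd']
print(sorted(configs_for("mfcc")))   # ['librispeech', 'swbd']
```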
> Question 3: What should we do with `generate_fbank_data.py`?
> Currently `generate_fbank_data.py` generates random parameters, which may be invalid. We could have it make network `wget` calls to the relevant repositories, if possible, to retrieve and parse the values. It could inspect the Kaldi source code directory or execute the executable path with `--help` to parse out default values. This sounds hacky and maybe we should skip it for now.
`generate_fbank_data.py` is obsolete and provides no value, so we can simply delete it. It would be nice if our tests could incorporate the latest changes on the Kaldi side automatically, but at this moment the priority is to have good coverage of valid use cases. That in itself is a great improvement.
Also, making tests depend on external resources (networking, files stored elsewhere) increases maintenance cost, so we would like to refrain from doing it. Parsing the help message of the executables is plausible because it's available, but let's defer that one. We can discuss the extra value of doing that once we have a good set of values to test.
Hi @engineerchuan
I'm working on refactoring legacy code in our project: we have a 40 MB Kaldi MFCC binary which we would like to replace with `torchaudio.compliance.kaldi.mfcc`.
I managed to get nearly identical results between the Kaldi binary and `torchaudio.compliance.kaldi.mfcc`.
I don't know what my Kaldi binary version is, but it and the Torch implementation have several different default params, see table below:
parameter | torch_value | kaldi_value |
---|---|---|
blackman_coeff | 0.42 | 0.42 |
cepstral_lifter | 22.0 | 22 |
channel | -1 | -1 |
dither | 0.0 | 1 |
energy_floor | 1.0 | 0 |
frame_length | 25.0 | 25 |
frame_shift | 10.0 | 10 |
high_freq | 0.0 | 0 |
htk_compat | False | False |
low_freq | 20.0 | 20 |
num_ceps | 13 | 13 |
min_duration | 0.0 | 0 |
num_mel_bins | 23 | 23 |
preemphasis_coefficient | 0.97 | 0.97 |
raw_energy | True | True |
remove_dc_offset | True | True |
round_to_power_of_two | True | True |
sample_frequency | 16000.0 | 16000 |
snip_edges | True | True |
subtract_mean | False | False |
use_energy | False | True |
vtln_high | -500.0 | -500 |
vtln_low | 100.0 | 100 |
vtln_warp | 1.0 | 1 |
window_type | povey | povey |
allow_downsample | False | |
allow_upsample | False | |
debug_mel | False | |
max_feature_vectors | -1 | |
output_format | kaldi | |
utt2spk | "" | |
vtln_map | "" | |
As you can see, aside from several params that are missing on the Kaldi side, `dither`, `energy_floor` and `use_energy` have different defaults in the two implementations.
(Also, Kaldi applies heavy dithering by default, so I spent a good portion of today trying to understand why the two sets of results didn't match.)
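A minimal sketch of how the mismatching defaults could be extracted and turned into explicit overrides. The dictionaries copy a few rows from the table above, and the function name `kaldi_overrides` is hypothetical.

```python
# A subset of the defaults from the table above.
TORCH_DEFAULTS = {
    "dither": 0.0,
    "energy_floor": 1.0,
    "use_energy": False,
    "num_ceps": 13,
    "frame_length": 25.0,
}
KALDI_DEFAULTS = {
    "dither": 1.0,
    "energy_floor": 0.0,
    "use_energy": True,
    "num_ceps": 13,
    "frame_length": 25.0,
}


def kaldi_overrides(torch_defaults, kaldi_defaults):
    """Return the Kaldi values for parameters whose defaults differ."""
    return {
        name: value
        for name, value in kaldi_defaults.items()
        if name in torch_defaults and torch_defaults[name] != value
    }


overrides = kaldi_overrides(TORCH_DEFAULTS, KALDI_DEFAULTS)
print(overrides)
# {'dither': 1.0, 'energy_floor': 0.0, 'use_energy': True}
```

These overrides could then be passed as keyword arguments to `torchaudio.compliance.kaldi.mfcc`. Since nonzero dither adds random noise, it may be simpler for an exact comparison to go the other way and pass the `torchaudio` values to the Kaldi binary.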
Similar to #679
We should also revise the parameters for mfcc test.
See also #681