mthrok closed this 3 years ago
@sw005320 For reference, could you point me to ESPnet1's implementation of pitch extraction?
We simply call Kaldi pitch extraction. We don't have our own pitch extraction.
I see, thanks!
Some thoughts on spec:
Interface
    def compute_pitch_feats(
        waveform: Tensor,
        delta_pitch: float = 0.005,
        frame_length: float = 25.,
        frame_shift: float = 10.,
        frames_per_chunk: int = 0,
        lowpass_cutoff: float = 1000.,
        lowpass_filter_width: int = 1,
        max_f0: float = 400.,
        max_frames_latency: int = 0,
        min_f0: float = 50.,
        nccf_ballast: float = 7000.,
        nccf_ballast_online: bool = False,
        penalty_factor: float = 0.1,
        recompute_frame: int = 500,
        resample_frequency: float = 4000.,
        sample_frequency: float = 16000.,
        simulate_first_pass_online: bool = False,
        snip_edges: bool = True,
        soft_min_f0: float = 10.,
        upsample_filter_width: int = 5,
    ) -> Tensor:
        ...
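To make the units in the proposed signature concrete, here is a minimal sketch of how the millisecond and Hz parameters could translate into sample counts at the resample frequency. The helper names `ms_to_samples` and `f0_to_lag_samples` are hypothetical and are not part of Kaldi or torchaudio. The sketch assumes that `frame_length` and `frame_shift` are given in milliseconds and that the frequency parameters are in Hz, as in Kaldi's pitch options.

```python
def ms_to_samples(duration_ms: float, sample_rate_hz: float) -> int:
    """Convert a duration in milliseconds to a whole number of samples.

    Hypothetical helper: truncates toward zero, which is an assumption
    about the rounding convention, not a confirmed detail of Kaldi.
    """
    return int(sample_rate_hz * duration_ms / 1000.0)


def f0_to_lag_samples(f0_hz: float, sample_rate_hz: float) -> float:
    """Return the period of a tone at f0_hz, measured in samples."""
    return sample_rate_hz / f0_hz


# Defaults taken from the proposed interface above.
resample_frequency = 4000.0
window_size = ms_to_samples(25.0, resample_frequency)    # frame_length
window_shift = ms_to_samples(10.0, resample_frequency)   # frame_shift
shortest_lag = f0_to_lag_samples(400.0, resample_frequency)  # max_f0
longest_lag = f0_to_lag_samples(50.0, resample_frequency)    # min_f0
```

With the defaults, the analysis window is 100 samples long and advances by 40 samples, and the candidate pitch periods span roughly 10 to 80 samples. The actual extractor may widen this lag range slightly to account for filter widths.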
Implementation
Test
@sw005320 I am looking at the Kaldi implementation and wondering whether we can limit the number of parameters we expose. For example, I do not think we need the parameters for online feature extraction. Is there a set of parameters that you expect users to change?
https://kaldi-asr.org/doc/pitch-functions_8h_source.html#l00042
Sorry for my late response... We usually change only the sampling frequency (which has to match the input audio) and keep the other parameters at their defaults. With those defaults it works robustly across various ASR tasks.
Also, I have not tried the online pitch feature, so I cannot comment on that part...
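One way to reflect this usage pattern would be to keep the full option set internally and expose only the sampling frequency to callers. The following is a minimal sketch under that assumption. `PitchOptions` and `options_for` are hypothetical names, not an existing API, and only a subset of the parameters is shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PitchOptions:
    """Hypothetical container for a subset of the pitch parameters.

    Default values are copied from the interface proposed earlier in
    this thread.
    """
    sample_frequency: float = 16000.0
    frame_length: float = 25.0   # milliseconds
    frame_shift: float = 10.0    # milliseconds
    min_f0: float = 50.0         # Hz
    max_f0: float = 400.0        # Hz
    snip_edges: bool = True


def options_for(sample_frequency: float) -> PitchOptions:
    """Build options where only the sampling frequency is user-supplied."""
    if sample_frequency <= 0:
        raise ValueError("sample_frequency must be positive")
    # Every other field keeps its default value.
    return PitchOptions(sample_frequency=float(sample_frequency))


# Example: telephone-bandwidth audio sampled at 8 kHz.
opts = options_for(8000)
```

Keeping the remaining fields in one frozen dataclass leaves room to expose more of them later without changing the call site.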
The Kaldi pitch feature was added in #1243 and will be released as a beta feature in the upcoming 0.8.0 release. We welcome feedback on the feature.
🚀 Feature
Add a feature that is equivalent to Kaldi's compute-kaldi-pitch-feats.
Motivation
From https://github.com/pytorch/audio/issues/679#issuecomment-638446056