mcw519 / PureSound

Make the sound you hear pure and clean by deep learning.

_align_waveform may be a bug? #6

Closed zuowanbushiwo closed 10 months ago

zuowanbushiwo commented 1 year ago

Hi Wu, when the config sets speed_perturbed = True, training crashes. I traced the crash to _align_waveform: the function does not actually align the data, and there may be a logic error in it. With the modification below, the crash goes away. Thanks!

    def _align_waveform(
        self, enh_wav: torch.Tensor, ref_wav: torch.Tensor
    ) -> Tuple[torch.Tensor, torch.Tensor]:
        """Assume last axis is the time."""
        enh_wav_l = enh_wav.shape[-1]
        ref_wav_l = ref_wav.shape[-1]
        if enh_wav_l != ref_wav_l:
            if ref_wav_l < enh_wav_l:
                # ref is shorter: left-pad with zeros so both signals end together
                pad_num = enh_wav_l - ref_wav_l
                ref_wav = F.pad(ref_wav, (pad_num, 0))
            else:
                # ref is longer: keep only the first enh_wav_l samples
                ref_wav = ref_wav[..., :enh_wav_l]  # change here
        return enh_wav, ref_wav
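
For anyone who wants to check the behaviour outside the training loop, here is a minimal standalone sketch of the same alignment logic. The free function name `align_waveform`, the batch size, and the sample counts are illustrative assumptions, not taken from the PureSound code base.

```python
from typing import Tuple

import torch
import torch.nn.functional as F


def align_waveform(
    enh_wav: torch.Tensor, ref_wav: torch.Tensor
) -> Tuple[torch.Tensor, torch.Tensor]:
    """Make ref_wav as long as enh_wav, assuming the last axis is time."""
    enh_len = enh_wav.shape[-1]
    ref_len = ref_wav.shape[-1]
    if ref_len < enh_len:
        # Reference is shorter: left-pad with zeros so both signals end together.
        ref_wav = F.pad(ref_wav, (enh_len - ref_len, 0))
    elif ref_len > enh_len:
        # Reference is longer: keep only the first enh_len samples.
        ref_wav = ref_wav[..., :enh_len]
    return enh_wav, ref_wav


# Hypothetical batch: 2 utterances, enhanced output of 16000 samples.
enh = torch.randn(2, 16000)
short_ref = torch.randn(2, 14545)  # e.g. reference that came out shorter
long_ref = torch.randn(2, 17778)   # e.g. reference that came out longer

_, padded_ref = align_waveform(enh, short_ref)
_, cut_ref = align_waveform(enh, long_ref)
print(padded_ref.shape, cut_ref.shape)
```

Both returned references have the same time length as `enh`, so a loss computed on the pair no longer fails on a shape mismatch. Note that padding happens at the start while truncation happens at the end, which mirrors the snippet above rather than a considered design choice.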

mcw519 commented 1 year ago

Hi,

Thanks for finding this issue. I will fix the code when I have time. Yes, speed perturbation with sox changes the utterance length, but it also changes speech characteristics such as pitch. That is why I did not handle the length change here (I do not recommend adding speed perturbation in TSE training).

Thanks
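
To illustrate the length change mentioned above, here is a rough sketch of how the sample count scales with the speed factor. It assumes resampling-style speed perturbation, where duration scales by 1 / speed; the helper name `perturbed_length` is hypothetical, and the exact count produced by sox may differ by a sample or so.

```python
def perturbed_length(num_samples: int, speed: float) -> int:
    """Approximate sample count after speed perturbation by `speed`."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    # Playing the signal `speed` times faster shortens it by the same factor.
    return round(num_samples / speed)


original = 16000  # one second at 16 kHz, chosen only for illustration
for speed in (0.9, 1.0, 1.1):
    new_len = perturbed_length(original, speed)
    print(f"speed={speed}: {original} -> {new_len} samples")
```

Any factor other than 1.0 gives the perturbed signal a different length from the unperturbed one, which is the mismatch that _align_waveform has to absorb.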