mravanelli / SincNet

SincNet is a neural architecture for efficiently processing raw audio samples.
MIT License

Question about the creation of the Sinc filterbank #107

Open fColangelo opened 3 years ago

fColangelo commented 3 years ago

Hi and thank you for sharing the code! I was studying the creation of the Sinc filterbank in the SincConv_fast class and I have a question about this section:

        band=(high-low)[:,0]

        f_times_t_low = torch.matmul(low, self.n_)
        f_times_t_high = torch.matmul(high, self.n_)

        # Equivalent of Eq. 4 of the reference paper (Speaker Recognition from Raw Waveform with SincNet).
        # I have just expanded the sinc and simplified the terms, which avoids several useless computations.
        band_pass_left=((torch.sin(f_times_t_high)-torch.sin(f_times_t_low))/(self.n_/2))*self.window_
        band_pass_center = 2*band.view(-1,1)
        band_pass_right= torch.flip(band_pass_left,dims=[1])

        band_pass=torch.cat([band_pass_left,band_pass_center,band_pass_right],dim=1)

I understand that band_pass_left is the left half of each filter and that the right half is built by symmetry. However, I cannot understand why the center tap of each filter is set to 2*band. As far as I can tell, band is the bandwidth of the individual filters, so I do not see where the factor of 2 comes from. Could you please clarify? Thank you!
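One possible way to check where the 2*band value could come from is to evaluate the band_pass_left expression at offsets that approach the center (n = 0), where the formula itself would divide by zero. The sketch below uses only the Python standard library and rests on two assumptions that are not shown in the snippet above: that self.n_ holds 2*pi*n/sample_rate for integer sample offsets n, and that the window is approximately 1 at the center, so it is left out. The function name band_pass_sample and the example frequencies are hypothetical, chosen only for illustration.

```python
import math

def band_pass_sample(f_low, f_high, n, sample_rate):
    """Unwindowed value of the quoted band_pass_left expression at offset n (n != 0)."""
    # Assumption: self.n_ = 2*pi*n / sample_rate (not shown in the quoted snippet).
    n_ = 2 * math.pi * n / sample_rate
    return (math.sin(f_high * n_) - math.sin(f_low * n_)) / (n_ / 2)

# Hypothetical cutoff frequencies (Hz) and sample rate, for illustration only.
f_low, f_high, sample_rate = 300.0, 800.0, 16000.0
band = f_high - f_low

# Offsets approaching the center tap from the left; fractional offsets
# are used only to probe the limit, the real filter uses integer n.
offsets = (-1.0, -0.1, -0.001)
values = [band_pass_sample(f_low, f_high, n, sample_rate) for n in offsets]

# Value the quoted code assigns to the center tap.
center = 2 * band

for n, v in zip(offsets, values):
    print(f"n = {n:>7}: {v:.6f}  (2*band = {center})")
```

If the assumptions hold, the printed values get closer to 2*band as n approaches 0. This matches the small-angle approximation sin(x) ≈ x: the numerator becomes (f_high - f_low) * n_, and dividing by n_/2 leaves 2 * (f_high - f_low). Under this reading, the center tap would simply be the limit of the same formula at n = 0 rather than a separate definition.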