Hi and thank you for sharing the code!
I was studying the creation of the Sinc filterbank in the SincConv_fast class and I have a question about this section:
band = (high - low)[:, 0]
f_times_t_low = torch.matmul(low, self.n_)
f_times_t_high = torch.matmul(high, self.n_)
# Equivalent of Eq.4 of the reference paper (SPEAKER RECOGNITION FROM RAW WAVEFORM WITH SINCNET).
# I just have expanded the sinc and simplified the terms. This way I avoid several useless computations.
band_pass_left = ((torch.sin(f_times_t_high) - torch.sin(f_times_t_low)) / (self.n_ / 2)) * self.window_
band_pass_center = 2 * band.view(-1, 1)
band_pass_right = torch.flip(band_pass_left, dims=[1])
band_pass = torch.cat([band_pass_left, band_pass_center, band_pass_right], dim=1)
I understand that band_pass_left is the left half of each filter and that the right half is built by symmetry. However, I cannot understand why the center tap of each filter is set to 2*band, where band is, as I understand it, the bandwidth of that filter.
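To make the question concrete, here is a minimal sketch of how I tried to check what the left-half expression tends to as the time index approaches 0. It is plain Python, not the repository code. The helper `band_pass_sample`, the example frequencies, and the scaling `n_ = 2*pi*n/sample_rate` are my own assumptions about how `self.n_` is built, and the window is left out.

```python
import math

def band_pass_sample(low_hz, high_hz, n, sample_rate):
    """Unwindowed value of the left-half expression at sample offset n (n != 0).

    Hypothetical helper: assumes self.n_ equals 2*pi*n/sample_rate.
    """
    n_ = 2 * math.pi * n / sample_rate
    return (math.sin(high_hz * n_) - math.sin(low_hz * n_)) / (n_ / 2)

# Example cutoffs chosen only for illustration.
low_hz, high_hz, sample_rate = 300.0, 500.0, 16000.0
band = high_hz - low_hz

# Evaluate the expression at offsets that get closer and closer to 0.
for n in (-1.0, -0.1, -0.001):
    print(n, band_pass_sample(low_hz, high_hz, n, sample_rate))

print("2*band =", 2 * band)
```

If this sketch matches the real code, the values approach 2*band as n goes to 0, which would suggest that the center tap is simply the limit of the same expression at n = 0, where the formula itself would divide by zero.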
Could you please clarify?
Thank you!