jymsuper / SpeakerRecognition_tutorial

Simple d-vector based Speaker Recognition (verification and identification) using Pytorch
MIT License

TruncatedInputfromMFB #13

Closed hbo-lambda closed 3 years ago

hbo-lambda commented 3 years ago

Sorry, my English is bad.

class TruncatedInputfromMFB(object):
    """
    input size : (n_frames, dim=40)
    output size : (1, n_win=40, dim=40) => one context window is chosen randomly
    """
    def __init__(self, input_per_file=1):
        super(TruncatedInputfromMFB, self).__init__()
        self.input_per_file = input_per_file

    def __call__(self, frames_features):
        network_inputs = []
        num_frames = len(frames_features)

        win_size = c.NUM_WIN_SIZE
        half_win_size = int(win_size/2)
        #if num_frames - half_win_size < half_win_size:
        while num_frames - half_win_size <= half_win_size:
            frames_features = np.append(frames_features, frames_features[:num_frames,:], axis=0)
            num_frames =  len(frames_features)

        for i in range(self.input_per_file):
            j = random.randrange(half_win_size, num_frames - half_win_size)
            if not j:
                frames_slice = np.zeros(num_frames, c.FILTER_BANK, 'float64')
                frames_slice[0:(frames_features.shape)[0]] = frames_features.shape
            else:
                frames_slice = frames_features[j - half_win_size:j + half_win_size]
            network_inputs.append(frames_slice)
        return np.array(network_inputs)

Is this line wrong?

frames_slice = np.zeros(num_frames, c.FILTER_BANK, 'float64')

Should it be the following instead, with the shape passed as a tuple?

frames_slice = np.zeros((num_frames, c.FILTER_BANK), 'float64')

jymsuper commented 3 years ago

Yes, you are right. It needs to be modified. Thank you.
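
For anyone landing here later, below is a minimal sketch of the class with the fix discussed above applied. `np.zeros` takes the shape as its first argument and the dtype as its second, so the original call passed `c.FILTER_BANK` as the dtype and fails; wrapping the two dimensions in a tuple fixes that. This is not the repository's committed code. The module-level constants `NUM_WIN_SIZE` and `FILTER_BANK` are assumed stand-ins for `c.NUM_WIN_SIZE` and `c.FILTER_BANK` (both set to 40 here, following the docstring), and the change on the next line (assigning `frames_features` rather than `frames_features.shape`) is my own guess at the intent. Note also that `random.randrange(half_win_size, ...)` never returns 0 when `half_win_size` is positive, so the `if not j` branch does not appear to be reachable with a window size of 40.

```python
import random

import numpy as np

# Assumed stand-ins for the repository's config module `c`.
NUM_WIN_SIZE = 40  # frames per context window (assumption, from the docstring)
FILTER_BANK = 40   # filter bank dimension (assumption, from the docstring)


class TruncatedInputfromMFB(object):
    """
    input size : (n_frames, dim=40)
    output size : (input_per_file, n_win=40, dim=40), windows chosen randomly
    """
    def __init__(self, input_per_file=1):
        self.input_per_file = input_per_file

    def __call__(self, frames_features):
        network_inputs = []
        num_frames = len(frames_features)
        half_win_size = NUM_WIN_SIZE // 2

        # Repeat short inputs until a full window plus a valid center fits.
        while num_frames - half_win_size <= half_win_size:
            frames_features = np.append(
                frames_features, frames_features[:num_frames, :], axis=0)
            num_frames = len(frames_features)

        for _ in range(self.input_per_file):
            # Pick a window center that leaves half a window on each side.
            j = random.randrange(half_win_size, num_frames - half_win_size)
            if not j:
                # Fixed: the shape must be a single tuple argument.
                frames_slice = np.zeros((num_frames, FILTER_BANK), 'float64')
                # Assumed intent: copy the features, not their shape.
                frames_slice[0:frames_features.shape[0]] = frames_features
            else:
                frames_slice = frames_features[j - half_win_size:j + half_win_size]
            network_inputs.append(frames_slice)
        return np.array(network_inputs)
```

For example, calling `TruncatedInputfromMFB(input_per_file=3)` on an array of shape `(100, 40)` should return an array of shape `(3, 40, 40)`.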