pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding
http://pyannote.github.io
MIT License

Error during inference: Kernel size can't be greater than actual input size #1260

Open FrenchKrab opened 1 year ago

FrenchKrab commented 1 year ago

On certain files, in sliding-window mode, inference will crash because the input is smaller than the kernel. For example, given this inference:

```python
inference = Inference(MODEL_NAME, step=5.0, duration=5.0)
```

Inference may crash when applied to certain files. These "certain files" appear to be ones where the last window to be computed is too short for the model. For example, with duration == step, this UEM crashes: `3b79017c-4d42-40fc-a1bb-4a20bc8ebca7 1 0.000 300.002` (the last window will be 0.002 seconds long), but this one does not: `fe0eab73-f908-400a-a25b-fdcc9b86a029 1 0.000 300.000`.
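A minimal sketch of the failure mode, assuming pyannote's default 16 kHz sample rate and the 251-sample kernel reported in the traceback below (the exact sample counts are illustrative): with duration == step, the orphan last chunk holds only the leftover ~0.002 s of audio, which is shorter than the first convolution kernel, so the underlying conv1d call fails.

```python
# Sketch only, not pyannote code: reproduce the underlying conv1d failure.
# Assumptions: 16 kHz sample rate (pyannote's default), 251-sample kernel
# (the size reported in the traceback below).
import torch
import torch.nn.functional as F

sample_rate = 16_000
step, file_length = 5.0, 300.002  # seconds, as in the crashing UEM

# With duration == step, the orphan last chunk covers the leftover audio.
orphan_samples = int((file_length % step) * sample_rate)  # ~32 samples

last_chunk = torch.randn(1, 1, orphan_samples)  # (batch, channel, time)
filters = torch.randn(80, 1, 251)               # 80 filters of length 251

# RuntimeError: Kernel size can't be greater than actual input size
out = F.conv1d(last_chunk, filters, stride=10)
```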

Full error log

```
Cell In[38], line 39
     37    else:
     38        print(f"{idx} ] {file['database']}/{file['uri']} : computing inference...")
---> 39        segmentation = inference(file)
     40 #       with open(custom_file_name,'wb') as f:
     41 #           pickle.dump(segmentation, f)
     42    num_chunks, num_frames, num_speakers = segmentation.data.shape

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:362, in Inference.__call__(self, file, hook)
    359 waveform, sample_rate = self.model.audio(file)
    361 if self.window == "sliding":
--> 362     return self.slide(waveform, sample_rate, hook=hook)
    364 return self.infer(waveform[None])[0]

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:288, in Inference.slide(self, waveform, sample_rate, hook)
    285 # process orphan last chunk
    286 if has_last_chunk:
--> 288     last_output = self.infer(last_chunk[None])
    290     if specifications.resolution == Resolution.FRAME:
    291         pad = num_frames_per_chunk - last_output.shape[1]

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:204, in Inference.infer(self, chunks)
    199             raise MemoryError(
    200                 f"batch_size ({self.batch_size: d}) is probably too large. "
    201                 f"Try with a smaller value until memory error disappears."
    202             )
    203         else:
--> 204             raise exception
    206 # convert powerset to multi-label unless specifically requested not to
    207 if self.model.specifications.powerset and not self.skip_conversion:

File /path/to/python/pyannote-audio/pyannote/audio/core/inference.py:196, in Inference.infer(self, chunks)
    194 with torch.no_grad():
    195     try:
--> 196         outputs = self.model(chunks.to(self.device))
    197     except RuntimeError as exception:
    198         if is_oom_error(exception):

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File /path/to/python/pyannote-audio/pyannote/audio/models/segmentation/PyanNet.py:171, in PyanNet.forward(self, waveforms)
    159 def forward(self, waveforms: torch.Tensor) -> torch.Tensor:
    160     """Pass forward
    161 
    162     Parameters
   (...)
    168     scores : (batch, frame, classes)
    169     """
--> 171     outputs = self.sincnet(waveforms)
    173     if self.hparams.lstm["monolithic"]:
    174         outputs, _ = self.lstm(
    175             rearrange(outputs, "batch feature frame -> batch frame feature")
    176         )

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File /path/to/python/pyannote-audio/pyannote/audio/models/blocks/sincnet.py:87, in SincNet.forward(self, waveforms)
     81 outputs = self.wav_norm1d(waveforms)
     83 for c, (conv1d, pool1d, norm1d) in enumerate(
     84     zip(self.conv1d, self.pool1d, self.norm1d)
     85 ):
---> 87     outputs = conv1d(outputs)
     89     # https://github.com/mravanelli/SincNet/issues/4
     90     if c == 0:

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/torch/nn/modules/module.py:1190, in Module._call_impl(self, *input, **kwargs)
   1186 # If we don't have any hooks, we want to skip the rest of the logic in
   1187 # this function, and just call forward.
   1188 if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1189         or _global_forward_hooks or _global_forward_pre_hooks):
-> 1190     return forward_call(*input, **kwargs)
   1191 # Do not call functions when jit is used
   1192 full_backward_hooks, non_full_backward_hooks = [], []

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/asteroid_filterbanks/enc_dec.py:177, in Encoder.forward(self, waveform)
    175 filters = self.get_filters()
    176 waveform = self.filterbank.pre_analysis(waveform)
--> 177 spec = multishape_conv1d(
    178     waveform,
    179     filters=filters,
    180     stride=self.stride,
    181     padding=self.padding,
    182     as_conv1d=self.as_conv1d,
    183 )
    184 return self.filterbank.post_analysis(spec)

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/asteroid_filterbanks/scripting.py:37, in script_if_tracing.<locals>.wrapper(*args, **kwargs)
     33 @functools.wraps(fn)
     34 def wrapper(*args, **kwargs):
     35     if not is_tracing():
     36         # Not tracing, don't do anything
---> 37         return fn(*args, **kwargs)
     39     compiled_fn = torch.jit.script(wrapper.__original_fn)  # type: ignore
     40     return compiled_fn(*args, **kwargs)

File /path/to/python/mamba/envs/m_pyannote_dev1/lib/python3.9/site-packages/asteroid_filterbanks/enc_dec.py:216, in multishape_conv1d(waveform, filters, stride, padding, as_conv1d)
    212 batch, channels, time_len = waveform.shape
    213 if channels == 1 and as_conv1d:
    214     # That's the common single channel case (batch, 1, time)
    215     # Output will be (batch, freq, stft_time), behaves as Conv1D
--> 216     return F.conv1d(waveform, filters, stride=stride, padding=padding)
    217 else:
    218     # Return batched convolution, input is (batch, 3, time), output will be
    219     # (b, 3, f, conv_t). Useful for multichannel transforms. If as_conv1d is
    220     # false, (batch, 1, time) will output (batch, 1, freq, conv_time), useful for
    221     # consistency.
    222     return batch_packed_1d_conv(waveform, filters, stride=stride, padding=padding)

RuntimeError: Calculated padded input size per channel: (29). Kernel size: (251). Kernel size can't be greater than actual input size
```
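One possible workaround, sketched under assumptions rather than an official fix: zero-pad the waveform so that the orphan last chunk is at least as long as the first convolution kernel, then run inference on the in-memory waveform (Inference accepts a `{"waveform", "sample_rate"}` mapping). `MIN_SAMPLES` is taken from the kernel size in the traceback, and `MODEL_NAME` mirrors the snippet above.

```python
# Workaround sketch (assumptions noted below), not an official fix.
import torch.nn.functional as F
import torchaudio
from pyannote.audio import Inference

MIN_SAMPLES = 251  # first conv kernel size, from the traceback above

inference = Inference(MODEL_NAME, step=5.0, duration=5.0)  # as in the report

waveform, sample_rate = torchaudio.load("file.wav")
step_samples = int(5.0 * sample_rate)

# With duration == step, the orphan last chunk holds the leftover samples.
remainder = waveform.shape[1] % step_samples
if 0 < remainder < MIN_SAMPLES:
    waveform = F.pad(waveform, (0, MIN_SAMPLES - remainder))  # zero-pad end

# Inference accepts in-memory audio as a {"waveform", "sample_rate"} mapping.
segmentation = inference({"waveform": waveform, "sample_rate": sample_rate})
```
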
stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
