lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0

How to write a custom CutTransform? #1262

Closed kobenaxie closed 8 months ago

kobenaxie commented 8 months ago

I want to use SoxEffectTransform to perform augmentations such as low_pass or pitch shift with torchaudio. How do I write a custom cut transform that can be used like CutMix?

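from lhotse.dataset import CutMix, K2SpeechRecognitionDataset

transforms = []
transforms.append(
    CutCustom(...)  # the custom transform being asked about (does not exist yet)
)
transforms.append(
    CutMix(...)
)
dataset = K2SpeechRecognitionDataset(
    cut_transforms=transforms,
)

pzelasko commented 8 months ago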

You can check this PR to see the steps needed to add e.g. volume perturbation: https://github.com/lhotse-speech/lhotse/pull/382

However, in your case, since you're unlikely to modify either the sampling rate or num_samples/duration (so the metadata is unaffected), it might be a good idea to implement those as signal transforms instead -- see the ones implemented here https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/signal_transforms.py

They are supported e.g. in K2SpeechRecognitionDataset via input_transforms https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/speech_recognition.py#L65
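
To make the signal-transform route concrete, here is a minimal sketch of a callable that could be placed in input_transforms. It is not taken from lhotse: the class name RandomWaveformEffect and its parameters are hypothetical. It assumes the dataset calls each transform with the batch as the first positional argument and may pass extra keyword arguments (such as supervision_segments), and that the batch holds raw audio rather than features, which would require an audio input strategy.

```python
import random
from typing import Any, Callable, Optional


class RandomWaveformEffect:
    """Hypothetical helper: apply `effect` to a batch with probability `p`."""

    def __init__(
        self,
        effect: Callable[[Any], Any],
        p: float = 0.5,
        rng: Optional[random.Random] = None,
    ) -> None:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"p must be in [0, 1], got {p}")
        self.effect = effect
        self.p = p
        # A dedicated RNG makes the augmentation reproducible when seeded.
        self.rng = rng or random.Random()

    def __call__(self, inputs: Any, **kwargs: Any) -> Any:
        # Extra keyword arguments (e.g. supervision_segments) are accepted
        # and ignored, on the assumption that the dataset may pass them.
        if self.rng.random() < self.p:
            return self.effect(inputs)
        return inputs


# Assumed usage with torchaudio and lhotse (not executed here; the sample
# rate and cutoff are placeholder values):
#   import torchaudio.functional as F
#   from lhotse.dataset import K2SpeechRecognitionDataset
#   from lhotse.dataset.input_strategies import AudioSamples
#
#   low_pass = RandomWaveformEffect(
#       lambda wav: F.lowpass_biquad(wav, sample_rate=16000, cutoff_freq=3000.0),
#       p=0.5,
#   )
#   dataset = K2SpeechRecognitionDataset(
#       input_strategy=AudioSamples(),
#       input_transforms=[low_pass],
#   )
```

Because the effect is just a function of the batch, the same wrapper could hold a pitch-shift function instead of a low-pass filter. Anything that changes duration or sampling rate would still need the cut-level approach from the PR above, since the metadata would no longer match the audio.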

kobenaxie commented 8 months ago

Thank you for your reply and suggestions. I also found input_transforms in K2SpeechRecognitionDataset(), which is what I need in my case.