kobenaxie closed 8 months ago
You can check this PR to see the steps needed to add e.g. volume perturbation https://github.com/lhotse-speech/lhotse/pull/382
However, in your case as you're unlikely to modify either sampling rate or num_samples/duration (so the metadata is unaffected), it might be a good idea to implement those as a signal transform instead -- see the ones implemented here https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/signal_transforms.py
They are supported e.g. in K2SpeechRecognitionDataset via input_transforms: https://github.com/lhotse-speech/lhotse/blob/master/lhotse/dataset/speech_recognition.py#L65
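To make the idea concrete, here is a rough sketch of what such a transform could look like. It assumes the dataset simply calls each entry of input_transforms on the batch and uses the return value, and that the batch holds audio samples rather than precomputed features. `RandomEffect` and the `effect` argument are hypothetical names for illustration, not part of lhotse.

```python
import random
from typing import Callable, Optional


class RandomEffect:
    """Apply `effect` to the whole batch with probability `p`, else pass it through.

    `effect` is any callable taking the batch and returning a batch of the same
    shape, e.g. a thin wrapper around a low-pass or pitch-shift routine
    (hypothetical here; plug in your own).
    """

    def __init__(self, effect: Callable, p: float = 0.5, seed: Optional[int] = None):
        self.effect = effect
        self.p = p
        # A private RNG keeps the augmentation reproducible for a given seed.
        self.rng = random.Random(seed)

    def __call__(self, batch):
        if self.rng.random() < self.p:
            return self.effect(batch)
        return batch


# Assumed usage (not verified against a specific lhotse version):
#   dataset = K2SpeechRecognitionDataset(..., input_transforms=[RandomEffect(my_low_pass)])
```

Since the effect leaves the number of samples and the sampling rate alone, nothing in the cut metadata has to change.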
Thank you for your reply and suggestions. I also found input_transforms in K2SpeechRecognitionDataset(), which is what I need in my case.
I want to use SoxEffectTransform to perform augmentations like low_pass or pitch shift in torchaudio. How do I write a custom transform for cut_transforms that can be used like CutMix?
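For reference, the shape I have in mind is roughly the following. This is only a sketch: it assumes a cut transform is just a callable that receives the cuts of a batch and returns cuts, the way CutMix appears to be used. `PerCutTransform` and `cut_fn` are made-up names, and the per-cut effect itself is left as a placeholder.

```python
import random
from typing import Callable, Iterable, List, Optional


class PerCutTransform:
    """Apply `cut_fn` to each cut independently with probability `p`."""

    def __init__(self, cut_fn: Callable, p: float = 0.5, seed: Optional[int] = None):
        self.cut_fn = cut_fn  # hypothetical: takes one cut, returns a modified cut
        self.p = p
        self.rng = random.Random(seed)

    def __call__(self, cuts: Iterable) -> List:
        # One random draw per cut, so some cuts in a batch stay untouched.
        out = [self.cut_fn(c) if self.rng.random() < self.p else c for c in cuts]
        # With lhotse, the result would presumably need to be wrapped back
        # into a CutSet (e.g. CutSet.from_cuts(out)) before returning.
        return out
```

It would then be passed in the cut_transforms list next to CutMix.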