lhotse-speech / lhotse

Tools for handling speech data in machine learning projects.
https://lhotse.readthedocs.io/en/latest/
Apache License 2.0
902 stars, 204 forks

Create a custom audio transformation #1338

Closed: njellinas closed this issue 4 weeks ago

njellinas commented 1 month ago

How can I create a custom audio transformation that can be applied to a CutSet? For example, I want to apply torchaudio.sox_effects.apply_effects_tensor(y, sr, [["norm", f"{gain:.2f}"]]) to every cut in a CutSet.

pzelasko commented 1 month ago

You'd define a transform class for that and add the relevant methods to recording/cut. You can see this PR for an end-to-end example: https://github.com/lhotse-speech/lhotse/pull/382/files#diff-add451896faa625c1820580ab6ad64bef75e2886d551efc0f5705100ea62b28a

These transforms are intended mostly for ops that affect the metadata (e.g. speed perturbation). It might be easier to edit your dataset class and apply the effect there, directly on the audio tensor.
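
For the second suggestion, a minimal sketch of what "apply it in the dataset class" could look like. The names TransformedAudioDataset and sox_norm are hypothetical and not part of lhotse. The sketch assumes each cut exposes load_audio() returning a NumPy array of shape (channels, samples) and a sampling_rate attribute, and that the installed torchaudio still ships sox_effects. In a real pipeline the class would normally subclass torch.utils.data.Dataset; that is left out here so the structure is easy to see.

```python
from typing import Any, Callable, Dict, Iterable


def sox_norm(gain: float) -> Callable[[Any, int], Any]:
    """Build a transform applying the sox `norm` effect from the question.

    Hypothetical helper: torch/torchaudio are imported lazily, and the
    call assumes a torchaudio build that still provides sox_effects.
    """

    def transform(samples, sampling_rate):
        import torch
        import torchaudio

        # apply_effects_tensor expects a 2D tensor, (channels, time) by default.
        tensor = torch.from_numpy(samples)
        out, _ = torchaudio.sox_effects.apply_effects_tensor(
            tensor, sampling_rate, [["norm", f"{gain:.2f}"]]
        )
        return out

    return transform


class TransformedAudioDataset:
    """Hypothetical dataset that applies a waveform transform per cut.

    `transform` is any callable taking (samples, sampling_rate) and
    returning the processed samples.
    """

    def __init__(self, transform: Callable[[Any, int], Any]) -> None:
        self.transform = transform

    def __getitem__(self, cuts: Iterable[Any]) -> Dict[str, Any]:
        # With a lhotse sampler, `cuts` is the mini-batch of cuts.
        cuts = list(cuts)
        audio = [
            self.transform(cut.load_audio(), cut.sampling_rate) for cut in cuts
        ]
        # Audio is returned as a list; padding/collation is left to the caller.
        return {"cuts": cuts, "audio": audio}
```

With this shape, something like TransformedAudioDataset(sox_norm(-3.0)) could be passed to a DataLoader together with a lhotse sampler. Because the effect runs on the loaded waveform only, the cut metadata (durations, supervisions) stays untouched, which is fine for a gain change such as norm but would not be for an effect that alters duration.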