Open ATriantafyllopoulos opened 8 months ago
Hi, thanks for the proposal. However, this project no longer has an active maintainer.
Wow, I missed that. Was this discussed somewhere (discord/forum)? And is there a way to kickstart this project again or was it migrated anywhere else (e.g. to pytorch core?)
It was not planned or discussed, but it just happened. Sorry.
Very sad that torchaudio is no longer actively maintained. Many pieces are useful, well-designed, and most importantly, highly performant (e.g. STFT). The audio land doesn't have many good libraries like its vision counterpart :(.
@ATriantafyllopoulos I also keep rolling my own random audio/spectrogram crop across my audio projects. Would be nice if torchaudio had it.
🚀 The feature
I am proposing to add a `torch.nn.Module` transform that automatically crops/pads signals (with different options for padding, such as constant or mirroring). I already have a local implementation, so I would push it myself if that is alright. The interface would look as follows:
I am looking for feedback to see if this is also needed/desired by others and whether I should open a PR to add it.
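To make the proposal concrete for discussion, here is a minimal sketch of what such a transform could look like. It is not the reference implementation: the class name `CropOrPad`, its parameters, the assumed `(channels, time)` input layout, and the choice to crop from the start and pad at the end are all assumptions for illustration.

```python
import torch
import torch.nn.functional as F


class CropOrPad(torch.nn.Module):
    """Hypothetical transform: force the time axis of a (channels, time) signal to `size`."""

    def __init__(self, size: int, pad_mode: str = "constant", value: float = 0.0):
        super().__init__()
        self.size = size
        self.pad_mode = pad_mode  # "constant" or "reflect" (mirroring)
        self.value = value

    def forward(self, signal: torch.Tensor) -> torch.Tensor:
        length = signal.shape[-1]
        if length >= self.size:
            # Long enough: keep only the first `size` samples.
            return signal[..., : self.size]
        missing = self.size - length
        if self.pad_mode == "constant":
            # Append `missing` samples filled with `value`.
            return F.pad(signal, (0, missing), mode="constant", value=self.value)
        if missing >= length:
            # Reflect padding cannot add as many samples as the signal has.
            raise ValueError("reflect padding requires missing < signal length")
        # Add a temporary batch dimension so the input is 3D for reflect padding.
        padded = F.pad(signal.unsqueeze(0), (0, missing), mode=self.pad_mode)
        return padded.squeeze(0)
```

Because it is a plain `torch.nn.Module`, a transform like this could sit in a `torch.nn.Sequential` pipeline next to other transforms, or be called directly on a single signal, e.g. `CropOrPad(16000)(waveform)`.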
Motivation, pitch
This feature is needed for datasets with variable lengths (a common occurrence for audio). By default, this mismatch in lengths now needs to be handled in the collate function of the dataloader.
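As a rough illustration of that status quo, a collate function that zero-pads every signal to the longest one in the batch might look like the sketch below. The function name `pad_collate` and the assumption that each dataset item is a `(signal, label)` pair with a `(channels, time)` signal and an integer label are mine, not part of torchaudio.

```python
import torch
import torch.nn.functional as F


def pad_collate(batch):
    """Hypothetical collate_fn: zero-pad (channels, time) signals to the longest in the batch."""
    signals, labels = zip(*batch)
    # Batch statistic: length of the longest signal.
    target = max(s.shape[-1] for s in signals)
    # Right-pad each signal with zeros up to the target length.
    padded = [F.pad(s, (0, target - s.shape[-1])) for s in signals]
    return torch.stack(padded), torch.tensor(labels)
```

It would be passed to the loader as `torch.utils.data.DataLoader(dataset, batch_size=8, collate_fn=pad_collate)`. Every project with variable-length audio ends up writing some variant of this by hand.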
With the proposed transform, the user can add it directly to their transform pipeline and/or make it part of their model if they so wish. Moreover, they could simply utilize it in their `collate_fn` if they want to crop based on the particular batch statistics (e.g. crop/pad to the shortest/longest sample in the batch).
Alternatives
No response
Additional context
A reference implementation and interface can be seen here. As it is implemented with `numpy`, I would update it to `torch`.