Closed Darius-H closed 7 months ago
Hi @Darius-H!
Amphion stores only the valid frames of each feature, i.e. `feature[:, :valid_frames, :]`, as you described. So you don't have to worry about the zero padding: we remove the padded frames after extracting the features, and then save the compressed features to files.
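For readers who want to apply the same trimming to their own features, here is a minimal sketch of the idea. It is not Amphion's actual code: the helper names `valid_frame_count` and `trim_padding` are hypothetical, and it assumes the figures quoted in this thread (a 30 s window mapped to 1500 frames) and the `int(...)` truncation from the question. Amphion may round differently.

```python
WHISPER_WINDOW_SECONDS = 30   # Whisper pads or truncates input audio to 30 s
WHISPER_WINDOW_FRAMES = 1500  # frames per 30 s window, per the shape quoted in this thread


def valid_frame_count(duration_seconds):
    """Number of frames that cover real audio (hypothetical helper)."""
    frames_per_second = WHISPER_WINDOW_FRAMES / WHISPER_WINDOW_SECONDS  # 50 frames per second
    n = int(frames_per_second * duration_seconds)
    # Clamp so that audio longer than the window never exceeds 1500 frames.
    return max(0, min(n, WHISPER_WINDOW_FRAMES))


def trim_padding(feature, duration_seconds):
    """Keep only the valid frames of one utterance's (frames, dim) feature.

    Slices along the first axis, so it works for a nested list as well as
    for a NumPy array or torch tensor of shape (1500, dim).
    """
    return feature[:valid_frame_count(duration_seconds)]
```

For a batched array of shape (batch, 1500, 1024), the equivalent slice per item is `feature[i, :valid_frame_count(n_i), :]`, since each utterance in a batch can have a different duration.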
If you have any other questions, feel free to contact us!
Hi @Darius-H, if you have any further questions about Whisper features, feel free to re-open this issue. We are glad to follow up!
In MultipleContentsSVC, Whisper pads or truncates the original audio (n seconds, n < 30) to 30 s, which gives a feature of shape (batch, 1500, 1024). Should we just truncate the feature to its valid frames, i.e. `feature = feature[:, :int(1500 / 30 * n), :]`?