Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
✨ Description
This PR adds data padding to the forward, inference, and get_prosody_feature methods of the model class. The new _pad_data function pads the input audio tensor along its last dimension so that its length is a multiple of the hop length. This is a common step in audio processing: when the number of samples divides evenly by the hop length, the signal splits into a whole number of equal-length frames, so features computed frame by frame come out with consistent lengths.
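The padding arithmetic can be sketched as follows. This assumes _pad_data appends zeros at the end of the last dimension, which the PR text does not state explicitly. Plain Python lists stand in for the tensor; in PyTorch, right-padding the last dimension would be torch.nn.functional.pad(x, (0, pad)). The helper names pad_amount and pad_samples and the hop length of 200 are illustrative only and do not come from the PR.

```python
import math


def pad_amount(num_samples: int, hop_length: int) -> int:
    """Zeros to append so that num_samples becomes a multiple of hop_length."""
    if hop_length <= 0:
        raise ValueError("hop_length must be positive")
    # Python's % always returns a value in [0, hop_length), so this is 0 when
    # num_samples is already a multiple, and hop_length - (num_samples % hop_length) otherwise.
    return (-num_samples) % hop_length


def pad_samples(samples: list[float], hop_length: int) -> list[float]:
    """Right-pad a 1-D sample sequence with zeros (stand-in for padding the last tensor dim)."""
    return samples + [0.0] * pad_amount(len(samples), hop_length)


hop = 200   # illustrative value, not taken from the PR
n = 16_050  # not a multiple of hop
print(pad_amount(n, hop))                # 150
print(n // hop, math.ceil(n / hop))      # 80 81: floor and ceil frame counts disagree
m = n + pad_amount(n, hop)
print(m // hop, math.ceil(m / hop))      # 81 81: they agree after padding
```

After padding, any rounding convention used to count frames gives the same answer, because there is no trailing partial frame left to round.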
To test this PR, run any process that calls forward, inference, or get_prosody_feature and check that the processed audio aligns with the hop length, in particular that the lengths of the resulting features now match (see Issue #188).
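One way to make that check concrete is the hypothetical harness below. It is not the project's test suite. It simulates two feature branches that round the frame count differently: one drops a trailing partial frame, like a strided convolution whose kernel size equals its stride, and the other keeps that frame. The harness then confirms that the two lengths agree once the input is padded. Whether this rounding difference is the actual cause of the mismatch in Issue #188 is an assumption, and all names here are made up for illustration.

```python
def pad_to_multiple(samples: list[float], hop_length: int) -> list[float]:
    # Hypothetical stand-in for _pad_data: right-pad with zeros to a multiple of hop_length.
    return samples + [0.0] * ((-len(samples)) % hop_length)


def branch_floor(samples: list[float], hop_length: int) -> list[float]:
    # Toy branch that drops a trailing partial frame: floor(len / hop) outputs.
    last_start = len(samples) - hop_length + 1
    return [sum(samples[i:i + hop_length]) for i in range(0, last_start, hop_length)]


def branch_ceil(samples: list[float], hop_length: int) -> list[float]:
    # Toy branch that keeps a trailing partial frame: ceil(len / hop) outputs.
    return [sum(samples[i:i + hop_length]) for i in range(0, len(samples), hop_length)]


def lengths_match(num_samples: int, hop_length: int, pad: bool) -> bool:
    """True if both toy branches produce the same number of frames."""
    x = [1.0] * num_samples
    if pad:
        x = pad_to_multiple(x, hop_length)
    return len(branch_floor(x, hop_length)) == len(branch_ceil(x, hop_length))


hop = 4
mismatched = [n for n in range(1, 40) if not lengths_match(n, hop, pad=False)]
print(mismatched[:5])                                           # [1, 2, 3, 5, 6]
print(all(lengths_match(n, hop, pad=True) for n in range(1, 40)))  # True
```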
🚧 Related Issues
Issue #188 [BUG]: the lengths of the features after FACodecEncoderV2 is not match
👨‍💻 Changes Proposed
[x] Added a _pad_data method that pads an input tensor along its last dimension to a multiple of the hop length.
[x] Modified forward, inference, and get_prosody_feature methods to use _pad_data.
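The change to the three methods amounts to calling _pad_data on the input before anything else. The sketch below shows that pattern on a hypothetical class, PaddedEncoderSketch. Its methods pool samples into hop-sized frames instead of running the real networks. Only the method names forward, inference, get_prosody_feature, and _pad_data come from the PR, and the zero right-padding is an assumption.

```python
class PaddedEncoderSketch:
    """Hypothetical stand-in for the model class. The real methods run neural
    networks; here each one pools samples into hop-sized frames."""

    def __init__(self, hop_length: int):
        self.hop_length = hop_length

    def _pad_data(self, x: list[float]) -> list[float]:
        # Right-pad the last (only) dimension with zeros to a multiple of hop_length.
        return x + [0.0] * ((-len(x)) % self.hop_length)

    def _frames(self, x: list[float]) -> list[list[float]]:
        # x is assumed already padded, so every chunk has exactly hop_length samples.
        h = self.hop_length
        return [x[i:i + h] for i in range(0, len(x), h)]

    def forward(self, x: list[float]) -> list[float]:
        # Pad first, then compute one (mean) value per frame.
        return [sum(f) / self.hop_length for f in self._frames(self._pad_data(x))]

    def inference(self, x: list[float]) -> list[float]:
        # Same entry-point padding as forward.
        return self.forward(x)

    def get_prosody_feature(self, x: list[float]) -> list[float]:
        # Pad first, then compute one (max) value per frame.
        return [max(f) for f in self._frames(self._pad_data(x))]


enc = PaddedEncoderSketch(hop_length=4)
x = [1.0] * 10
print(len(enc.forward(x)), len(enc.inference(x)), len(enc.get_prosody_feature(x)))  # 3 3 3
```

Padding inside each public method, rather than relying on every caller to do it, means every entry point sees the same frame-aligned length however it is invoked.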
🧑‍🤝‍🧑 Who Can Review?
🛠 TODO
✅ Checklist
[ ] Code has been reviewed
[ ] Code complies with the project's code standards and best practices
[ ] Code has passed all tests
[ ] Code does not affect the normal use of existing features
[ ] Code has been commented properly
[ ] Documentation has been updated (if applicable)
[ ] Demo/checkpoint has been attached (if applicable)
Fix Issue #188