Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Problem Overview
I modified the VALL-E model and trained the AR stage first, then the NAR stage. Afterwards I ran into an issue with the AR part: the length of the generated result cannot be guaranteed. I therefore want to run inference with the AR checkpoint alone, since no further NAR training is needed to investigate this.
Expected Outcome
Run the inference step with only the AR checkpoint, and output the AR layer result (its shape or its values).
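
To make the expected outcome concrete, here is a minimal sketch of the kind of AR-only decoding loop I have in mind. It does not use Amphion's code: `toy_ar_step`, `EOS`, `VOCAB_SIZE`, and `max_len` are hypothetical stand-ins for the real AR forward pass and its settings. The loop stops at an end-of-sequence token or at a hard cap, which is one possible way to bound the output length, and then prints the shape and values of the first-layer tokens.

```python
import random

EOS = 0          # hypothetical end-of-sequence token id
VOCAB_SIZE = 16  # hypothetical codebook size for the toy model


def toy_ar_step(prefix, rng):
    """Stand-in for one AR forward pass: returns the next token id.

    A real model would condition on `prefix`; this toy version ignores it.
    """
    return rng.randrange(VOCAB_SIZE)


def ar_only_decode(prompt, max_len, seed=0):
    """Generate first-layer tokens until EOS or until max_len new tokens."""
    rng = random.Random(seed)
    tokens = list(prompt)
    generated = []
    for _ in range(max_len):
        nxt = toy_ar_step(tokens, rng)
        if nxt == EOS:
            break  # the model signalled the end of the sequence
        tokens.append(nxt)
        generated.append(nxt)
    return generated  # never longer than max_len, never contains EOS


codes = ar_only_decode(prompt=[3, 7, 5], max_len=50)
print("AR output shape:", (1, len(codes)))
print("AR output values:", codes)
```

In the real model, the NAR stage would simply be skipped and the tokens returned by the AR stage inspected directly, as in the last two lines.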