Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.
Bug in FACodecEncoderV2
I extracted prosody_feature and encoder_output from FACodecEncoderV2. An error is raised when I use fa_decoder_v2 to extract the VQ codes, because the lengths of prosody_feature (torch.Size([1, 20, 281])) and encoder_output (torch.Size([1, 256, 282])) differ by one frame along the last dimension.
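My code:

    import librosa
    import torch

    # fa_encoder_v2 and fa_decoder_v2 are assumed to be loaded beforehand.
    # wav_b initially holds the path to the audio file.
    wav_b = librosa.load(wav_b, sr=16000)[0]
    wav_b = torch.from_numpy(wav_b).float()
    wav_b = wav_b.unsqueeze(0).unsqueeze(0)  # shape: (batch=1, channel=1, samples)

    enc_out_b = fa_encoder_v2(wav_b)
    prosody_b = fa_encoder_v2.get_prosody_feature(wav_b)

    vq_post_emb_b, vq_idb, _, quantized, spk_embs_b = fa_decoder_v2(
        enc_out_b, prosody_b, eval_vq=False, vq=True
    )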
Error:

      File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/inference_codc.py", line 129, in <module>
        vq_post_emb_a, vq_ida, _, quantized, spk_embs_a = fa_decoder_v2(
      File "/home/data/mahaotian/anaconda3/envs/vallex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
        return self._call_impl(*args, **kwargs)
      File "/home/data/mahaotian/anaconda3/envs/vallex/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
        return forward_call(*args, **kwargs)
      File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/facodec.py", line 1086, in forward
        outs, qs, commit_loss, quantized_buf = self.quantize(
      File "/home/data/mahaotian/Amphion/models/codec/ns3_codec/facodec.py", line 1048, in quantize
        outs += out
    RuntimeError: The size of tensor a (281) must match the size of tensor b (282) at non-singleton dimension 2
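
A possible workaround, sketched under assumptions rather than taken from the Amphion code: truncate both tensors to their shared length along the last (time) dimension before passing them to fa_decoder_v2. The helper name `trim_to_common_length` is hypothetical, and the dummy tensors below only mimic the shapes reported above. Dropping the extra trailing frame is assumed to be acceptable for this use, which I have not verified against the model.

```python
import torch


def trim_to_common_length(*tensors):
    """Truncate every tensor along its last (time) dimension to the shortest length.

    Hypothetical helper, not part of Amphion.
    """
    min_len = min(t.shape[-1] for t in tensors)
    return tuple(t[..., :min_len] for t in tensors)


# Dummy stand-ins with the reported shapes. In the real script these would be
# fa_encoder_v2(wav_b) and fa_encoder_v2.get_prosody_feature(wav_b).
enc_out_b = torch.zeros(1, 256, 282)
prosody_b = torch.zeros(1, 20, 281)

enc_out_b, prosody_b = trim_to_common_length(enc_out_b, prosody_b)

# Both tensors now have 281 frames, so an in-place add over time no longer
# fails on mismatched sizes.
print(enc_out_b.shape, prosody_b.shape)
```

With the real tensors, the trimmed `enc_out_b` and `prosody_b` would then be passed to `fa_decoder_v2(enc_out_b, prosody_b, eval_vq=False, vq=True)` as in the code above. This only hides the off-by-one frame; why the encoder and the prosody path produce different frame counts for the same waveform would still need to be checked in facodec.py.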