choiHkk closed this pull request 9 months ago
Hi, thanks for the PR. Really well documented. One more thing: I just fixed a typo in mono_layer_flow; have a look and add it to this commit. Thanks.
@p0p4k The ONNX conversion has just completed successfully, and inference runs perfectly. Training stopped at step 91k; I will share the ONNX file via the Google Drive link below.
https://drive.google.com/drive/folders/1cWMiXSVGarHcVLaOzl568ndj4FuHEPUp?usp=sharing
@p0p4k I have discovered something amazing. When the ResidualCouplingTransformersLayer2 module is used, voice conversion appears to be possible because the latent variables do not deviate significantly from the distribution designed in the original VITS. I have added the samples to the resources directory.
@choiHkk About voice conversion: it is quite possible that the "g" sent to the text_encoder ends up not being used at all. In some of my experiments the "g" sent to the mel_encoder was being ignored as well. You can test these things; if you need help, let me know. Add me on Discord - p0p4k.
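One way to check for a dead conditioning path is to run the encoder twice with two different speaker embeddings and compare the outputs. Below is a minimal NumPy sketch of that diagnostic using a toy stand-in encoder (`toy_encoder` is hypothetical; a real check would call the trained TextEncoder from models.py with two different `g` tensors):

```python
import numpy as np

def toy_encoder(x, g, use_g=False):
    # Toy stand-in for a conditioned encoder. When use_g is False, the
    # speaker embedding is silently ignored, mimicking the failure mode
    # described above where "g" has no effect on the output.
    h = np.tanh(x)
    if use_g:
        h = h + g
    return h

x = np.random.default_rng(0).normal(size=(1, 4))
g_a = np.ones((1, 4))   # speaker A embedding
g_b = -np.ones((1, 4))  # speaker B embedding

# Diagnostic: identical outputs for different embeddings => "g" is ignored.
ignored = np.allclose(toy_encoder(x, g_a), toy_encoder(x, g_b))
used = np.allclose(toy_encoder(x, g_a, use_g=True), toy_encoder(x, g_b, use_g=True))
print(ignored, used)  # True False
```

If the real text_encoder produces matching outputs for distinct speaker IDs, the "g" projection there is effectively unused.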
Hello, thank you for the wonderful work. An error popped up when I ran your ONNX file; could you please help me check it? I downloaded the onnx and json files from the Google Drive link and ran:
```
python infer_onnx.py --model="./pretrained_91k.onnx" --config-path="./vits2_vctk_standard.json" --output-wav-path="./trained_models/output.wav" --text="hello world, how are you?"
```
Then this error was shown:

```
2023-10-26 23:47:59.197006110 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Gather node. Name:'/emb_g/Gather' Status Message: /onnxruntime_src/include/onnxruntime/core/framework/op_kernel_context.h:42 const T* onnxruntime::OpKernelContext::Input(int) const [with T = onnxruntime::Tensor] Missing Input: sid
Traceback (most recent call last):
  File "infer_onnx.py", line 59, in
```
I think I need an extra argument, speaker id
@chengwuxinlin I think you should use the argument 'sid' when performing inference using ONNX, as follows:

```python
parser.add_argument("--sid", required=False, type=int, help="Speaker ID to synthesize")
```
> I think I need an extra argument, speaker id
Sorry, my bad. I accidentally deleted the speaker id in the input. It's all good now; thank you for the fast response.
I have added the code related to the duration discriminator, residual coupling layer, and training pipeline mentioned in the previous issue (#59).
Here are the changes:
vctk_test.wav
vits2_vctk_standard.json
data_utils.py
inference.ipynb
mel_processing.py
models.py
train_ms.py
The above sentences were generated by ChatGPT.