ming024 / FastSpeech2

An implementation of Microsoft's "FastSpeech 2: Fast and High-Quality End-to-End Text to Speech"
MIT License

Need help converting FastSpeech model to ONNX to run on TensorRT #107

Open EuphoriaCelestial opened 2 years ago

EuphoriaCelestial commented 2 years ago

Hi, I have trained my FastSpeech model and it is working well. I want to improve inference speed by running the model on TensorRT (and maybe port the preprocessing code to C++ later). Currently I am following this example to export an ONNX model file: https://docs.microsoft.com/en-us/windows/ai/windows-ml/tutorials/pytorch-convert-model but I don't know how to create the dummy input. Can someone help me with this? Thank you.

FasoCA commented 2 years ago

Dummy inputs are tensors with the shapes and dtypes the model expects, filled with either random values or zeros.
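For reference, a minimal sketch of what the dummy inputs for an export might look like. The names, shapes, and argument order here are assumptions for illustration; they must match your model's actual forward() signature:

```python
import torch

# Hypothetical batch of one utterance with 50 phonemes; adjust the
# argument list to match your FastSpeech2 forward() signature exactly.
batch, max_len = 1, 50
speakers = torch.zeros(batch, dtype=torch.long)         # dummy speaker ids
texts = torch.zeros(batch, max_len, dtype=torch.long)   # dummy phoneme ids
src_lens = torch.full((batch,), max_len, dtype=torch.long)
dummy_inputs = (speakers, texts, src_lens, max_len)

# torch.onnx.export(model, dummy_inputs, "fastspeech2.onnx", opset_version=13)
```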

Did you manage to complete the ONNX conversion? It seems that the torch.bucketize operator is not currently supported (PyTorch 1.8, ONNX opset 13).

EuphoriaCelestial commented 2 years ago

@FasoCA Yes, I also hit that error with bucketize. I don't remember exactly how I fixed it, but it was a temporary workaround and I am not sure it was the right method. I have finished the conversion for both the FastSpeech model and the vocoder model, but there are warnings because the forward method of the FastSpeech model contains if-else clauses, which tracing cannot capture. The vocoder conversion completed with no errors. The whole pipeline can still run with the two converted models, but it hits errors in some special cases. So far I am not using the converted FastSpeech model, just the vocoder, so my pipeline consists of the FastSpeech PyTorch model and the HiFi-GAN TensorRT model. I am still using Python and will consider porting to C++ later.

FasoCA commented 2 years ago

@EuphoriaCelestial Much appreciated. I've also been working exclusively in Python so far.

To get around the lack of torch.bucketize support, one can write a custom ONNX operator in C++ (perhaps following what's described here: https://github.com/onnx/tutorials/blob/master/PyTorchCustomOperator/README.md, though I have never done it), rewrite functionally equivalent operations in Python and swap them in for bucketize, or somehow skip that section of the code entirely (if possible). Not sure which route would be best. Do you recall your solution?
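For what it's worth, one functionally equivalent rewrite of torch.bucketize (with its default right=False semantics) is a compare-and-sum, which lowers to ONNX-supported Greater and ReduceSum ops. Sketched here in NumPy for clarity; in the model you would use the same expression with torch tensors:

```python
import numpy as np

def bucketize_equiv(values, boundaries):
    # For each value, count how many boundaries it strictly exceeds;
    # this matches torch.bucketize(values, boundaries, right=False).
    return (values[..., None] > boundaries).sum(axis=-1)

boundaries = np.array([0.0, 1.0, 2.0])
values = np.array([-0.5, 0.5, 1.5, 2.5])
print(bucketize_equiv(values, boundaries).tolist())  # [0, 1, 2, 3]
```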

Thanks for the heads-up about the if-clause. I think it's a branch on training vs. inference, correct? In that case, one could generate separate models for the two cases. Is this what you are referring to when you talk about "2 converted models"?

EuphoriaCelestial commented 2 years ago

Do you recall your solution?

I followed my friend's suggestion and hard-coded the bucketize result as below (this replaces the else clause in get_pitch_embedding and get_energy_embedding). I don't have deep knowledge here, so this is pure trial and error; tell me if this is wrong.

```python
prediction = prediction * control
buck = torch.zeros_like(prediction)
buck[:] = 255                  # hard-code every value to a fixed bucket index
buck = buck.type(torch.long)
# note: .to() is not in-place, so the result must be assigned back
buck = buck.to(torch.device("cuda" if torch.cuda.is_available() else "cpu"))
embedding = self.pitch_embedding(buck)
```

EuphoriaCelestial commented 2 years ago

I think it's a branch on training vs. inference, correct?

No, take a look at the forward function in the model; there are many if-else clauses inside. When I convert to ONNX, it says it is unable to trace the data flow through them, so the result may be wrong.
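For context, this is the usual tracing limitation: torch.onnx.export with tracing records only the branch the dummy input happens to take. One commonly suggested workaround (a sketch with a toy module, not this repo's method) is to script the module so data-dependent control flow is kept in the graph:

```python
import torch

class Gate(torch.nn.Module):
    # Toy module with a data-dependent branch, like those in forward().
    def forward(self, x):
        if x.sum() > 0:   # tracing would freeze this to one branch
            return x * 2
        return x - 1

scripted = torch.jit.script(Gate())  # the if/else survives as graph control flow
print(scripted(torch.ones(3)))    # tensor([2., 2., 2.])
print(scripted(-torch.ones(3)))   # tensor([-2., -2., -2.])
```

A scripted module can then be handed to torch.onnx.export, though not every scripted construct is exportable; the other common route is refactoring the branches into mask arithmetic.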

In which case, one could generate separate models for the two cases. Is this what you are referring to, when you talk about "2 converted models"?

No, the 2 models I mentioned are the FastSpeech model and the vocoder model (HiFi-GAN or MelGAN); currently I have only converted the vocoder model.

FasoCA commented 2 years ago

I followed my friend's suggestion and hard-coded the bucketize result as below (this replaces the else clause in get_pitch_embedding and get_energy_embedding). I don't have deep knowledge here, so this is pure trial and error; tell me if this is wrong.

I see, so the idea is to replace bucketize with a dummy tensor of equivalent size and type in the calls to self.pitch_embedding and self.energy_embedding when the ONNX graph is generated. Makes sense, I'll give it a try, thank you!

Pydataman commented 2 years ago

mark

Tian14267 commented 2 years ago

@EuphoriaCelestial So, did you convert to TRT successfully? I am hitting a problem going from ONNX to TRT. My error is: Error Code 4: Internal Error (Network must have at least one output)

EuphoriaCelestial commented 2 years ago

@EuphoriaCelestial So, did you convert to TRT successfully? I am hitting a problem going from ONNX to TRT. My error is: Error Code 4: Internal Error (Network must have at least one output)

Sadly, no. I can make it run successfully with no errors popping up, but the generated sound contains only noise, and the run time is not even reduced, so it is a total failure.

Tian14267 commented 2 years ago

@EuphoriaCelestial So, did you convert to TRT successfully? I am hitting a problem going from ONNX to TRT. My error is: Error Code 4: Internal Error (Network must have at least one output)

Sadly, no. I can make it run successfully with no errors popping up, but the generated sound contains only noise, and the run time is not even reduced, so it is a total failure.

Maybe it's the precision. Can you share your method for going from ONNX to TRT? I really want to figure it out. Thank you very much.
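Not the original poster's method, but for reference the standard CLI route is TensorRT's trtexec tool. The input name and shapes below are assumptions and must match the ONNX graph's actual input. Since reduced precision can produce noise-only output, it is worth building an FP32 engine first and only then comparing against --fp16:

```shell
# Build an FP32 engine first to rule out precision issues,
# then rebuild with --fp16 and compare the generated audio.
trtexec --onnx=vocoder.onnx \
        --saveEngine=vocoder.plan \
        --minShapes=mel:1x80x16 \
        --optShapes=mel:1x80x256 \
        --maxShapes=mel:1x80x1024
```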

Tian14267 commented 2 years ago

@EuphoriaCelestial Sorry to disturb you, but I have a question: how did you handle dynamic input shapes in FastSpeech2? When I give different inputs, the output of the ONNX model is problematic.

lucasjinreal commented 2 years ago

It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?

Tian14267 commented 2 years ago

It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?

What do you mean by the whole model into a single ONNX file? Converting the acoustic model and vocoder into one ONNX file, or just a single acoustic model?

lucasjinreal commented 2 years ago

@Tian14267 I have converted FastSpeech to ONNX. Has anyone been able to convert this model for TensorRT inference?

leslie2046 commented 2 years ago

mark

mollon650 commented 2 years ago

@jinfagang Can you show your code for how to convert the model to ONNX? Thanks.

javileyes commented 4 months ago

It's possible to convert the encoder and decoder to ONNX separately, but the VarianceAdaptor in the middle cannot be integrated. Has anyone successfully converted the whole model into a single ONNX file?

@lucasjinreal Would you be so kind as to share your conversion code and your insights?