Hey! I'm working on this too! BUT I'm having trouble with the phoneme-level encoder: I don't know how to compute the phoneme-level mel properly (without a for loop...). Do you have any ideas? Maybe a single PyTorch/TensorFlow function could do it? Thanks a lot.
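One loop-free way to do it in PyTorch, as a rough sketch. It assumes you already have per-phoneme durations in frames and that "phoneme-level mel" means the mean of the frame-level mel over each phoneme's frames. The function name `phoneme_level_mel` and the tensor layouts are my own assumptions, not from any particular repo. The idea is to build a boolean alignment matrix from cumulative durations and then average with one batched matmul:

```python
import torch


def phoneme_level_mel(mel: torch.Tensor, durations: torch.Tensor) -> torch.Tensor:
    """Average frame-level mel over each phoneme's frames (hypothetical helper).

    mel:       (B, T, n_mels) float tensor, padded along T
    durations: (B, N) integer tensor of frames per phoneme, padded with 0
    returns:   (B, N, n_mels); rows for zero-length (e.g. padded) phonemes are 0
    """
    T = mel.shape[1]
    ends = torch.cumsum(durations, dim=1)          # (B, N) exclusive end frame
    starts = ends - durations                      # (B, N) inclusive start frame
    frames = torch.arange(T, device=mel.device)    # (T,)
    # align[b, n, t] is True when frame t falls inside phoneme n;
    # padded frames past sum(durations) belong to no phoneme
    align = (frames >= starts.unsqueeze(-1)) & (frames < ends.unsqueeze(-1))  # (B, N, T)
    # Sum the frames of each phoneme with one batched matmul
    sums = torch.bmm(align.to(mel.dtype), mel)     # (B, N, n_mels)
    # Divide by duration; clamp avoids 0/0 for zero-length phonemes
    counts = durations.clamp(min=1).unsqueeze(-1).to(mel.dtype)
    return sums / counts
```

The alignment matrix costs O(B·N·T) memory, which is usually fine at utterance scale. On the TensorFlow side, `tf.math.segment_mean` or `tf.math.unsorted_segment_mean` with frame-to-phoneme segment ids should do the same reduction for a single utterance.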