Open henriklied opened 9 months ago
Hey @henriklied! Thank you for sharing your dataset. I assume phonemes (for instance, coming from phonemizer) and EnCodec as inputs. The next iteration will contain instructions, so I advise you not to waste your time now. Also keep in mind that my demo has been trained on long samples from LibriVox (~25 s), which helps a lot with expressiveness.
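For anyone preparing data in the meantime, here is a rough sketch of one way to get samples near the ~25 s length mentioned above when a corpus consists of short utterances: greedily merge consecutive clips until the next one would exceed the target. This is not the project's actual preprocessing. The `Clip` class, the `group_clips` function, and the file names are hypothetical, and it assumes the clips are consecutive recordings from the same speaker.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Clip:
    path: str        # hypothetical path to an audio file
    text: str        # transcript, to be phonemized later
    duration: float  # length in seconds


def group_clips(clips: Iterable[Clip], target_seconds: float = 25.0) -> List[List[Clip]]:
    """Greedily pack consecutive clips into groups of at most target_seconds.

    A single clip longer than the target is kept as its own group.
    """
    groups: List[List[Clip]] = []
    current: List[Clip] = []
    total = 0.0
    for clip in clips:
        # Close the current group if adding this clip would overshoot the target.
        if current and total + clip.duration > target_seconds:
            groups.append(current)
            current, total = [], 0.0
        current.append(clip)
        total += clip.duration
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    demo = [Clip(f"utt_{i}.wav", f"sentence {i}", 6.0) for i in range(9)]
    for group in group_clips(demo):
        print([c.path for c in group], sum(c.duration for c in group))
```

Each group's audio would then be concatenated and encoded, and its transcripts joined and phonemized, before training.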
Thanks for getting back to me @theodorblackbird!
I look forward to some more details and instructions around how to try this out. :-)
I am also interested in training on a custom dataset. Would anyone have instructions on steps to train?
Hi Theodor, this project looks very interesting!
I would really like to try this out on the Norwegian NST dataset.
Can you give me some pointers as to what kind of processing I'd have to do in order to mimic the dataset structure you're using?