lucidrains / naturalspeech2-pytorch

Implementation of Natural Speech 2, Zero-shot Speech and Singing Synthesizer, in Pytorch
MIT License

add attention after 3rd conv and add compute pitch #15

Closed manmay-nakhashi closed 1 year ago

manmay-nakhashi commented 1 year ago

As written in the paper: 10 Q-K-V attention layers for in-context learning, which have 512 hidden dimensions and 8 attention heads and are placed every 3 1D convolution layers.
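
To make the layout concrete, here is a rough sketch of the layer ordering that description implies. It is not the code from this PR or from the repo. The helper name `build_layer_plan` and the tuple format are made up for illustration, and it assumes "every 3 1D convolution layers" means one attention layer after each group of 3 convolutions, which gives 30 convolutions for 10 attention layers.

```python
def build_layer_plan(num_attn_layers=10, convs_per_attn=3, dim=512, heads=8):
    """Return an ordered list of (kind, config) pairs.

    Hypothetical helper: one attention layer follows every
    `convs_per_attn` 1D convolution layers.
    """
    # the hidden dimension must split evenly across the attention heads
    assert dim % heads == 0, "dim must be divisible by heads"

    plan = []
    for _ in range(num_attn_layers):
        # a group of 1D convolutions ...
        for _ in range(convs_per_attn):
            plan.append(("conv1d", {"dim": dim}))
        # ... followed by one Q-K-V attention layer
        plan.append(("attention", {"dim": dim, "heads": heads, "dim_head": dim // heads}))
    return plan


plan = build_layer_plan()
num_convs = sum(kind == "conv1d" for kind, _ in plan)
num_attn = sum(kind == "attention" for kind, _ in plan)
print(num_convs, num_attn)  # 30 10
```

With the defaults, each attention layer sits at every 4th position in the stack (indices 3, 7, 11, ...), and each head gets 512 / 8 = 64 dimensions.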

manmay-nakhashi commented 1 year ago

Closing, as this is already done.

lucidrains commented 1 year ago

@manmay-nakhashi oh oop, didn't know you were working on this! thank you regardless