revsic / torch-nansypp

NANSY++: Unified Voice Synthesis with Neural Analysis and Synthesis
MIT License
139 stars 11 forks

MFA Feature for Timbre Token Block #4

Closed pranavmalikk closed 1 year ago

pranavmalikk commented 1 year ago

I noticed that in the paper the MFA feature has shape [B, 3072, N] when it goes through both MHAs in the Timbre Token Block. This is confusing, because the MFA output in the ECAPA-TDNN paper has shape 1536 x T. I see that you kept the MHA dimension at 1536, but I was wondering what the reasoning was for setting the first dimension of the Conv1d to 1536 instead of 3072.

https://github.com/revsic/torch-nansypp/blob/c8ef7fbba4a647c4b8dc115f4839579f9871e315/nansypp/timber.py#L274-L276
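To make the shape question concrete, here is a minimal sketch of the channel bookkeeping in plain Python (no PyTorch needed). The helper `conv1d_out_shape` and all values other than 1536 and 3072 are hypothetical and not taken from torch-nansypp. Projecting the 3072-channel feature down to 1536 is only an assumption, chosen to match the 1536-wide MHA mentioned above.

```python
# Plain-Python shape bookkeeping; names here are illustrative, not from the repo.

def conv1d_out_shape(in_shape, in_channels, out_channels,
                     kernel_size=1, stride=1, padding=0, dilation=1):
    """Return the output shape of a 1D convolution applied to a [B, C, N] input."""
    batch, channels, length = in_shape
    if channels != in_channels:
        # A Conv1d built for `in_channels` cannot consume a different channel count.
        raise ValueError(f"expected {in_channels} input channels, got {channels}")
    # Same output-length formula as documented for torch.nn.Conv1d.
    out_length = (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1
    return (batch, out_channels, out_length)


B, N = 2, 100  # arbitrary batch size and number of frames

# Case 1: MFA feature with 1536 channels, as in the ECAPA-TDNN paper (1536 x T).
shape_1536 = conv1d_out_shape((B, 1536, N), in_channels=1536, out_channels=1536)

# Case 2: MFA feature with 3072 channels, as in the NANSY++ figure ([B, 3072, N]).
# Projecting to 1536 output channels is an assumption made for this sketch.
shape_3072 = conv1d_out_shape((B, 3072, N), in_channels=3072, out_channels=1536)

# Feeding a 3072-channel feature to a layer built for 1536 channels fails.
try:
    conv1d_out_shape((B, 3072, N), in_channels=1536, out_channels=1536)
    mismatch_error = None
except ValueError as err:
    mismatch_error = str(err)

print(shape_1536, shape_3072, mismatch_error)
```

In other words, the first Conv1d dimension only has to match whatever channel count the MFA feature actually has at that point; the sketch does not say which of the two values the paper's authors used in practice.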

[Screenshot from 2023-04-14 21-25-32]