[Question] about `intersperse` function.

chep0k commented 1 year ago

Hi! During preprocessing, when add_blank is True in hparams, the intersperse function (here) inserts an index that is out of the vocabulary bounds (item=len(symbols)) between each pair of adjacent tokens. My first guess was that this token plays the role of a pause between tokens, since there is no pause token in the vocabulary, so during training all pauses shift to this token. Then, as its name states, I treated it as a blank token whose job is to absorb all the "noise" between adjacent tokens, so that the other tokens represent cleaner phonemes. I also thought it might be used to learn the transition from one phoneme to the next, which belongs to neither of the two adjacent phonemes but is a separate segment. But if so, why is a single common token used for all gaps? So, what is the real purpose of this blank token? This question is mainly addressed to the authors, but any guesses are welcome. Thanks in advance.
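For readers following along, here is a minimal sketch of the behaviour described above. It assumes the function also places the blank before the first and after the last token (worth checking against the repository's own code), and the `symbols` list is a made-up toy vocabulary, not the project's real one.

```python
def intersperse(lst, item):
    """Return lst with `item` placed before, between, and after its elements."""
    result = [item] * (len(lst) * 2 + 1)  # start with all blanks
    result[1::2] = lst                    # original tokens go to odd positions
    return result


# Hypothetical toy vocabulary; the real symbol set lives in the repository.
symbols = ["_", "a", "b", "c"]
blank_id = len(symbols)  # 4: one past the last valid symbol index

token_ids = [1, 2, 3]    # ids for "a", "b", "c"
with_blanks = intersperse(token_ids, blank_id)
print(with_blanks)       # [4, 1, 4, 2, 4, 3, 4]

# Because blank_id == len(symbols), any embedding table indexed by these ids
# would need len(symbols) + 1 rows (an assumption about how the id is consumed).
n_vocab = len(symbols) + 1
assert max(with_blanks) < n_vocab
```

Under this sketch the index is "out of bounds" only relative to the symbol list; it stays valid as long as the model reserves one extra vocabulary slot for it.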

chnk58hoang commented 5 months ago

Can someone explain the real purpose of the intersperse function? I'm a little confused by it.

chep0k commented 2 months ago

> Can someone explain the real purpose of the intersperse function? I'm a little confused by it.

For as long as I have been working with GradTTS, I have treated the interspersed token (the item argument of the function) as a kind of "space" token, inserted between every two adjacent phonemes and denoting the amount of, say, "silence" between them that the model should learn to produce. Thus each non-"space" token should be filled only with the sound directly relevant to that token, while all pauses, skips and spaces are delegated to the "space" token. Moreover, with noisy data, all irrelevant (background) noise can be fed into this token, keeping it out of the actual phonemes. Otherwise, that is, if this "space" token is omitted, all the noise and silence would have no choice but to be memorised as part of the actual phoneme tokens, contaminating them.
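To make this "space token" reading concrete, here is a toy sketch. It assumes some aligner has already assigned a number of spectrogram frames to every token of the interspersed sequence; the durations are invented for illustration and do not come from any model, and `split_frames` is a hypothetical helper.

```python
def split_frames(durations):
    """Sum frames at even positions (blank slots) and odd positions (phoneme slots)."""
    blank_frames = sum(durations[0::2])
    phoneme_frames = sum(durations[1::2])
    return blank_frames, phoneme_frames


# Hypothetical per-token durations (in frames) for the interspersed sequence
# [blank, a, blank, b, blank, c, blank].
durations = [6, 9, 0, 7, 3, 11, 8]

blank_frames, phoneme_frames = split_frames(durations)
print(blank_frames, phoneme_frames)  # 17 27
```

In this picture, leading and trailing silence and the short gap before "c" land on blank slots, while a blank with zero frames simply contributes nothing between "a" and "b".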

jaywalnut310 / glow-tts

[Question] about `intersperse` function. #75