yl4579 / PL-BERT

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

Possible bug in train.ipynb - num_vocab #40

Closed. jav-ed closed this issue 10 months ago

jav-ed commented 10 months ago

see: https://github.com/yl4579/PL-BERT/blob/main/train.ipynb

bert = AlbertModel(albert_base_configuration)
bert = MultiTaskModel(bert, 
                      num_vocab=max([m['token'] for m in token_maps.values()]), 
                      num_tokens=config['model_params']['vocab_size'],
                      hidden_size=config['model_params']['hidden_size'])

When you take the maximum embedding id from the list [m['token'] for m in token_maps.values()], you might need to add 1 to it.

Let me explain why I think so. Imagine we have 3 embedding ids, say [0, 1, 2]: the length of this list is 3, but its maximum is 2. From what I can tell, num_vocab should be chosen as 3 and not as 2, so len(list_of_embedding_ids), or equivalently max(list_of_embedding_ids) + 1, should be preferred over max(list_of_embedding_ids). A minimal sketch of the off-by-one follows.
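The sketch below is illustrative only (a toy torch.nn.Embedding, not code from the repo); it shows why sizing the table with max(ids) instead of max(ids) + 1 leaves the largest id without a row:

```python
import torch
import torch.nn as nn

# Hypothetical embedding ids 0, 1, 2, as in the example above.
ids = torch.tensor([0, 1, 2])

# num_embeddings = max(ids) = 2 allocates rows only for ids 0 and 1.
emb_wrong = nn.Embedding(num_embeddings=int(ids.max()), embedding_dim=4)
# emb_wrong(ids)  # would raise IndexError: index out of range in self

# num_embeddings = max(ids) + 1 = 3 (here the same as len(ids)) covers all ids.
emb_ok = nn.Embedding(num_embeddings=int(ids.max()) + 1, embedding_dim=4)
print(emb_ok(ids).shape)  # torch.Size([3, 4])
```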

yl4579 commented 10 months ago

Yes, I agree this could be a bug. I have fixed it. Thanks for your suggestion.
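For reference, the corrected instantiation would presumably just add 1 to the max, along the lines of the following sketch (the exact change in the repo may differ):

```python
bert = MultiTaskModel(bert,
                      # +1 because ids start at 0, so max id + 1 embedding rows are needed
                      num_vocab=1 + max([m['token'] for m in token_maps.values()]),
                      num_tokens=config['model_params']['vocab_size'],
                      hidden_size=config['model_params']['hidden_size'])
```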