Open jadechip opened 2 months ago
Thank you @jeremy110, can I just clarify a few points?
I have switched the tones for the "_" characters to zeroes.
However, I am a bit confused about the format of the word2ph list. I should note that I have switched the tokenizer in the g2p method, so the word mapping might be a bit different. It now uses the same tokenizer as the get_bert_feature
function, which I believe adheres more closely to the implementations for other languages.
As an example, the phrase ใครเป็นผู้รับ would be tokenized into the following chunks: ['▁ใคร', 'เป็นผู้รับ'].
This results in a word2ph list that looks like this: [1, 3, 8, 1], where the ones are the underscore characters, the 3 covers ใ ค ร, and the 8 covers เ ป็ น ผ ู ้ ร ั บ.
...and there are 13 tones (one assigned to each phoneme).
tokenized ['▁ใคร', 'เป็นผู้รับ']
Final phs: ['_', 'kʰ', 'r', 'aj', 'p', 'e', 'n', 'pʰ', 'uː', 'r', 'a', 'p̚', '_']
Final tones: [0, 2, 2, 2, 2, 2, 2, 5, 5, 3, 3, 3, 0]
Final word2ph: [1, 3, 8, 1]
len(phones) 13
len(tones) 13
Is this correct?
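One quick sanity check on the output above (a minimal sketch; the invariant is that sum(word2ph) must equal both len(phones) and len(tones)):

```python
# Output pasted from the g2p run above
phs = ['_', 'kʰ', 'r', 'aj', 'p', 'e', 'n', 'pʰ', 'uː', 'r', 'a', 'p̚', '_']
tones = [0, 2, 2, 2, 2, 2, 2, 5, 5, 3, 3, 3, 0]
word2ph = [1, 3, 8, 1]

# The length invariant holds: every phone gets exactly one tone,
# and word2ph partitions the phone list without gaps or overlap.
assert len(phs) == len(tones) == sum(word2ph) == 13
```

Note that this check passes for [1, 3, 8, 1], but it only validates the totals, not whether the per-token grouping itself matches what the rest of the pipeline expects.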
Hello @jadechip, I think you can refer to the French section. Below is an example in French, which shows that word2ph is computed by counting the phones each word maps to in its IPA form: sə- corresponds to 3, sɛʁvˈis to 7, ɡʁatyˈi to 7, and ɛt to 2.
French: Ce service gratuit est disponible en chinois simplifié et autres 123.
ipa: sə- sɛʁvˈis ɡʁatyˈi ɛt disponˈibl ɑ̃n ʃinwˈa sɛ̃plifjˈe e otʁz sˈɑ̃ vˈɛ̃ tʁwˈa.
phones: ['_', 's', 'ə', '-', 's', 'ɛ', 'ʁ', 'v', 'ˈ', 'i', 's', 'ɡ', 'ʁ', 'a', 't', 'y', 'ˈ', 'i', 'ɛ', 't', 'd', 'i', 's', 'p', 'o', 'n', 'ˈ', 'i', 'b', 'l', 'ɑ', '̃', 'n', 'ʃ', 'i', 'n', 'w', 'ˈ', 'a', 's', 'ɛ', '̃', 'p', 'l', 'i', 'f', 'j', 'ˈ', 'e', 'e', 'o', 't', 'ʁ', 'z', 's', 'ˈ', 'ɑ', '̃', 'v', 'ˈ', 'ɛ', '̃', 't', 'ʁ', 'w', 'ˈ', 'a', '.', '_']
tones: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
word2ph: [1, 3, 7, 7, 2, 10, 3, 6, 5, 5, 1, 4, 13, 1, 1]
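The same length invariant can be verified on this French reference output (a sketch; note that combining characters such as the nasal tilde count as separate entries in the phones list):

```python
# phones/tones/word2ph copied from the French example above
phones = ['_', 's', 'ə', '-', 's', 'ɛ', 'ʁ', 'v', 'ˈ', 'i',
          's', 'ɡ', 'ʁ', 'a', 't', 'y', 'ˈ', 'i', 'ɛ', 't',
          'd', 'i', 's', 'p', 'o', 'n', 'ˈ', 'i', 'b', 'l',
          'ɑ', '̃', 'n', 'ʃ', 'i', 'n', 'w', 'ˈ', 'a', 's',
          'ɛ', '̃', 'p', 'l', 'i', 'f', 'j', 'ˈ', 'e', 'e',
          'o', 't', 'ʁ', 'z', 's', 'ˈ', 'ɑ', '̃', 'v', 'ˈ',
          'ɛ', '̃', 't', 'ʁ', 'w', 'ˈ', 'a', '.', '_']
tones = [0] * 69  # one zero per phone; French carries no tone information
word2ph = [1, 3, 7, 7, 2, 10, 3, 6, 5, 5, 1, 4, 13, 1, 1]

# word2ph must exactly partition the phone list.
assert sum(word2ph) == len(phones) == len(tones) == 69
```

The entries of word2ph do not map one-to-one onto the space-separated IPA words, because the tokenizer may split a word such as simplifié into several subword tokens; only the totals are guaranteed to line up.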
In your case it should be:
Final phs: ['_', 'kʰ', 'r', 'aj', 'p', 'e', 'n', 'pʰ', 'uː', 'r', 'a', 'p̚', '_']
Final tones: [0, 2, 2, 2, 2, 2, 2, 5, 5, 3, 3, 3, 0]
Final word2ph: [1, 3, 3, 2, 3, 1]
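Slicing the phone list by that word2ph makes the per-word grouping explicit (a minimal sketch using the lists above):

```python
phs = ['_', 'kʰ', 'r', 'aj', 'p', 'e', 'n', 'pʰ', 'uː', 'r', 'a', 'p̚', '_']
word2ph = [1, 3, 3, 2, 3, 1]

# Walk word2ph and take that many phones for each word/token.
groups, i = [], 0
for n in word2ph:
    groups.append(phs[i:i + n])
    i += n

assert sum(word2ph) == len(phs)
# ใคร -> kʰ r aj, เป็น -> p e n, ผู้ -> pʰ uː, รับ -> r a p̚
assert groups == [['_'], ['kʰ', 'r', 'aj'], ['p', 'e', 'n'],
                  ['pʰ', 'uː'], ['r', 'a', 'p̚'], ['_']]
```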
@jadechip Hello there, how were your training results? Are you still struggling with the pronunciation?
Hi @maryne-ii, I believe the pronunciation issues are resolved; however, I am having some issues getting distributed training to work. This is the error I am getting:
terminate called after throwing an instance of 'gloo::EnforceNotMet'
what(): [enforce fail at ../third_party/gloo/gloo/transport/tcp/pair.cc:446] op.preamble.length <= op.nbytes. 874668 vs 80644
My understanding is that the gloo library is used by PyTorch for collective communication in distributed training, so the error presumably indicates some kind of mismatch between expected and actual message sizes during TCP communication, but I am not sure what in my code - if anything - is causing it...
I should also note I have been using PyTorch > 2.x to train as I was getting other CUDA errors similar to #96.
Training on a single GPU seems to work though.
I've had some time to continue working on this and was able to resolve the training issues. I believe the inconsistencies were caused by the tokenizer I was using; I have now switched to a tokenizer that aligns more closely with the format expected by the codebase. The output is close to what @jeremy110 suggested, apart from the underscore characters. Should I remove the underscore characters from the tokenized text before calculating the phs, tones, and word2ph values? I am concerned that might cause inconsistencies with the get_bert_feature function later in the pipeline.
The tokenized text: ['▁', 'กง', 'ล้อ']
Final phs: ['_', '▁', 'k', 'o', 'ŋ', 'l', 'ɔː', '_']
Final tones: [0, 2, 2, 2, 2, 3, 3, 0]
Final word2ph: [1, 1, 3, 2, 1]
bert features shape torch.Size([768, 8])
I think it can be removed because it duplicates the original underscores.
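A minimal sketch of that removal, using the lists from the example above. The index arithmetic here is an assumption about how the standalone '▁' token lines up with the single phone it produces:

```python
tokens = ['▁', 'กง', 'ล้อ']
phs = ['_', '▁', 'k', 'o', 'ŋ', 'l', 'ɔː', '_']
tones = [0, 2, 2, 2, 2, 3, 3, 0]
word2ph = [1, 1, 3, 2, 1]

# Drop the standalone '▁' token and the single phone/tone it maps to.
# word2ph entry 0 is the leading '_'; entry idx + 1 is the '▁' token.
idx = tokens.index('▁')          # position among the tokens
ph_pos = sum(word2ph[:idx + 1])  # +1 skips the leading '_' entry
n = word2ph.pop(idx + 1)         # 1 for a bare '▁' token
del phs[ph_pos:ph_pos + n]
del tones[ph_pos:ph_pos + n]

assert phs == ['_', 'k', 'o', 'ŋ', 'l', 'ɔː', '_']
assert sum(word2ph) == len(phs) == len(tones)
```

After the removal the bert features would need to be computed over 7 positions instead of 8 so that the feature length still matches sum(word2ph).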
Ok, thank you, I will give this a shot.
I have created the following PR to add support for the Thai language. I am in the process of creating a dataset to train the model, but would love a PR review of the code first to make sure I am on the right track.
Thank you!