xinjli / allosaurus

Allosaurus is a pretrained universal phone recognizer for more than 2000 languages
GNU General Public License v3.0

Issue with shapes alignment #37

Closed (anushakabber closed 2 years ago)

anushakabber commented 2 years ago

Hello! I was having an issue with fine-tuning the model. This is the error message I'm getting: [image]. I'm not sure how to proceed. Any insight would be greatly appreciated, thank you!

xinjli commented 2 years ago

Hi,

I am not sure what is happening here. Can you tell me the language id you are targeting and share a couple of phone-transcribed utterances so I can investigate?

Thanks!

anushakabber commented 2 years ago

The language id is 'tel', and these are two utterances from the text file under train:

000010007 a n ʊ ʂ a <blk> t̪ a ɖ r ɪ <blk> n a r s a j j a <blk> k o n n e eː l ɭ l a <blk> k r ɪ t̪ a <blk> a n a aː r o oː g j a t̪ o oː <blk> m r t̪ ɪ <blk> tʃ e d̪ a aː ɖ ʊ

000010013 p r a t̪ j e eː k a <blk> ɦ o oː d̪ a aː <blk> k o oː s a <blk> k e eː d̪ r a <blk> p a aI n a <blk> o t̪ t̪ ɪ ɖ ɪ <blk> tʃ e eː j a l e eː k a <blk> a b bh ɪ ʋ r d̪ d̪ d̪h ɪ <blk> p e eː r ʊ t̪ o oː <blk> m a n a <blk> p a aː l a k ʊ l ʊ <blk> ɪ t̪ a r a <blk> d̪ e eː ʃ a aː l a <blk> tʃ ʊ ʈ ʈ ʊ uː <blk> t̪ ɪ r ʊ g ʊ t̪ ʊ n n a aː r a n ɪ <blk> tʃ a d̪ r a b a aː b ʊ <blk> p r a b bh ʊ t̪ ʋ a aː n n ɪ <blk> e d̪ d̪ e eː ʋ a aː <blk> tʃ e eː ʃ a aː r ʊ

The corresponding audio files:

https://user-images.githubusercontent.com/60822709/131215762-d0b98f58-f46f-4b67-bcfa-9aa51751597d.mp4

https://user-images.githubusercontent.com/60822709/131215775-c2d55277-03df-4f1f-b5aa-d5c5ca88e243.mp4

We also used our own phone inventory: phones_telugu.txt
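For a setup like this (a transcript file plus a custom phone list), one quick sanity check is whether every phone in the transcripts also appears in the inventory. The sketch below is a hypothetical helper, not part of allosaurus. It assumes each transcript line is an utterance id followed by space-separated phones, as in the two utterances above, and that the inventory is available as a plain list of phone strings.

```python
def unknown_phones(transcript_lines, inventory):
    """Map utterance id -> sorted list of phones missing from the inventory.

    Assumed line format: "<utt_id> <phone> <phone> ...".
    """
    known = set(inventory)
    report = {}
    for line in transcript_lines:
        parts = line.split()
        if not parts:
            continue  # skip empty lines
        utt_id, phones = parts[0], parts[1:]
        missing = sorted({p for p in phones if p not in known})
        if missing:
            report[utt_id] = missing
    return report


# Toy data, not taken from the files in this issue.
toy_inventory = ["a", "n", "k"]
toy_lines = ["utt1 a n a", "utt2 k a x"]
print(unknown_phones(toy_lines, toy_inventory))
```

An empty result only means the symbols are consistent with the inventory; it says nothing about feature dimensions inside the model.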

sahanashettigar commented 2 years ago

@xinjli Hey! I had the same question too. Hope you can answer it soon :)

xinjli commented 2 years ago

Hi @anusha2904, thanks for sharing the details. I will take a look at your data. @sahanashettigar, do you have the same problem for Telugu?

sahanashettigar commented 2 years ago

Hey @xinjli! I'm working with Kannada, using IPA-notation phones like the ones Anusha shared, and I got the same error. We both updated the phone inventory. The audio clips used for training and validation are not machine-generated but recorded by different individuals, so the amount of silence in the clips may vary. We converted the transcripts from a different notation to IPA using a label set that came with the transcripts.

xinjli commented 2 years ago

It looks like one of the dependencies, panphon, recently upgraded its features to use dim 24 instead of the previous dim 22, which causes the dim mismatch.

I updated the version. Can you retry it? I think it should be fixed now.
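To illustrate the kind of mismatch described here without depending on panphon itself, the sketch below uses a hypothetical helper (not part of allosaurus or panphon) that reports feature vectors whose length differs from what a model expects. The 22 and 24 come from the comment above; the toy vectors and the function name are assumptions for illustration only.

```python
from typing import Dict, List


def find_dim_mismatches(features: Dict[str, List[int]], expected_dim: int) -> Dict[str, int]:
    """Return {phone: actual_dim} for every vector whose length differs from expected_dim."""
    return {
        phone: len(vec)
        for phone, vec in features.items()
        if len(vec) != expected_dim
    }


# Toy feature tables: one with 22-dim vectors, one with 24-dim vectors.
old_style = {"a": [0] * 22, "t": [1] * 22}
new_style = {"a": [0] * 24, "t": [1] * 24}

print(find_dim_mismatches(old_style, expected_dim=22))  # {}
print(find_dim_mismatches(new_style, expected_dim=22))  # {'a': 24, 't': 24}
```

A model built around 22-dim inputs would hit a shape error as soon as it receives the 24-dim vectors, which is consistent with the error reported in this issue.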

Also, it looks like there was a trivial bug in the phone inventory setup. Please remove <blk> from your phone list, or you can list the phones again to remove the <blk>. It is better not to include it in the customized inventory.
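A minimal sketch of that cleanup, assuming the custom inventory is a text file with one phone per line and that the blank token is written as `<blk>`. The helper name `clean_inventory` is hypothetical and not part of allosaurus.

```python
BLANK_TOKEN = "<blk>"  # assumption: how the blank token is spelled in the phone list


def clean_inventory(lines):
    """Return the phones from `lines`, dropping empty lines and the blank token."""
    phones = []
    for line in lines:
        phone = line.strip()
        if not phone or phone == BLANK_TOKEN:
            continue
        phones.append(phone)
    return phones


# Toy inventory, as it might look when read with file.readlines().
raw = ["<blk>\n", "a\n", "\n", "n\n"]
print(clean_inventory(raw))  # ['a', 'n']
```

The cleaned list can then be written back out, one phone per line, before updating the inventory again.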

Thanks for submitting the issue!