mistakes in voicing assimilation

lukyjanek / phonetic-transcription

Rule-based approach to the phonetic transcription of the Czech, Slovak and Polish languages into the International Phonetic Alphabet (IPA).

GNU General Public License v3.0


mistakes in voicing assimilation #1

Open jakubbortlik opened 3 years ago

jakubbortlik commented 3 years ago

Hi Lukáš, just in case non-Czech people read this, I'm writing in English. I came across your script and found some bugs. I don't know how to create a pull request here, so I forked your repo and made some changes in the branch "voicing_fix": https://github.com/jakubbortlik/phonetic-transcription/tree/voicing_fix. You might want to include the fix in your own code. Apart from the mistakes in Czech, I also found some issues in the Polish transcription. First, the voicing assimilation of "rz" is wrong: a test for "przemek" should expect "p ʂ ɛ m ɛ k", not "b ʐ ɛ m ɛ k". Second, some place assimilation is incorrect: e.g., "napęd" is not "n a p ɛ ŋ t" but rather "n a p ɛ n̪ t", i.e., the "n" is not velar but dental. In fact, the whole n_nasals list should be split into at least two different lists according to place of articulation. Check out our overview of Polish pronunciation at http://polfon.upol.cz/ :)
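The "przemek" bug comes down to rule order: "rz" takes its voicing from the preceding obstruent (progressive assimilation), so it has to be resolved before the usual right-to-left regressive assimilation runs; otherwise [ʐ] voices the [p] into [b]. A minimal sketch, assuming words arrive as lists of segments with "rz" and "w" marked by the hypothetical placeholders "RZ" and "W" (standard descriptions of Polish treat "w" the same way, e.g. "kwiat" [kfjat]). The names `assimilate`, `VOICED_TO_VOICELESS` and `PROGRESSIVE` are illustrative and not taken from `phon_polish.py`.

```python
# Hypothetical voiced -> voiceless obstruent pairs (not copied from phon_polish.py).
VOICED_TO_VOICELESS = {
    "b": "p", "d": "t", "ɡ": "k", "v": "f", "z": "s", "ɣ": "x",
    "ʐ": "ʂ", "ʑ": "ɕ", "d͡z": "t͡s", "d͡ʐ": "t͡ʂ", "d͡ʑ": "t͡ɕ",
}
VOICELESS_TO_VOICED = {vl: vd for vd, vl in VOICED_TO_VOICELESS.items()}

# Hypothetical placeholders for the spellings "rz" and "w", which assimilate progressively:
# (realization after a voiceless obstruent, realization elsewhere).
PROGRESSIVE = {"RZ": ("ʂ", "ʐ"), "W": ("f", "v")}


def assimilate(segments):
    """Apply voicing assimilation to one word given as a list of segments."""
    out = list(segments)
    # Step 1 (must run first): "rz"/"w" take their voicing from the PRECEDING segment.
    for i, seg in enumerate(out):
        if seg in PROGRESSIVE:
            after_voiceless = i > 0 and out[i - 1] in VOICELESS_TO_VOICED
            out[i] = PROGRESSIVE[seg][0] if after_voiceless else PROGRESSIVE[seg][1]
    # Step 2: regressive assimilation, scanned right to left so a whole
    # obstruent cluster agrees in voicing with its last member.
    for i in range(len(out) - 2, -1, -1):
        nxt = out[i + 1]
        if out[i] in VOICELESS_TO_VOICED and nxt in VOICED_TO_VOICELESS:
            out[i] = VOICELESS_TO_VOICED[out[i]]
        elif out[i] in VOICED_TO_VOICELESS and nxt in VOICELESS_TO_VOICED:
            out[i] = VOICED_TO_VOICELESS[out[i]]
    return out


print(assimilate(["p", "RZ", "ɛ", "m", "ɛ", "k"]))  # przemek: ['p', 'ʂ', 'ɛ', 'm', 'ɛ', 'k']
print(assimilate(["t", "a", "k", "ʐ", "ɛ"]))        # także: ['t', 'a', 'ɡ', 'ʐ', 'ɛ']
```

The second call shows why "rz" needs its own placeholder: "ż" is also [ʐ], but it still voices a preceding obstruent, so the two spellings cannot share one rule.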

lukyjanek commented 3 years ago

Hi Jakub,

Thank you very much for your comments! I apologize for the late response; I had not checked the project for a long time.

I agree with you and accept the corrections you proposed. I fixed the bugs in the Czech transcription of "ch" and "x" and in the assimilation of "rz" in Polish.

Your project on Polish pronunciation is great! Thank you for this reference. :-)

Unfortunately, the correction of place assimilation for Polish "ą" and "ę" remains open. I've studied the project you referred to, but to me the situation looks like catching exceptions to exceptions to exceptions... The currently implemented solution in phon_polish.py is taken from "Mówię po polsku" by Zofia Tarajło-Lipowska (2008). If you have a better (more correct or precise) implementation, feel free to contact me. I am ready to work on improvements to these scripts.
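For the context Jakub points out (a nasal vowel before a stop or affricate), splitting the single nasal list by place of articulation can be expressed as a lookup keyed on the following consonant. A minimal sketch, assuming "ę" and "ą" have already been mapped to the hypothetical placeholders "ɛ̃" and "ɔ̃" and that final devoicing has already run (so "napęd" arrives ending in "t"). `NASAL_VOWELS`, `HOMORGANIC_NASAL` and `split_nasal_vowels` are illustrative names, not part of `phon_polish.py`. Every other context (before fricatives, before "l"/"ł", word-finally) is left untouched here, since that is where the exceptions pile up.

```python
# Hypothetical placeholders for the nasal vowels spelled "ę" and "ą", mapped to their oral vowels.
NASAL_VOWELS = {"ɛ̃": "ɛ", "ɔ̃": "ɔ"}

# Homorganic nasal chosen by the place of articulation of the FOLLOWING stop/affricate.
HOMORGANIC_NASAL = {
    "p": "m", "b": "m",        # bilabial
    "t": "n̪", "d": "n̪",       # dental
    "k": "ŋ", "ɡ": "ŋ",        # velar
    "t͡ɕ": "ɲ", "d͡ʑ": "ɲ",    # alveolo-palatal
}


def split_nasal_vowels(segments):
    """Replace a nasal vowel before a stop/affricate with oral vowel + homorganic nasal."""
    out = []
    for i, seg in enumerate(segments):
        nxt = segments[i + 1] if i + 1 < len(segments) else None
        if seg in NASAL_VOWELS and nxt in HOMORGANIC_NASAL:
            out.extend([NASAL_VOWELS[seg], HOMORGANIC_NASAL[nxt]])
        else:
            out.append(seg)  # other contexts are deliberately left as they are
    return out


print(split_nasal_vowels(["n", "a", "p", "ɛ̃", "t"]))  # napęd: ['n', 'a', 'p', 'ɛ', 'n̪', 't']
```

Keying the table on the following consonant, rather than keeping one list of nasals, is what lets "napęd" get a dental [n̪] while "ręka" still gets a velar [ŋ].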

pkruczynski commented 2 years ago

Final devoicing is correctly applied to the word Bóg (God) on its own:

echo -e "Bóg" | python3 phon_polish.py
b u k :+1:

However, in the context of a whole sentence, the final devoicing at the end of Bóg is omitted:

echo -e "Bóg jest źródłem prawdziwej miłości, wiary i życia" | python3 phon_polish.py
b u ɡ :-1: j ɛ z d ʑ r u d w ɛ m p r a v d͡ʑ i v ɛ j m i w ɔ ɕ t͡ɕ | v i a r ɨ ʔ i ʐ ɨ t͡ɕ a

I found a real bug ( ͡° ͜ʖ ͡°)
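The two outputs are consistent with final devoicing being applied only at the end of the input rather than at the end of every word; that is a guess, not something verified against `phon_polish.py`. A minimal sketch of per-word final devoicing, assuming the sentence has already been split into words and each word into segments; `DEVOICE`, `final_devoicing` and `transcribe_words` are hypothetical names.

```python
# Hypothetical voiced -> voiceless pairs, listed here so this block runs on its own.
DEVOICE = {
    "b": "p", "d": "t", "ɡ": "k", "v": "f", "z": "s", "ɣ": "x",
    "ʐ": "ʂ", "ʑ": "ɕ", "d͡z": "t͡s", "d͡ʐ": "t͡ʂ", "d͡ʑ": "t͡ɕ",
}
OBSTRUENTS = set(DEVOICE) | set(DEVOICE.values())


def final_devoicing(word):
    """Devoice the whole obstruent cluster at the end of ONE word (a list of segments)."""
    out = list(word)
    i = len(out) - 1
    while i >= 0 and out[i] in OBSTRUENTS:
        out[i] = DEVOICE.get(out[i], out[i])  # voiceless segments map to themselves
        i -= 1
    return out


def transcribe_words(words):
    """Apply final devoicing to every word, not only to the last one in the input."""
    return [final_devoicing(word) for word in words]


print(transcribe_words([["b", "u", "ɡ"], ["j", "ɛ", "s", "t"]]))
# [['b', 'u', 'k'], ['j', 'ɛ', 's', 't']]
```

A regressive assimilation pass across word boundaries would still run after this step. Judging by "j ɛ z d ʑ" for "jest źródłem", the script already does that, so only the per-word devoicing would need to move.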

adam-sierakowski commented 4 months ago

Hello!

I am new to GitHub and I don't know if this is the right place to post this comment.

Just wanted to say that even though those issues are still around, your code was very useful to me. Although I had to correct all the output manually, it saved me a ton of work: it was much faster than doing the entire transcription from scratch, and for data protection reasons I couldn't use any of the online tools (even if I could have, I would have had to proofread their output too).

I would recommend your code to anyone in my situation who is trained in Polish phonology and has the skill and time to proofread the output transcriptions.