Open rbracco opened 4 years ago
Hi Robert,
Sorry for the late reply. We had to create the dictionary manually because no such thing existed. See the associated paper on the phonics engine: https://www.researchgate.net/publication/280147388_Building_a_Phonics_Engine_for_Automated_Text_Guidance
The process started with a dictionary that had pronunciations at the word level. We then created a set of equivalence rules to do the matching automatically. After that we had to manually review samples to see where additional rules were needed, which meant a lot of regexp work. We had plans to do more but didn't have the funding.
Here are some examples of the rules we used: https://docs.google.com/document/d/1-DHwHyeaZwdo_ZjDSwWXe0TwE-GfdKJ7S2xyFsoyVkw/edit?usp=sharing
Dominik
Dominik Lukes http://dominiklukes.net @techczech
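For readers who want a concrete picture of the process described above, here is a minimal sketch of rule-based matching. It is not the Phonics Engine's code or its actual rule set: the `RULES` table, the `align` function, and the ARPAbet-style phoneme symbols are all assumptions made for illustration. It takes a word and its word-level pronunciation and tries to split the spelling into graphemes that each account for one phoneme.

```python
from typing import Optional

# Hypothetical equivalence rules: grapheme -> phonemes it is allowed to spell.
# Illustrative only; these are NOT the rules from the linked document.
RULES: dict[str, set[str]] = {
    "sh": {"SH"},
    "ch": {"CH"},
    "ck": {"K"},
    "ee": {"IY"},
    "c": {"K", "S"},
    "k": {"K"},
    "p": {"P"},
    "t": {"T"},
    "a": {"AE", "EY"},
    "e": {"EH", "IY"},
    "i": {"IH", "AY"},
}

MAX_GRAPHEME_LEN = max(len(g) for g in RULES)


def align(word: str, phones: list[str]) -> Optional[list[tuple[str, str]]]:
    """Pair each phoneme with the letters that spell it.

    Returns a list of (grapheme, phoneme) pairs, or None when the
    current rules cannot account for the whole word.
    """
    if not word and not phones:
        return []  # both consumed: successful alignment
    if not word or not phones:
        return None  # letters or phonemes left over
    # Try longer graphemes first so "sh" wins over "s" + "h".
    for size in range(min(MAX_GRAPHEME_LEN, len(word)), 0, -1):
        grapheme = word[:size]
        if phones[0] in RULES.get(grapheme, ()):
            rest = align(word[size:], phones[1:])
            if rest is not None:
                return [(grapheme, phones[0])] + rest
    return None  # no rule fits here; backtrack or give up


print(align("sheep", ["SH", "IY", "P"]))
# [('sh', 'SH'), ('ee', 'IY'), ('p', 'P')]
print(align("xyz", ["Z"]))
# None
```

Words that come back as `None` are the ones the rules do not yet cover, which roughly corresponds to the manual review step: inspect the failures, add or adjust rules, and run again.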
On Fri, Aug 7, 2020 at 8:23 PM Robert Bracco notifications@github.com wrote:
First off, thank you so much. You are the first person I've seen hosting a dictionary that maps graphemes to phonemes in common words. I have only been able to find word-level transcriptions, with no linkage between combinations of letters and phones.
I was wondering, since this is quite a hard problem, what the origins of this dictionary are and how it was generated. Thank you!
Thanks so much for sharing. I work in machine learning and may try to build a model that does the letter/phone correspondence so that we could expand the dictionary to any word. If I make any progress I'll be sure to let you know.
Best, Rob
Thanks, that would be great. Keep me posted.
Dominik