thunderdrop / IBMTTSDictionaries

A large, community-driven pronunciation dictionary for the IBMTTS speech synthesizer in American English
Creative Commons Zero v1.0 Universal

2023-06 - Main #25

Closed thunderdrop closed 1 year ago

thunderdrop commented 1 year ago

See readme for contributing guidelines.

MeisamAmini commented 1 year ago

These are some words that need to be added. I don't know how to write the pronunciation very well, so I just wrote those I could, plus the two names that are from another language.

ai ay igh
usb yoo ess bee
gamelit game lit
isekai
Meisam [.2me.1sAm]
Amini[.2A.2mi.1ni]

amirsol81 commented 1 year ago

@MeisamAmini Thanks. Some of your suggestions were added to ENUmain and ENURoot. Some notes below:

  1. Normally we don't add lowercase variants to the dictionary unless they're widely used. As such, we haven't added "usb" and "ai".
  2. We added "Meisam" using a slightly different, Anglicized version, to ENURoot.
  3. As for "Amini," the original pronunciation sounds correct if we consider the Anglicized version - for English speakers. That's why it hasn't been added.
MeisamAmini commented 1 year ago

Understood. Thanks.

nromey commented 1 year ago

Hello,

In root dictionary: tabatha `[.1tA.bx.TH]

In main: DLA Dee Ell Ay

These seem to work. In the case of the proper name, it expands properly to other forms, i.e. plural and possessive.

Please let me know if I've screwed this up--this is my first submission. Next time I may attempt a pull.

Thanks.

NER

amirsol81 commented 1 year ago

@nromey Thanks for your great suggestions. I just added them to the project as part of the last 2 commits. Now a couple of notes:

  1. Tabatha. This is a variant of the word "Tabitha", which is already in the dictionary, and the suggested form ends in a lowercase "x" instead of an uppercase "H": `[.1tA.0bx.0Tx]. This is a general rule affecting word-final syllables ending in "a" that receive secondary/tertiary stress.
  2. DLA. For consistency, the replacement should contain lowercase characters instead of uppercase ones: dee ell ay.

Looking forward to your upcoming submissions.

nromey commented 1 year ago

Ah cool. I had Tx as the last syllable and changed it. I'll get my git straight and do a pull. So lowercase in the special dictionary is case-insensitive, correct? Asking will help me get you some better submissions.

Ner

nromey commented 1 year ago

Got another question. DoD is pronounced "do d" in NVDA. Can or should I try to add dod as a lowercase entry in the special dictionary? Done this way, will mixed-case DoD be spoken as "d o d"? I encounter this one all the time at work.

amirsol81 commented 1 year ago

@nromey Yes, you're right. Lowercase letters in the special dictionary are case-insensitive. As for DoD, it seems to me that adding "dod", all lowercase, to the main dictionary can't fix it. That's because everything before the last letter, which is uppercase, gets interpreted as a separate word, and the uppercase "D" is interpreted as a separate entity. Of course, you may try it there and let us know if it gets fixed.

nromey commented 1 year ago

No luck on DoD. At least I know how to fix it locally. As you have, I've been looking for duplicates and cleanup. Before I make the change (and yes, I know I could just change it and you could just reject the pull), I found a "Da'Shaun" and a "Da'Shaun's" in the special (main) dictionary. I'm not sure if this is correct practice, but I'm going by experience here. Shouldn't we put da'shaun in roots, so that it covers the possessive as well? Also, as you noted in my proper name fixes previously, I put this in lowercase. Initial tests on my system are that this works, but I want to make sure I'm on the same page as you before I go and break junk. Then I'll continue to look for dupes. Really enjoying working on this project.

amirsol81 commented 1 year ago

@nromey Thanks for your keen interest! You are absolutely right. I tested da'shaun, all lowercase, with ENURoot and it takes care of both instances in ENUmain. I was worried that the lowercase entry might fail with mixed-case ones, but my concerns were unwarranted! So, please kindly go ahead and take care of da'shaun and similar ones with your local tests. Very much appreciated! My efforts were primarily aimed at duplicates and words with alternative pronunciations in ENURoot, but you've raised an interesting case. Thanks again for your interest and curiosity.
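
For anyone doing the same cleanup, here is a rough Python sketch of the check discussed above: it flags main-dictionary keys that a lowercase root entry might already cover. It is only a toy model of the behavior reported in this thread (case-insensitive root matching that also covers plural and possessive forms), not the synthesizer's actual lookup logic. The function name, the suffix list, and the idea of passing keys in as plain lists are all assumptions made for illustration, and any flagged entry should still be tested locally before it is removed.

```python
def redundant_main_entries(main_keys, root_keys):
    """Return main-dictionary keys that a root entry might already cover.

    Assumptions (from the thread, not from the engine's documentation):
    root entries match case-insensitively and also cover the plural
    ("s") and possessive ("'s") forms of the same word.
    """
    roots = {key.lower() for key in root_keys}
    suffixes = ("", "'s", "s")  # base form, possessive, plural (hypothetical list)
    flagged = []
    for key in main_keys:
        lowered = key.lower()
        for suffix in suffixes:
            if not lowered.endswith(suffix):
                continue
            # Strip the suffix (if any) and see whether the stem is a root key.
            stem = lowered[: len(lowered) - len(suffix)]
            if stem in roots:
                flagged.append(key)
                break
    return flagged


if __name__ == "__main__":
    main_keys = ["Da'Shaun", "Da'Shaun's", "DLA"]
    root_keys = ["da'shaun", "tabatha"]
    print(redundant_main_entries(main_keys, root_keys))
```

With the sample keys above, both "Da'Shaun" and "Da'Shaun's" are flagged as candidates for removal from main, while "DLA" is left alone.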

nromey commented 1 year ago

Will do. I thought it odd, but from reading the manual, it appears that the root dictionary is case-insensitive. I understand why you might put this in main, but I found other words in there that had apostrophes, so I was like hey, let's give this a try. I also had a problem with Tabatha and apostrophes: it was reading the 's form as "Tabatha s", so I fixed it by doing this very thing. Screwed up my fork, so I'm re-creating it and you should get a pull request when I get done. Will try to not pummel you with ones and twos. I'm new to working on an active GitHub project, so I'm slowly progressing to success. Keep up your efforts, you're doing great.

amirsol81 commented 1 year ago

@nromey Thanks. Do take your time and we'll take care of them together! 👏🏻👏🏻

ultrasound1372 commented 1 year ago

Since there is an active discussion going on here, I'll leave this issue open for the moment and not change the title as we normally would.

nicopn commented 1 year ago

Requests:

opengl  open GL
Opengl  Open GL
openssl open SSL
Openssl Open SSL
Chatgpt Chat GPT
JSON    `[.1Je.2san]
macos   mac OS
Macos   Mac OS
ipados  `[.1Y.2pAd] OS
Ipados  `[.1Y.2pAd] OS
deadmau5    `[.1dEd.2mWs]
Deadmau5    `[.1dEd.2mWs]
Nvdaremote  NVDA remote

Note: The entries Chatgpt and Nvdaremote are capitalized versions of the already-existing entries for chatgpt and nvdaremote. Similarly, JSON is the all-caps version of json which, in my opinion, should not be spelled out.
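
When preparing a list like the one above, it can help to see which keys differ only by case. The Python sketch below groups entries that way. It assumes each entry is a key followed by whitespace and then the replacement, which is how the requests appear in this thread; the real dictionary files may use a stricter separator, and the function name is hypothetical.

```python
from collections import defaultdict


def case_variant_groups(lines):
    """Group dictionary keys that differ only by letter case.

    Each line is assumed to look like "key<whitespace>replacement";
    lines that do not split into two parts are skipped.
    """
    groups = defaultdict(list)
    for line in lines:
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue
        key, _replacement = parts
        groups[key.lower()].append(key)
    # Keep only keys that appear in more than one case form.
    return {lowered: keys for lowered, keys in groups.items() if len(keys) > 1}


if __name__ == "__main__":
    sample = [
        "opengl\topen GL",
        "Opengl\tOpen GL",
        "JSON\t`[.1Je.2san]",
    ]
    print(case_variant_groups(sample))
```

For the three sample lines, only the opengl/Opengl pair is reported, since JSON has no other case form in the sample.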

amirsol81 commented 1 year ago

@nicopn Thanks. I just added a few of your suggestions to Main as part of the latest commit. However,

  1. Do opengl, ipados and macos tend to appear in this form? I haven't seen them, so haven't added them either. But if you think they deserve to be added, let us know.
  2. It seems that NVDA can't properly pronounce deadmau5 because of the number at the end, so I haven't added it.
  3. Note that Nvdaremote and Chatgpt were added to the middle of the file next to their relevant entry.
ultrasound1372 commented 1 year ago

Discussion has concluded here, closing to be replaced by 2023-08.