thunderdrop / IBMTTSDictionaries

A large, community-driven pronunciation dictionary for the IBMTTS speech synthesizer in American English
Creative Commons Zero v1.0 Universal

2023-07 - Roots #36

Closed ultrasound1372 closed 1 year ago

ultrasound1372 commented 1 year ago

See the README for contributing guidelines.

ghost commented 1 year ago

Hello, I have some entries to add to the EnuRoot dictionary file. The entries are below.

morisseau   `[.1mor.0X.2so]
manamonloser    `[.1mAn.0x.2manluzR]
replit  `[.1rEp.0lXt]
repl    `[.1rEp.0xl]
ddos    `[.1di.2dcs]
shakoor `[.0Sx.1kur]
onoff   `[.1an.2cf]
chloie  `[.1klo.0i]
powerschool `[.1pWR.2skul]
verigon `[.1ver.0X.2gan]
obs `[.2obi.1Es]
blobsaver   `[.1blab.2sevR]
restream    `[.1ri.2strim]
vocabulary  `[.2vo.1kAb.0yx.2leri]
product `[.1pra.2FHkt]
esim    `[.1i.2s.2Im]
gianni  `[.1J.0i.1a.0ni]
hamid   `[.0hx.1mid]
luhanna `[.2lu.1an.0x]
gmetrix `[.2Ji.1mEt.2r.0I.2ks]
editbox `[.1EF.0X.2tbaks]
ratio   `[.1re.0Sio]
madgamer    `[.1mAF.2gem.0R]
sonos   `[.1so.0nos]
ivona   `[.2Y.1von.0x]
mckensie    `[.0mX.1kEn.0zi]
saveamon    `[.1sev.0x.2man]
pycache `[.1pY.2kA.2S]
artenisa    `[.2arh.0tx.1ni.0sx]
osmani  `[.2a.1zm.1a.0ni]
cornelius   `[.2kor.1nil.2i.0xs]
address `[.1A.2drEs]
offline `[.1cf.1lYn]
online  `[.1an.1lYn]
exit    `[.1Eg.2z.0X.2t]
mylee   `[.1mY.0li]
kyleigh `[.1kY.0li]
renova  `[.2rX.1no.0vx]
zollotech   `[.1zol.0o.2tEk]
imessage    `[.1Y.0mEs.0XJ]

I also have some entries to add to the EnuMain dictionary file. They are below.

romance `[.1rom.2Ans]
Romance `[.1rom.2Ans]
ROMANCE `[.1rom.2Ans]
classroom   class `0 room
Classroom   class `0 room
CLASSROOM   class `0 room
ibmeci  i b m e c i
Ibmeci  i b m e c i
IBMECI  i b m e c i
nvdaremote  NVDA remote
Nvdaremote  NVDA remote
NVDAREMOTE  NVDA remote
ibmtts  i b m t t s
Ibmtts  i b m t t s

Lastly, I have entries for the abbreviation dictionary. Here they are.

KB  kilobytes
MB  megabytes
GB  gigabytes
TB  terabytes
PB  petabytes


ultrasound1372 commented 1 year ago

@mad-gamer13 Please edit your comment to wrap each block of dictionary entries in a code fence, ```, because at the moment we cannot easily select them to add thanks to markdown ignoring newlines in non-code blocks. You could also try a pull request, although note that due to the case of the B actually mattering and eloquence not supporting that we'll probably exclude the abbreviation entries. Those are better done with an NVDA dictionary entry. Finally, we do have at least some of the forms of nvdaremote, although all-caps is not needed as no one actually writes it that way. Initial capital is probably needed, though.

ghost commented 1 year ago

How exactly do I wrap blocks of code? Do I put 3 grave accents at the beginning and end of code?

amirsol81 commented 1 year ago

@mad-gamer13 Thanks. I'm working on the entries and will comment once I send the commit.

amirsol81 commented 1 year ago

@mad-gamer13 Thanks again. I added many of your suggested entries to Root and Main. As @ultrasound1372 suggested, the 2-letter entries for ENUabbr might interfere with other words, so they haven't been added. Here are a few notes on the entries that were not added to Root/Main:

  1. The original pronunciations for words like vocabulary, exit, address, ratio, restream, product, offline and online are acceptable according to most dictionaries, so they haven't been added. Also, FYI, a suggested pronunciation should have only one primary-stressed syllable, marked with .1.
  2. The words ddos and powerschool haven't been added because they hardly ever, if ever, appear in this lowercase form. If, however, you encounter powerschool in this form other than its website URL, let us know and I'll add it.
  3. We already had the word "Cornelius" in Root, but I modified its pronunciation to reflect yours, which is more frequently used/heard.
  4. The entry "obs" apparently belongs in Main, not Root.
  5. Finally, we already have nvdaremote, lowercase, in Main, and I haven't seen other forms of it used. The same is true about ibmtts and ibmeci, but if you encounter other forms of them, do let us know.
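
The one-primary-stress rule from item 1 can be checked mechanically before entries are submitted. Below is a minimal Python sketch, assuming pronunciations are written as in this thread (a backtick, then bracketed syllables prefixed with .0, .1 or .2). The helper names `count_primary_stress` and `has_single_primary_stress` are made up for illustration and are not part of the repository.

```python
import re

def count_primary_stress(spr: str) -> int:
    """Count the .1 (primary stress) markers in one pronunciation string."""
    return len(re.findall(r"\.1", spr))

def has_single_primary_stress(spr: str) -> bool:
    """True when the pronunciation has exactly one .1 marker."""
    return count_primary_stress(spr) == 1

# Sample entries copied from the request earlier in this thread.
samples = {
    "replit": "`[.1rEp.0lXt]",
    "offline": "`[.1cf.1lYn]",
    "gianni": "`[.1J.0i.1a.0ni]",
}

for word, spr in samples.items():
    if not has_single_primary_stress(spr):
        print(f"{word}: {count_primary_stress(spr)} primary stress markers")
```

This only counts markers; it does not check whether the symbols between them are valid.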

nicopn commented 1 year ago

Requests

Roots:

mpeg`[.1Em.2pEg]
opensource  `[.1o.0pxn.2scrs]
plist   `[.1pi.2lIst]

Main:

ffmpeg  FF `[.1Em.2pEg]
Ffmpeg  FF `[.1Em.2PEg]
FFmpeg  FF `[.1Em.2pEg]
Fmpeg   F `[.1Em.2pEg]

Note: The entry for "Fmpeg", with a single f, in Main is intended for "FFmpeg", taking NVDA's camel case splitting behaviour into account ("FFmpeg" gets split as "F Fmpeg").
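
To illustrate why the single-f entry is needed, here is a rough Python sketch of a camel case split that reproduces the "F Fmpeg" result described above. This is not NVDA's actual code. The regular expression and the function name `split_camel_case` are assumptions chosen only to match the one example given in this comment.

```python
import re

def split_camel_case(word: str) -> str:
    """Insert a space before a capital that follows a letter and precedes a lowercase letter.

    Illustrative only: an assumed rule, not NVDA's implementation.
    """
    return re.sub(r"(?<=[A-Za-z])(?=[A-Z][a-z])", " ", word)

for form in ("ffmpeg", "Ffmpeg", "FFmpeg"):
    print(form, "->", split_camel_case(form))
```

Under this assumed rule only the "FFmpeg" form is split, which is why the dictionary would see "F" followed by "Fmpeg" rather than the whole word.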

ultrasound1372 commented 1 year ago

Missing tab on mpeg. I think opensource and possible variations, which is probably only opensourced and maybe opensourcing which would have to be separate anyway, should be put in main, as there is a word boundary here. In either case the second syllable should probably be with a capital X, as used in the default pronunciation of open. Furthermore, I think the main form Ffmpeg, with two f's but only one capital, should be excluded as it's an incorrect form that one is not likely to encounter. Finally, would you also want a main entry for MPEG, in all caps, to be pronounced like this root entry, as is the case for JPEG? Or should the capitalized forms be spelled out, in which case JPEG should be removed?
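
A missing tab like the one on mpeg is easy to catch automatically. Here is a rough Python sketch, assuming each dictionary line is a key and a translation separated by a tab, as the remark above implies. The function name `lines_missing_tab` is hypothetical.

```python
def lines_missing_tab(text: str) -> list:
    """Return the non-empty lines that contain no tab separator."""
    return [
        line
        for line in text.splitlines()
        if line.strip() and "\t" not in line
    ]

# The Roots request from above, with tabs written out explicitly.
entries = (
    "mpeg`[.1Em.2pEg]\n"
    "opensource\t`[.1o.0pxn.2scrs]\n"
    "plist\t`[.1pi.2lIst]\n"
)

for bad in lines_missing_tab(entries):
    print("missing tab:", bad)
```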

amirsol81 commented 1 year ago

@nicopn Thanks. I just added your suggestions as part of the latest commit, and made a couple of minor changes in them. Also added the uppercase "MPEG" to Main to match the lowercase one.

nicopn commented 1 year ago

Requests:

numpy   `[.1nHm.2pY]
mondevol    `[.1man.0dX.2vcl]
scipy   `[.1sY.2pY]
bezier  `[.1bE.0zi.0e]
pytorch `[.1pY.2tcrC]
sinewave    `[.1sYn.2wev]
winget  `[.1wIn.2gEt]
kiribati    `[.1ki.0rX.2bAs]

ultrasound1372 commented 1 year ago

@nicopn Where have you encountered mondevol? I agree with most of these but looking at the transcription of that it looks like the last syllable should be given secondary stress.

nicopn commented 1 year ago

@ultrasound1372 Mondevol is the name of a Manamon, which is pronounced as I transcribed. Now that I think about it, the last syllable should be given secondary stress. I've edited my previous comment to account for this.

amirsol81 commented 1 year ago

@nicopn Thanks - I just added these with the latest commit.

nicopn commented 1 year ago

Requests:

dadjoke `[.1dAd.2jok]
plugintorrent   `[.1plH.0gIn.2tc.0rXnt]
roomtone    `[.1rum.2ton]
soundsource `[.1sWnd.2scrs]
tilemap `[.1tYl.2mAp]

ultrasound1372 commented 1 year ago

@amirsol81 What's with severus? Looking at the entries it's surrounded by it appears to have come in from some other list. For the Harry Potter character, I think the pronunciation would either be

`[.1sEv.0rxs]

or

`[.1sEv.0Rxs]

Adding secondary stress seems to break it strangely. I've heard it pronounced the second way in the films but perhaps for an American pronunciation the first would be acceptable? The first is what the engine gives by default with no dictionary in place.

ultrasound1372 commented 1 year ago

@nicopn Added in 75efc2e with a modification: j isn't a valid SPR.

amirsol81 commented 1 year ago

@ultrasound1372 The word "severus" was added to cover the following biographical name: Lucius Septimius Severus (https://www.merriam-webster.com/dictionary/Severus). However, if you feel the default is heard more frequently given the popularity of the Harry Potter character, it can be removed.

amirsol81 commented 1 year ago

@ultrasound1372 The default pronunciation for the word "advertisement," [.1Ad.0vR.2tYz.0mXnt], emphasizes the first syllable, a pronunciation which is listed nowhere in American or British dictionaries. The proper pronunciation for it is [.2Ad.0vR.1tYz.0mXnt], emphasizing the third syllable. This is the standard American English pronunciation, so I think it should be re-introduced.

ultrasound1372 commented 1 year ago

@amirsol81 Oh, I see what happened: I was using elocutor as a test, but it had a previous copy of the dictionary. You are correct.