Closed ultrasound1372 closed 4 years ago
I propose the word solenoid (/ˈsoʊlənɔɪd/ according to Wikipedia; this may be British IPA. IDK, I can't read it). I think it's something along the lines of "soalenoid"?
Here are some entries; some have a comment before them, starting with #, to explain why they're there.
beautician `[.0byu.1tIS.0xn]
# Alex, host of Jeopardy!
trebek `[.0trX.1bEk]
# The service you use to find info about domain registrations
whois `[.1hu.0Iz]
# Nick
giannak `[.1JAn.0Xk]
# Classic interactive fiction/text adventure company from the 80's. Responsible for Zork and many others
infocom `[.1Info.0kam]
# One of Mars's moons
phobos `[.1fo.0bos]
sndup `[.0Es.0En.0di.1Hp]
ncaa `[.0En.0si.0dHbxl.1e]
naacp `[.0En.0dHbxl.0e.0si.1pi]
# A trigger fish from Hawaii
humuhumunukunukuapua'a `[.0humx.0humx.0nukx.0nukx.0ap.0u.0a.1a]
kitchensinc `[.1kIC.0xnz.1IGk]
# Russian special forces
spetsnaz `[.0spEtz.1nyaz]
# A kind of pasta
penné `[.0pxn.1e]
textfiles `[.1tEkst.0fYlz]
textfile `[.1tEkst.0fYl]
# Abbreviation for University of Louisville
uofl `[.0yu.0xv.1El]
# A games company from the 80's
broderbund `[.1brod.0R.0bHnd]
thermocouple `[.1TR.0mx.0kH.0pxl]
# Jonathan
mosen `[.1mo.0zxn]
mushroomfm `[.1mHS.0rum.0Ef.1Em]
# The CPU used in the Apple II and others
6502 `[.0sIks.0ti.0fYv.0o.1tu]
werewolf `[.1wer.0wxlf]
fjord `[.1fyord]
adolf `[.1ed.0clf]
adolph `[.1ed.0clf]
freedomscientific `[.0fri.0dxm.0sY.0xn.1tIf.0Xk]
# Software maintenance agreement
sma `[.0Es.0Em.1e]
callsign `[.1kcl.0sYn]
openvpn `[.1o.0pxn.0vi.0pi.1En]
aes `[.0e.0i.1Es]
sos `[.0Es.0o.1Es]
cpu `[.0si.0pi.1yu]
gpu `[.0Ji.0pi.1yu]
edelweiss `[.1ed.0xl.0vYs]
# A temperature/humidity sensor
sensorpush `[.1sEn.0sR.0pUS]
# A city in North Carolina
durham `[.1dU.0rxm]
runaround `[.1rHn.0x.0rWnd]
# A medicine
flonase `[.1flo.0nez]
# A certain man with two heads from the Hitchhiker's Guide to the Galaxy
zaphod `[.1ze.0fad]
beeblebrox `[.1bi.0bxl.0braks]
# Another guy from the same.
slartibartfast `[.0slar.0ti.1bart.0fAst]
# Regrettably your planet is one of those scheduled for demolition by these guys
vogon `[.1vo.0gan]
thermostats `[.1TR.0mx.0stAts]
walmart `[.1wcl.0mart]
kroger `[.1kro.0gR]
meijer `[.1mY.0R]
# The guy who invented the Step by Step phone switch
strowger `[.1stro.0JR]
# Speech synthesis software for the Apple II
textalker `[.1tEks.0tckR]
# An operating system for the Apple II
prodos `[.1pro.0das]
probraille `[.1pro.0brel]
# One of at least sixteen possible spellings of this word.
hanukkah `[.1han.0xkx]
siriusxm `[.1sir.0i.0xs.0Eks.1Em]
xmradio `[.1Eks.1Em.1red.0i.0o]
tunein `[.1tun.0In]
amazonsmile `[.1Amx.0zan.1smYl]
fergie `[.1fR.0gi]
# Weird Al
yankovic `[.1yAGk.0x.0vIk]
# A breed of dog
dachshund `[.1daks.0xn]
# A city in Kentucky
louisville `[.1lu.0x.0vxl]
budweiser `[.1bHd.0wY.0zR]
# A language spoken in Africa
swahili `[.0swa.1hi.0li]
# From StarTrek TOS
uhura `[.0u.1hu.0rx]
rachmaninoff `[.0rak.1ma.0nx.0naf]
tchaikovsky `[.0CY.1kaf.0ski]
# An electric toothbrush
sonicare `[.1sanX.0ker]
# This streaming service would be broadcasting the Barbershop Harmony Society's international convention this week, except it's been canceled due to Coronavirus
flovoice `[.1flo.0vOs]
adeline `[.1Ad.0x.0lYn]
# A text editor
edsharp `[.1Ed.0Sarp]
boucher `[.0bu.1Se]
deauthorize `[.0di.1cT.0R.0Yz]
euterpe `[.0yu.1tR.0pi]
nomorobo `[.0no.0mo.1ro.0bo]
nuc `[.1nUk]
oldschool `[.1old.0skul]
rsgames `[.1ar.1Es.1gemz]
# Bruce
toews `[.1tevz]
videogame `[.1vI.0dio.0gem]
imap `[.1Y.0mAp]
mbox `[.1Em.0baks]
sqlite `[.0Es.0kyu.1lYt]
mysql `[.0mY.0Es.0kyu.1El]
rsa `[.0ar.0Es.1e]
directv `[.0dR.1Ek.1ti.1vi]
nlsbard `[.0En.0El.0Es.1bard]
wifi `[1wY0fY]
sidharth `[1sId0harT]
anneke `[1an0X0kx]
mbraille `[1Em0brel]
# A rather famous virus
coronavirus `[.0kR.1o.0nx.0vY.0r.0xs]
brigham `[.1brI.0gxm]
ourself `[.0Wr.1sElf]
webinar `[.1wEb.0X.0nar]
hourglass `[.1WR.0glAs]
mailto `[.1mel.0tu]
icann `[.1Y.0kAn]
godaddy `[.1go.0dAdi]
deregister `[.0di.1rE.0JX.0stR]
# A famous poet
yeats `[.1yets]
lexicon `[.1lE.0ksX.0kan]
palmolive `[.0pam.1al.0Xv]
csun `[.1si.0sHn]
tbrnlive `[.0ti.0bi.0ar.0En.1lYv]
winamp `[.1wIn.0Amp]
xmplay `[.0Eks.0Em.1ple]
ibmtts `[.0Y.0bi.0Em.1ti.1ti.1Es]
flickr `[.1flI.0kR]
tumblr `[.1tHm.0blR]
komplete `[.0kxm.1plit]
kontrol `[.0kxn.1trol]
qatar `[.0kx.1tar]
istanbul `[.0Ist.0an.1bUl]
retinitis `[.0rE.0?N.1Y.0tXs]
pigmentosa `[.0pIg.0mXn.1to.0sx]
dakotan `[.0dX.1ko.0txn]
macular `[.1mAk.0yx.0lR]
stroganoff `[.1strog.0xn.0cf]
squawk `[.1skwak]
# John
clower `[.1klW.0R]
ibm `[.0Y.0bi.1Em]
aph `[.0e.0pi.1eC]
plugin `[.1plH.0gIn]
xbox `[.1Eks.0baks]
covid `[.1ko.0vXd]
gramophone `[.1grA.0mx.0fon]
animorphs `[.1An.0X.0morfs]
sewer `[.1su.0R]
sewage `[.1su.0XJ]
alienation `[.0e.0li.0xn.1eS.0xn]
wellbeing `[.0wEl.1bi.0IG]
miniscule `[.1mI.0nx.0skyul]
emojis `[.0I.1mo.0Jiz]
murderbot `[.1mxr.0dxr.0bat]
postmate `[.1post.0met]
delphi `[.1dEl.0fY]
munawar `[.1mu.0nx.0war]
whatnot `[.1wHt.0nat]
refill `[.1ri.0fIl]
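For anyone who wants to process a list like the one above before merging it, here is a minimal sketch. It assumes each entry has the shape shown in this post (a word, whitespace, then a backtick and a bracketed SPR) and that a `#` line annotates the entry that follows it. `parse_entries` and `has_syllable_dots` are hypothetical helper names, not part of any existing tool.

```python
import re

# Assumed line shape, copied from the list above: "<word> `[<SPR>]".
ENTRY = re.compile(r"^(\S+)\s+`\[(.+)\]$")

def parse_entries(text):
    """Return (word, spr, comment) tuples; a '#' line annotates the next entry."""
    entries, comment = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            comment = line[1:].strip()
            continue
        match = ENTRY.match(line)
        if match:
            entries.append((match.group(1).lower(), match.group(2), comment))
        comment = None  # a comment only applies to the line right after it
    return entries

def has_syllable_dots(spr):
    """True when the SPR uses the dotted per-syllable style, e.g. .1wY.0fY."""
    return spr.startswith(".")

sample = """# Alex, host of Jeopardy!
trebek `[.0trX.1bEk]
wifi `[1wY0fY]
"""
for word, spr, comment in parse_entries(sample):
    style = "dotted" if has_syllable_dots(spr) else "undotted"
    print(word, spr, style, comment or "")
```

The dotted/undotted flag is only there because a few entries in the list (wifi, sidharth, anneke, mbraille) are written without the dots; the sketch reports the style and does not judge which one is right.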
I propose the word solenoid (/ˈsoʊlənɔɪd/ according to Wikipedia; this may be British IPA. IDK, I can't read it). I think it's something along the lines of "soalenoid"?
Done (849ae32). The default appears to be British pronunciation, weirdly.
@jaybird110127 Wow, this is brilliant, thanks for the submissions! I have to check standard pronunciations for 1 or 2 things, and will leave those abbreviations for their own dictionary for now, but most of these should be in there. Keep an eye out and open an issue if you notice something amiss.
Thanks for taking the time!
I say we should exclude 6502 and uofl; that should be an NVDA dictionary thing, since there are so many and they may overlap. yankovic and louisville already work, and possibly others. NAACP is already taken care of by Eloquence; you're supposed to capitalize those acronyms. NCAA also works. Most abbreviations should be capitalized.
Oh, Louisville is different there. I'm not sure how widespread that is. Do we know other states with Louisvilles to compare it to? Also, I'm not sure what to do with cpu and gpu, as they're supposed to be CPU and GPU, in uppercase, and actually getting multi-word SPRs in the roots is a royal pain in the ass; see idk and others.
yankovic and louisville already work
Louisville is being researched right now, but he wasn't wrong about Yankovic (/ˈjæŋkəvɪk/ - YANK-a-vik).
Adds most of the non-abbreviation entries from @jaybird110127 (187ebf2); alterations below. Thanks!
Proposing hallucinogen. In my part of the US, which isn't the region it covers but this might be the same pronunciation, it would be something like `[h0xl1usXnx2Jxn]. Maybe some switching on those schwas, though, like should the last one right before the n be the ih shaped or uh shaped one.
Dictionaries.zip
Guys, to say that you have turned my 15-year-old hobby into a serious undertaking would be a huge understatement! I've been correcting Eloquence mispronunciations for the last 15 years, and actually have a hefty Root.dic file here. Older versions of my file used to ship with Nuance TALKS&ZOOMS and the Kurzweil 1000, both of which are now considered defunct (sadly, of course). In any event, as an NVDA user, I have been correcting Eloquence mispronunciations for my own pleasure and have managed to share the Root file with many users. I'm attaching my Root and Main files to this message. My question is how we can merge my gigantic Root file with the one being maintained here. Of course, I just took a look at the Root file provided as part of this project, and noticed that both files have many words in common. One final point is that since I can't get IBM TTS 20.07-x0_personal to detect ABBR.dic, I'm using Main.dic for abbreviations. Cheers.
@amirsol81 Hi there. It'd be difficult to exaggerate just how helpful you've been to the project, so I won't try. I will, however, say a huge thanks, of course! Fifteen years of customisations is nothing to turn one's nose up at, and it'd be absurd not to include your hard work here!
I'll need to write a program that can compare these files and print the differences, but be sure you'll see your work incorporated into (and credited in) a near-future release, hopefully the next on 08-01.
Blown away by this, thanks again, and hoping you're well!
I'll look into ABBR.dic not loading, though do keep in mind that you have to have abbreviations turned on in the synth. Also, the latest IBMTTS compiles only load root.dic, main.dic and abbr.dic as legacy filenames; they will begin preferring enuroot.dic, enumain.dic and enuabbr.dic, for compatibility with another driver, though unfortunately the same support hasn't yet been added for other languages. If you could please upload your dictionaries again, but with the abbreviations in abbr.dic, that would be great.
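To make the naming scheme concrete, here is a rough sketch of what "prefer the enu-prefixed name, fall back to the legacy one" could look like. This is not the driver's actual code; `pick_dictionary` and the `CANDIDATES` table are made up for illustration, and the case-insensitive matching is an assumption based on the mixed spellings (ENUAbbr.dic, Abbr.dic, abbr.dic) people use in this thread.

```python
from pathlib import Path

# Hypothetical lookup table, not taken from the driver: the preferred
# language-prefixed name comes first, the legacy name second.
CANDIDATES = {
    "root": ("enuroot.dic", "root.dic"),
    "main": ("enumain.dic", "main.dic"),
    "abbr": ("enuabbr.dic", "abbr.dic"),
}

def pick_dictionary(directory, kind):
    """Return the first existing candidate for `kind`, ignoring case, or None."""
    folder = Path(directory)
    if not folder.is_dir():
        return None
    # Index the folder by lowercased file name so ENUAbbr.dic matches enuabbr.dic.
    present = {p.name.lower(): p for p in folder.iterdir() if p.is_file()}
    for name in CANDIDATES[kind]:
        if name in present:
            return present[name]
    return None
```

Calling `pick_dictionary(some_folder, "abbr")` and getting `None` back would at least confirm whether the file is being found under either name, which is separate from whether the engine accepts its contents.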
Is this the same dictionary set referenced by Spivey and his add-on that automatically updates them?
Thanks for your kind words! I'm glad my love of language and my attention to one of the best TTS engines ever produced finally paid off! I'd be ecstatic to contribute to the project, and am positive we'll have awesome discussions regarding word pronunciations in the near future. All the best, Amir
Hmm. This is a bit baffling. No matter what I do, I can't get IBMTTS to detect and use ENUabbr.dic or Abbr.dic. The same variations work well for Root and Main, but Abbr never gets detected. I do have that Abbreviation check box checked in NVDA's Voices dialog. Of course, your decision regarding the use of ENU as part of the file names is quite welcome as it means I won't need to alter the file names. Kurzweil 1000's SAPI 4 Eloquence works with the ENU prefix, and I use that version of Eloquence and its provided Eloq.exe to correct mispronunciations. As for Abbr.dic/ENUAbbr.dic, anything I can try? Thanks.
Is this the same dictionary set referenced by Spivey and his add-on that automatically updates them?
I don't know. But I have updated mine over the past couple of weeks quite significantly. Even new words like "allowlist" are included.
I'm still trying to get ABBR/ENUABBR to work, but apparently to no avail. Even Eloquence SAPI 4 which ships with the K1000 can't detect its entries. So I haven't been able to transfer abbreviations from ENUMain.dic to ENUAbbr.dic. But attached is a ZIP file containing my latest dictionaries -- I added some entries to both files over the last couple of hours. Eloq ENU Dictionaries.zip
Yes, that addon does reference this dictionary. However, I am kind of concerned about this root dictionary and merging it. One of the main issues is that I encountered quite a lot of instances where either your dialect was inserted, or pronunciations just made things sound weird or were a bit wrong, for example Adidas. It seems that the original German pronunciation is [.1a.0Fi.2das], not [.1H.0FX.2dHs].
Well, as far as English is concerned, I can't have a so-called dialect bias because I'm an Iranian with an M.A. degree in TESL. But I love American English and consider myself an avid NPR/Public Radio listener. I also check top US dictionaries such as AHD5, Merriam-Webster, Dictionary.com and Lexico (an offshoot of Oxford) when I want to make pronunciation modifications. As for Adidas, since I wanted to alter the popular American pronunciation which is quite different from the original and since it's not defined in any major American English dictionary, I consulted a number of online sources including the following to select the one you see in the file: https://www.insider.com/how-to-pronounce-adidas-2016-12#:~:text=It's%20pronounced%20%22AH%2Ddee%2D,emphasis%20on%20the%20first%20syllable.&text=The%20brand%20is%20derived%20from,Ah%2DDEE%2Ddus.%22 That said, I just provided my files here in case they can help the project move forward more promptly. There will be nothing wrong with not including them or modifying/removing the words people might deem erroneous.
That's why the project is on GitHub! :-) so that we can collaborate on and discuss these things as they come up, and obviously we will need a merge process for such a large set of entries. But I don't see this as being a barrier.
Also, regarding words like Adidas, we are transcribing pronunciations for a general American dialect of English. While I agree with you and think you're right about the native (and more correct) German pronunciation of the word, I personally believe our goal should be to remain faithful to the dialect, not each word's origins. We can discuss the things we find and do some research.
I agree with you about adhering to a so-called standard dialect of American English for transcriptions, and I've mostly tried to stick to that. However, sometimes my linguistic ventures move me in the direction of what might result in Adidas, which, BTW, has been defined in one English dictionary: https://www.ldoceonline.com/dictionary/adidas My dictionary files for today are attached, yet again. I was reading a scientific article on NPR and managed to correct many pronunciations. I'd also appreciate it if you could give me hints or suggestions regarding my ENUAbbr.dic issue.
@amirsol81 @ultrasound1372 knows more about how the driver's cross-compatibility works, perhaps he can help you.
As for dialect, I know what you mean when you say "so-called", lol. It's going to be difficult sometimes to agree on a generic pronunciation for American English, so I appreciate that you post your sources!
I love linguistic ventures, I go on them all of the time!
So looking forward to receiving help regarding ABBR.dic. Good question regarding my sources. I use the following:
It was actually @Mohamed00 who made the change for that; in fact, it was to be compatible with that K1000 SAPI4 driver. Maybe ECI 6.1 doesn't actually support loading these? Or we're passing it the wrong number? Or maybe it's wrongly formatted? I don't know. I'm no master of the API. And as far as the General American pronunciation of things goes, I think the exception that has been demonstrated is proper nouns, like names. Things like Uzbekistan/Pakistan, etc. But yeah, things like those will always be up for debate, as there may be an Americanization of the name and then we have to decide. I'll write up a Python script at some point to prepare this kind of stuff for merging: going over a baseline and an extension dictionary, finding entries that are in one and not in the other and merging those, and printing some output for entries that are in both, showing the differences in their SPRs. I think in the roots dictionary everything is supposed to be lowercase though, so I might use .lower() on everything.
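A minimal sketch of the merge script described above, under a few assumptions: each dictionary line is a key followed by whitespace and then the SPR (the real .dic separator may differ and should be checked), keys are lowercased as suggested, and the baseline wins whenever both files define the same key differently. `load_dictionary` and `merge` are illustrative names, not an existing tool.

```python
def load_dictionary(lines):
    """Map lowercased key -> SPR. Assumes 'key<whitespace>SPR', one entry per line."""
    entries = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks and the '#' comments used in this thread
        parts = line.split(None, 1)
        if len(parts) == 2:
            entries[parts[0].lower()] = parts[1]
    return entries

def merge(baseline, extension):
    """Return (merged, added, conflicts); the baseline entry is kept on conflict."""
    merged = dict(baseline)
    added, conflicts = [], []
    for key, spr in sorted(extension.items()):
        if key not in baseline:
            merged[key] = spr
            added.append(key)
        elif baseline[key] != spr:
            conflicts.append((key, baseline[key], spr))
    return merged, added, conflicts

# Sample entries reuse SPRs quoted elsewhere in this thread.
baseline = load_dictionary(["rasputin `[r1aspu2tin]", "wifi `[1wY0fY]"])
extension = load_dictionary([
    "Rasputin `[.0rAs.1pyu.0tXn]",
    "hyundai `[.1hHn.0de]",
    "wifi `[1wY0fY]",
])
merged, added, conflicts = merge(baseline, extension)
print("added:", added)
for key, ours, theirs in conflicts:
    print(f"conflict: {key}: baseline {ours} vs extension {theirs}")
```

Keeping the baseline on conflict and only printing the difference leaves the actual pronunciation decision to a human, which seems to fit how these are being debated here. When reading real files, the encoding is also something to verify rather than assume.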
It does support loading the dictionaries, and at least here, enuabbr.dic seems to load properly in NVDA and SAPI4. For SAPI4, you need to put the dictionaries in your user account's folder in the Eloquence directory. If you don't see it, try registering the *.UIL files using regsvr32 while running as administrator.
If possible, please kindly send me your working version of ENUAbbr or ABBR, or attach it to your next reply. Here both ENUMain and ENURoot work flawlessly with both NVDA and the K1000, but ABBR.dic doesn't no matter what I do. Maybe I'm using wrong rules there -- don't know. That's why I want to test both with your ABBR/ENUAbbr file. Thanks.
You are right about the use of lowercase letters in ENURoot.dic -- in fact, even if we paste or type all-uppercase words there, EloqTalk.EXE converts them into lowercase letters upon saving them in ENURoot.dic. Here I use EloqTalk.EXE, as part of the Eloquence SAPI 4 bundle, to correct and save mispronounced words. Its SPR from KEY and SPR from Root functions are invaluable.
Here's a quick test dictionary that works here, both with SAPI4 and NVDA. Note that you can't put phonemes or any other annotations in abbreviation dictionary entries. enuabbr.zip
Thanks! I'll take a look in a few minutes.
Thanks. It works. But it seems to me that you can't, say, correct WHO as W H O in ABBR.dic -- only "World Health Organization" is accepted. Based on this observation, I think most of my corrected abbreviations still belong in Main.dic, as I simply want most of them spelled out -- WAMU, WHO, and ACT, for instance.
Guys, I just took a closer look at the root.dic file provided in this project, and have the following comments:
1. I'm a bit concerned about the inclusion of 2- or 3-letter words (like "ios", "aly" or "os") there. I think they belong in Main.dic or even Abbr.dic. But more important than that, words like these, when corrected, tend to interfere with the pronunciation of other words. I've seen it many times myself, so if you take a look at my own Root file, you might find some words whose pronunciations are correct but are included there as a result of the correction of some other words. By the same token, words like "nvda" and "nvaccess" should be moved to Main.dic as they can be better corrected/pronounced there.
2. The file corrects the word "formulae" as `[1fcrmyx2le]. However, it needs no correction as the proper pronunciation is provided by Eloquence -- like "antennae".
3. The words "aviatrix" and "aviatrices" are, IMHO, corrected erroneously -- or maybe some dialect variation is involved. I've corrected them myself and have looked them up in many dictionaries. So they should be corrected this way: aviatrix=[.2e.0vi.1e.2trIks] aviatrices=[.2e.0vi.1e.0trX.0siz]
4. The word "fijian" doesn't need a correction as the one Eloquence provides is listed first in many sources.
5. A small point: I think the word "tranquiline" should be corrected this way -- I mean the "X" before "n" should be uppercase: `[.1trAG.0kwX.0lXn]
6. This one might require lots of discussion, but I think, and based on my own extensive investigation, the word "hyundai" should be corrected this way: `[.1hHn.0de]
7. And, last but not least, thanks for the file! I learned a lot from it, and have incorporated many of its terms into my own file. I'll send my own files later today.
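The worry in the first point, that a very short root key might bleed into longer words, can at least be screened for mechanically. The sketch below is a crude over-approximation: it flags any word that merely contains a short key as a substring. Whether Eloquence would actually apply the root inside those words is not something this checks; only listening does. `short_key_collisions` is a hypothetical helper, and the word list is a tiny stand-in for a real one.

```python
def short_key_collisions(keys, words, max_len=3):
    """Map each short key to the other words that contain it as a substring."""
    hits = {}
    for key in sorted(k.lower() for k in keys if len(k) <= max_len):
        matches = [w for w in words if key in w.lower() and w.lower() != key]
        if matches:
            hits[key] = matches
    return hits

keys = ["ios", "aly", "os", "hyundai"]          # "hyundai" is too long to be screened
words = ["Italy", "kiosk", "analysis", "antennae"]
for key, matches in short_key_collisions(keys, words).items():
    print(key, "->", ", ".join(matches))
```

Anything this flags is only a candidate to test by ear, not evidence that the pronunciation is actually affected.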
[p1cntes]. However, "pontes", the plural of "pons", should be corrected this way: [.1pan.2tiz].
Folks, having understood the ins and outs of ABBR.dic and having analyzed the Root.dic provided by this project, I managed to create an Abbr.dic file with entries such as XV/XII. I also updated my Main.dic file with a few entries mostly taken from the Main.dic provided here. However, my Root.dic file received the most significant updates today. Some were taken from this project's Root.dic, and many more terms were added based on my own readings here and there. And while I was there, I managed to alter some of the proper nouns, including Adidas, to reflect the most generalized/Americanized pronunciations. I'm attaching all of them here. Eloq ENU Dictionaries - Jul 10.zip
Thanks.
New word. SUSE. This is actually pronounced [s1usx], not [s1uz], according to this musical on SUSE's YouTube channel.
So @amirsol81 have you effectively completed the merge? And yes, acronyms like ios, nvda, nvaccess etc. should be in main.dic. Those are leftovers from the very beginnings of this dictionary a few years ago, when we didn't even know how the main dictionary worked. Aly is a name. Can you name a place where it would cause a false replacement? I haven't seen any broken replacements of os myself, except when part of an acronym that is pluralized, like ISOs, but wouldn't that remain the same if it weren't there? It probably goes in main. I might end up removing some of those anyway, as the only reason they're there is that Mason and I originally put them there, after he had started adding them for people who erroneously did not capitalize properly: OS, NVDA, NVAccess etc. Edit: Do we know if the main dictionary will correctly replace words with apostrophes appended? For example, "Aly's dress"? If so, then we can probably start moving the names there, though that may leave open the case where the name itself is used in the plural to refer to many people with that name, "I know 5 meghans!" etc. Speaking of Megan, Meagan.
@ultrasound1372 I was a bit concerned about the influence of the corrected word "aly" on words like "Italy". However, performing a wildcard search in AHD 5 indicates that "aly" hasn't altered the pronunciation of other words so it seems to be safe to maintain it.
As for moving some entries to Main.dic, I also agree with you. In fact, I'm doing it here and have moved many such words to ENUmain.dic. Unfortunately Main.dic doesn't accept apostrophes. On a rather similar topic, I'm also wondering if Main.dic or Abbr.dic can allow us to fix words like CEOs, SMAs, FAQs, etc. I know the singular forms can be fixed, but it seems that taking care of plural forms with that lowercase "s" is out of the question.
As for merging the Roots, over the last 2 days I added even more terms from the Root file provided here to my Root file. I also used the general American pronunciation for some of my terms which tended to have more of a non-American flavor -- terms like Volkswagen. Adidas has taught me interesting lessons. However, the project Root file still contains terms which I haven't yet moved to my own Root file -- words like painspren, hungerspren, gloryspren, liespren, exhaustionspren, etc., which sound German, Dutch or Finnish to me. Should I integrate them? I'd say we have 70 to 80 percent of the merge complete, but there's still some work to do on that regard. And speaking of the merge, I'll be sending my latest files to you later today. PS.: Meagan is also corrected via Root.
BTW the project Root file has corrected the word "babel" whereas I have retained the original. Speaking of such corrections, I have corrected "rasputin" as [.0rAs.1pyu.0tXn] based on American Heritage 5th and Merriam-Webster, but the one provided by the project is [r1aspu2tin] which sounds a bit non-English. What should we do with those two?
3. The words "aviatrix" and "aviatrices" are, IMHO, corrected erroneously -- or maybe some dialect variation is involved. I've corrected them myself and have looked them up in many dictionaries. So they should be corrected this way: aviatrix=[.2e.0vi.1e.2trIks] aviatrices=[.2e.0vi.1e.0trX.0siz]
4. The word "fijian" doesn't need a correction as the one Eloquence provides is listed first in many sources.
I understand and mostly agree with 3, my thinking was along the lines of "index"/"indices", "matrix"/"matrices", etc.
I have never heard anyone say Fijian the way Eloquence says it, and not just because I live near those islands. Most Americans I have heard say the word pronounce it to rhyme with "Aegean", sometimes with stress on the first syllable, rather than the second.
I do agree on "tranquiline".
However, the project Root file still contains terms which I haven't yet moved to my own Root file -- words like painspren, hungerspren, gloryspren, liespren, exhaustionspren, etc., which sound German, Dutch or Finnish to me. Should I integrate them? I'd say we have 70 to 80 percent of the merge complete, but there's still some work to do on that regard.
Absolutely everything that is in the existing project root should be included in the project going forward, unless it can cause technical error, like you mentioned above. Those specific words are from a fantasy series written in English by an American author. I wish we could have regexps in the dictionary!
Please let us know when you've finished what you consider to be a full merge, there are some omissions in this version and a few blatant errors in your file that I'd want to discuss and correct, but obviously when we're all on the same page. I say "a few blatant errors" because I haven't looked at every one of your twenty-five thousand-odd entries, but even at a glance, some things stick out to me as a linguist and fellow long-time user of Eloquence.
E.g., in General American English, the "a" in "water" is an /ɔ/ or sometimes an /ɔː/, transcribed in all cases as [c]. In the word "waterbed", you have used an /ɑ/ ([a]), which is incorrect.
"Linux" is, and always has been, pronounced the way that Eloquence does it, with a schwa. This really isn't the controversy people think it is; here's a video (https://www.youtube.com/watch?v=5IfHm6R5le0) of its inventor pronouncing it. In my opinion, the inventor of a word should be considered the definitive source on such things as pronunciation, until such time that the word becomes so commonly used that it naturally develops dialectal variation.
These may seem like nitpicking, and I'm sorry if they do. I honestly don't intend anything like that; these are just examples and things to think about. The first I encountered almost at the top of the dictionary, and it stuck in my mind. The second stuck out to me because I work closely with Linux systems, and it was like a jab in the skull.
This merge may take a bit longer than expected; there are many words to go through and sources to consider. I'm finding the dictionaries pretty consistent and functional so far, but perhaps more of a test drive is required. I know a few people who would chew on us pretty thoroughly for giving them lin-ucks.
Interestingly, in Eloquence 5, Linux used to be pronounced [l1InHks], but it seems IBM corrected it and the correction filtered down at some point.
Another issue I found with this dictionary: you've chosen to pronounce "dos" as `[.1duz], I suppose for things like "dos and don'ts". However, this breaks phrases like "DOS prompt" and "boot into DOS".
You are right -- dos and don'ts. However, a couple of days ago I did the same thing to "DOS" in ENUmain.dic to take care of it. I'll be sending them here. ENUmain.dic adds a lot of flexibility to what we are doing.
@thunderdrop I fully agree with you regarding Linux, as your argument is what I have long advocated myself. For Linux I decided to go with what American Heritage Dictionary 5 suggests, but I'll remove it. I'll also include "Fijian" the way you proposed, although it is listed second in many dictionaries. Of course, the way Eloquence pronounces the first syllable is problematic regardless of the suggested pronunciation. You kindly suggested a couple more points which I haven't managed to digest, but I'll be taking a closer look at them shortly. Thanks.
@thunderdrop I see your point regarding "waterbed". However, it's a so-called "slip of the finger" I suppose because I'm well aware of the rule you mentioned and have frequently utilized it in my corrections. I do welcome your awesome analysis and observation, though.
Glad to see all the progress. I've been kinda hands-off lately, sorry about that, though I'm relieved that you're handling the merge yourself because I haven't written that Python script yet :). So main allows for case stuff, so DOS and dos get pronounced separately? Also, can we do anything about don'ts? Default Eloquence pronounces it with an a when it should be an o, but I'm not sure about that apostrophe getting in the way. If it can be corrected via root, make sure you only correct the plural form; if you try to correct the singular and have it propagate, you'll break function word detection, as Eloquence doesn't let us specify in the English dictionaries what part of speech a given word is.
I can confirm that Eloquence does allow apostrophes in root dictionary entries, found this out while I was correcting O'shaughnessy, though I learned that the o' is unnecessary in the root entry. Fixing don'ts shouldn't be an issue, and if it breaks a word we can just enter it and that variant will be overwritten.
Thanks. It's quite a challenge, but I'm trying to get it done. It is a bit complicated because if I add a term to the dictionary and it already exists, EloqTalk will remove both. In addition, some terms exist in the project Root which, to me, sound like either misspelled words or already correct and I'm leaving the decision to include them to you -- words like "misshape" and "misshape" which are correct and don't need to be included, the word "descarte" which has been added without its final "s", and a couple of British English spellings ending in "ise" which have been corrected by placing stress on the wrong syllable. As for Main.dic, it provides us with an extra layer of flexibility and can distinguish between uppercase and lowercase, and even capitalized words.
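Since adding a term that already exists is what causes trouble during the merge, here is a rough Python sketch of a pre-merge check that sorts one file's entries into new words, exact duplicates, and conflicts before anything is added. It assumes each entry is a single line of the form `word` followed by whitespace and the pronunciation, that `#` lines are comments, and that words are compared without regard to case; the function names and the sample data are made up for illustration and are not taken from either dictionary.

```python
def parse_entries(text):
    """Parse 'word <whitespace> pronunciation' lines into a dict.

    Assumption: blank lines and lines starting with '#' carry no entry,
    and words are compared in lowercase.
    """
    entries = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)  # split on the first run of whitespace
        if len(parts) != 2:
            continue  # no pronunciation on this line, skip it
        word, pronunciation = parts
        entries[word.lower()] = pronunciation.strip()
    return entries


def plan_merge(project, personal):
    """Sort the personal entries into new words, exact duplicates, and conflicts."""
    new = {w: p for w, p in personal.items() if w not in project}
    same = sorted(w for w, p in personal.items() if project.get(w) == p)
    conflicts = {
        w: (project[w], p)
        for w, p in personal.items()
        if w in project and project[w] != p
    }
    return new, same, conflicts


# Illustrative data only; the two "refill" pronunciations are placeholders.
project_root = parse_entries("# sample\nfjord `[.1fyord]\nrefill `[.1ri.0fIl]\n")
personal_root = parse_entries("fjord `[.1fyord]\nrefill `[.0ri.1fIl]\nwhois `[.1hu.0Iz]\n")

new, same, conflicts = plan_merge(project_root, personal_root)
for word, (old, proposed) in conflicts.items():
    print(f"{word}: project has {old}, personal has {proposed}")
```

Only the `new` entries would be safe to add automatically; anything in `conflicts` still needs a human decision like the ones being discussed here.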
I'd like to reiterate that everything that was already in the project file should be in the project going forward, unless it causes a technical error. The misspellings are fully intentional; some were added from MUDs, books, and forums. You're right about the -ise words, though: I was running into some pronunciation inconsistencies between British and American English spellings, and fixed them badly. I never got around to doing them properly until now, so I can fix those.
I appreciate Main, and agree that it'll be more useful for us in many situations.
Also, Amir, I wanted to apologise if I have come across as condescending in public comments to you. I'm not meaning to tell you things you already know, but provide the information to anyone reading this who isn't so familiar with how phonetics work. I hope you can forgive a bit of the tedium, I just like to clarify things. Old habit from writing documentation. Thanks for all of your help!
Folks, the merge is effectively complete, and what a feat! 😀 All misspelled words are also included although I don't like them much as they might serve to mislead people who rely on something like Eloquence for learning. Anyway, before sending the dictionary files to the list, let me ask some questions:
[r1ifIl]. However, this just reflects its noun form. I myself had corrected it as [.0ri.1fIl] to reflect the verb form and its inflections. Obviously the original Eloquence pronunciation is wrong, so we should decide which pronunciation to add. I suggest correcting it for its verbal function, but I'm open to suggestions.
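To make the noun/verb difference concrete, here is a small Python sketch that splits an entry into syllables and reports where the main stress falls. It assumes the dotted format seen in the entries in this thread, where each syllable starts with `.` and a digit, and it assumes that 1 marks primary stress, 2 secondary stress, and 0 no stress. The function names are my own, and `[.1ri.0fIl]` is only my rendering of a first-syllable-stress variant, not a quote from either file.

```python
import re

# One syllable: a dot, a stress digit, then phoneme symbols up to the next dot or ']'.
SYLLABLE = re.compile(r"\.(\d)([^.\]]+)")


def parse_pronunciation(entry):
    """Return [(stress_digit, phonemes), ...] for an entry such as '`[.0ri.1fIl]'."""
    start = entry.find("[")
    end = entry.rfind("]")
    if start == -1 or end < start:
        raise ValueError(f"no bracketed pronunciation in {entry!r}")
    body = entry[start + 1:end]
    return [(int(digit), phonemes) for digit, phonemes in SYLLABLE.findall(body)]


def primary_stress_index(entry):
    """Return the 0-based index of the first syllable marked 1, or None if there is none."""
    for index, (stress, _phonemes) in enumerate(parse_pronunciation(entry)):
        if stress == 1:
            return index
    return None


verb_form = "`[.0ri.1fIl]"   # stress on the second syllable
noun_like = "`[.1ri.0fIl]"   # hypothetical first-syllable-stress variant
print(primary_stress_index(verb_form), primary_stress_index(noun_like))
```

A check like this could flag pairs of entries that differ only in stress placement, which seems to be the kind of case where we have to pick one reading because the English dictionaries can't carry part of speech.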
If you have any words you would like added to the root dictionary, please comment below. If a word has different US and UK spellings that should be pronounced the same in American English, please include both. Where possible, all entries should reflect how a General American English speaker would say the word. Post your words here along with, if you can, the applicable pronunciations. These can be in Eloquence's direct phoneme format or in some other description. If you can't convey a pronunciation properly but know of another American English synthesizer that pronounces the word as intended, you can reference that and we will extrapolate the phonemes from it. If the word is common enough and just hasn't been caught yet, we most likely know how to handle it, so intended pronunciations are optional. Note that the root dictionary is encoded in an ANSI code page (Windows-1252 for English), so in addition to Eloquence's rules about the contents of entries, keep in mind that only characters in this set can be part of entries. Some accented letters that exist in Unicode but not in Windows-1252 may therefore be difficult to add, depending on how the encoding handler replaces them.
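If you want to check a word before posting it, the Windows-1252 restriction can be tested with a few lines of Python. This is only a sketch: the function name is made up, and normalizing to NFC first is my own assumption so that a letter typed as a base letter plus a combining accent is treated like its single-character form.

```python
import unicodedata


def cp1252_problems(word):
    """Return the characters in `word` that cannot be encoded in Windows-1252."""
    # Assumption: compose base letter + combining mark into one character first.
    normalized = unicodedata.normalize("NFC", word)
    problems = []
    for ch in normalized:
        try:
            ch.encode("cp1252")
        except UnicodeEncodeError:
            problems.append(ch)
    return problems


for candidate in ["penné", "Łódź"]:
    bad = cp1252_problems(candidate)
    status = "ok" if not bad else "not in Windows-1252: " + " ".join(bad)
    print(candidate, "->", status)
```

An empty result only means the spelling can be stored in the file; whether Eloquence accepts the entry still depends on its own rules about entry contents.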