Closed ultrasound1372 closed 1 year ago
@ultrasound1372 As someone who has done almost all of that
Perhaps we should begin purging the main dictionary of many of these domain names?
Interestingly, @thunderdrop added a domain name today.
Hmm, not me. I added some terms I often hear in Linux which are uncommon enough that they wouldn't conflict with dictionary words. No domains.
As for removing them, I'm not sure. Yes, adding them does create a bias, but that's only because there are so few people on the project at present. If we had more people, we'd have a bigger sample. After all, our whole job here is tracking down things Eloquence can't pronounce. Sorry I don't have any useful input; perhaps we need to chuck this around a bit more.
I just don't see it as necessary. No synth will pronounce these; they are spellings that exist only because the DNS is case-insensitive, after all. Take howtogeek as an example: when you go to the website, the page title is How-To Geek. Likewise, thefreedictionary is The Free Dictionary. And re chmod, do we know whether Linux people say ch mod, ch mode, or chmode?
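To make the run-together-domain point concrete, here is a rough sketch of how such entries might be flagged for review. It assumes nothing about the real dictionary file format: `split_into_words`, `vocabulary`, and `min_len` are hypothetical names, and the word list is a tiny hand-made sample rather than anything taken from ENU Main.

```python
def split_into_words(key, vocabulary, min_len=2):
    """Return one split of `key` into two or more known words, or None.

    `vocabulary` is a hypothetical set of lowercase words; `min_len`
    ignores very short pieces, which would otherwise cause false matches.
    """
    key = key.lower()
    n = len(key)
    # best[i] is a list of words that exactly covers key[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is None:
                continue
            if start == 0 and end == n:
                # Skip the whole key as a single piece: we only want
                # entries made of several words run together.
                continue
            piece = key[start:end]
            if len(piece) >= min_len and piece in vocabulary:
                best[end] = best[start] + [piece]
                break
    return best[n]


# Illustrative sample only, not real dictionary content.
vocabulary = {"how", "to", "geek", "the", "free", "dictionary"}

print(split_into_words("howtogeek", vocabulary))          # ['how', 'to', 'geek']
print(split_into_words("thefreedictionary", vocabulary))  # ['the', 'free', 'dictionary']
print(split_into_words("chmod", vocabulary))              # None
```

Anything this flags would still need a human decision, since a split being possible does not prove the entry is a domain name.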
@ultrasound1372 For the record, it is not that no other synth pronounces them properly. For instance, both MS SAPI 5 and MS OneCore voices pronounce "howtogeek" correctly. In fact, OneCore voices handle most of them quite gracefully.
@ultrasound1372 I finally managed to remove all of these, so the issue is being closed.
I've noticed a trend developing in ENU Main of giving special pronunciations to improperly cased versions of many websites and a few acronyms. As the pronunciation of a website's domain name is generally not what you see as the title of the page, I vote for these pronunciations to be removed. As for acronyms, I'm not totally sure on that one, even for some lowercase versions. I believe the addition of pronunciations for domain names just makes the dictionary unnecessarily large and puts an undue burden on the contributors, as virtually every website in existence with a multi-word name would have to be added. Since this goal is unattainable, the result is a heavy bias on the part of the contributor. An argument can be made that the bias is that of the populace rather than the contributor for certain sites, such as those of news organizations, but I believe the domain names should be removed, with the focus instead on actual words one will encounter in general text. If ECI has broken handling of all-caps acronyms, their existence is justified.
cc @amirsol81 @thunderdrop