prosodylab / prosodylab.dictionaries

A repository for dictionaries to be used with the Prosodylab-Aligner

Dictionary for German #5

Closed by joanna-aflr 3 years ago

joanna-aflr commented 3 years ago

Hi,

I am wondering if it is possible to convert the pronunciations in de.dict into ARPAbet? Which phone set does de.dict use? Is it CELEX? And is en-CELEX.dict using the same phone set?

I'd appreciate it if anybody could help me with those questions or point me to some documentation regarding those dictionaries.

Thank you!

kylebgorman commented 3 years ago

Hi Joanna, answers/comments inline.

I am wondering if it is possible to convert the pronunciations in de.dict into ARPAbet?

No. There is no such thing as German ARPAbet. ARPAbet was built by (D)ARPA, the US defense contracting incubator, and it was designed solely for early American English ASR research.

Which phone set does de.dict use? Is it CELEX?

I actually don't know the name of the German phoneset, but the data is from CELEX. There is no "CELEX phoneset" to my knowledge. Each CELEX language uses a different phoneset.
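One practical way to see which symbols a dictionary actually uses is to list its phone inventory. The following is a minimal sketch, assuming each dictionary line has the form `WORD PH1 PH2 ...` with whitespace-separated phones; the helper name `phone_inventory` and the two sample entries are made up for illustration and are not taken from de.dict.

```python
from collections import Counter


def phone_inventory(lines):
    """Count every phone symbol in 'WORD PH1 PH2 ...' dictionary lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank lines and entries without a pronunciation
        counts.update(fields[1:])  # fields[0] is the word itself
    return counts


# Hypothetical sample entries; replace with the lines of a real file, e.g.
#   with open("de.dict", encoding="utf-8") as f:
#       inventory = phone_inventory(f)
sample = ["HAUS h aU s", "MAUS m aU s"]
inventory = phone_inventory(sample)
for phone, n in inventory.most_common():
    print(phone, n)
```

Comparing the resulting symbol list against the phoneme tables in the CELEX documentation should show which notation a given file follows.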

And is en-CELEX.dict using the same phone set?

The English CELEX dictionary is transcribed in a phoneset called DISC, which was introduced by Cecil Coker, I believe. The dialect transcribed is Received Pronunciation, and there are no good ARPAbet equivalents for this (very conservative and very artificial) register, since ARPAbet is really only intended for a Mid-Atlantic American English accent and nothing more.

The only documentation of this set I can find is in the CELEX documentation itself. If you have a license for CELEX, you will also have a PostScript and/or PDF file of about 150 pages that gives IPA-like equivalents for all the phonemes in English, Dutch, and German, IIRC.

I did develop a table of approximate equivalents between ARPAbet and DISC years ago. This is original research and may be flawed: http://wellformedness.com/papers/codes/
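For anyone who wants to apply such a table, here is a rough sketch of what the lookup might look like. It assumes a DISC transcription is a plain string with one character per phoneme and that any stress or syllable markers have already been stripped. The handful of entries in `DISC_TO_ARPABET` are illustrative guesses at uncontroversial pairs, not copied from the table linked above, and should be checked against it and against the CELEX documentation.

```python
# Illustrative subset only; these pairs are assumptions, not the linked table.
DISC_TO_ARPABET = {
    "p": "P", "t": "T", "k": "K", "s": "S",
    "S": "SH", "T": "TH", "I": "IH", "{": "AE",
}


def disc_to_arpabet(transcription, table=DISC_TO_ARPABET):
    """Map a DISC string (one character per phoneme) to ARPAbet symbols.

    Raises KeyError for any symbol missing from the table, so that gaps
    are noticed instead of being silently dropped.
    """
    phones = []
    for symbol in transcription:
        if symbol not in table:
            raise KeyError(f"no ARPAbet equivalent listed for DISC symbol {symbol!r}")
        phones.append(table[symbol])
    return phones


print(disc_to_arpabet("kIt"))  # ['K', 'IH', 'T']
```

Failing loudly on unmapped symbols is deliberate here: several Received Pronunciation vowels have no clean ARPAbet counterpart, and those cases need a manual decision.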

joanna-aflr commented 3 years ago

Great, thank you very much for your clarifications!

joanna-aflr commented 3 years ago

I have one more question about the dictionaries in this repo (I'm mostly interested in the German one). Is the license for them the same as for Prosodylab-Aligner?

kylebgorman commented 3 years ago

No, good question. We don't provide the German dictionary with a license since it's not our data to license; it's just provided for testing. IANAL, but since it's derived from CELEX, it likely has the same (proprietary) license as CELEX. If you need to use this and don't have it, I believe CELEX can be licensed for $500 USD.

kylebgorman commented 3 years ago

I should mention: as it happens, my lab has created an enormous database of free, Creative Commons-licensed pronunciation dictionaries in the intervening years: https://github.com/kylebgorman/wikipron