stefantaubert / pinyin-to-ipa

Command-line interface and Python library to transcribe pinyin to IPA. The tones are attached to the vowel of the syllable.
https://hf.co/spaces/stefantaubert/pinyin-to-ipa
MIT License

Add support for erhua #2

Open lars76 opened 4 weeks ago

lars76 commented 4 weeks ago

Hey, could you add support for erhua? Combinations such as 事儿 (shìr) are not handled. Erhua is often heard even in Standard Chinese (news broadcasts, etc.).

lars76 commented 4 weeks ago

Here is a quick workaround based on https://en.wikipedia.org/wiki/Erhua#Standard_rules:

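from pinyin_to_ipa import pinyin_to_ipa


def pinyin_to_ipa_erhua(pinyin):
    # Transcribe the base syllable, i.e. the input without its trailing "r".
    ipas = list(pinyin_to_ipa(pinyin[:-1]))
    # Erhua finals and their rhotacized IPA. The first matching suffix wins,
    # so longer suffixes must come before the shorter ones they end with.
    suffix_to_ipa = {
        "anr": "ɐʵ",
        "enr": "ɚ", "inr": "ɚ", "unr": "ɚ",
        "angr": "ɑ̃ʵ",
        "engr": "ɤ̃ʵ", "ingr": "ɤ̃ʵ",
        "iongr": "ʊ̃ʵ", "ongr": "ʊ̃ʵ",
        "our": "ou̯˞",
        "iur": "ou̯ʵ",
        "aor": "ɑu̯˞",
        "iaor": "ɑu̯ʵ",
        "eir": "ɚ", "uir": "ɚ",
        "air": "ɐʵ",
        "ier": "ɛʵ",
        "uer": "œʵ",
        "er": "ɤʵ",
        "or": "ɔʵ",
        "ar": "ɐʵ",
        "ir": "ɚ",
        "ur": "u˞",
        "vr": "ɚ"
    }
    # Nasal finals: both the vowel and the nasal coda are replaced.
    strip_two = ["anr", "enr", "inr", "unr", "angr", "engr", "ingr", "iongr", "ongr"]

    new_ipas = []
    for ipa in ipas:
        ipa = list(ipa)
        for k, v in suffix_to_ipa.items():
            if pinyin.endswith(k):
                # Drop the phonemes that the erhua final replaces.
                if k in strip_two:
                    ipa = ipa[:-2]
                else:
                    ipa = ipa[:-1]
                # After j, q, x and y, pinyin "u" stands for ü.
                if pinyin in ("jur", "qur", "xur", "yur"):
                    ipa += ["ɥɚ"]
                else:
                    ipa += [v]
                break
        new_ipas.append(ipa)

    return new_ipas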
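
The lookup above only works because the dictionary lists longer suffixes before the shorter ones they end with ("our" before "ur", "air" before "ir"). A minimal sketch of making that explicit by trying the longest suffix first, assuming toneless input such as "shir". The names `match_erhua_suffix`, `ERHUA_FINALS` and `NASAL_SUFFIXES` are hypothetical and not part of pinyin-to-ipa, and only a subset of the rules is shown:

```python
# Hypothetical helper, not part of pinyin-to-ipa. Subset of the erhua rules
# from the workaround, deliberately listed in a "bad" order (short first).
ERHUA_FINALS = {
    "ir": "ɚ",
    "ur": "u˞",
    "ar": "ɐʵ",
    "air": "ɐʵ",
    "our": "ou̯˞",
    "anr": "ɐʵ",
    "angr": "ɑ̃ʵ",
}
# Nasal finals replace two phonemes (vowel + nasal coda) instead of one.
NASAL_SUFFIXES = {"anr", "angr"}


def match_erhua_suffix(pinyin):
    """Return (suffix, ipa, phonemes_to_strip), or None if no rule applies."""
    # Sort by length so that e.g. "our" is tried before "ur",
    # regardless of the order in which the rules were written.
    for suffix in sorted(ERHUA_FINALS, key=len, reverse=True):
        if pinyin.endswith(suffix):
            strip = 2 if suffix in NASAL_SUFFIXES else 1
            return suffix, ERHUA_FINALS[suffix], strip
    return None


if __name__ == "__main__":
    for syllable in ("shir", "gour", "wanr", "ma"):
        print(syllable, match_erhua_suffix(syllable))
```

With this, new rules can be added in any order without silently shadowing a longer suffix.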

stefantaubert commented 3 weeks ago

Hello, thank you for the suggestion and the workaround. Unfortunately, I do not have the capacity to integrate this functionality at the moment.