jtauber / greek-accentuation

Python 3 library for accenting (and analyzing the accentuation of) Ancient Greek words
MIT License

Syllabify rules #6

Open abithyzis opened 6 years ago

abithyzis commented 6 years ago

I think is_diphthong and is_valid_consonant_cluster are missing some values that would give better results. Edit: I just realized this library is for Ancient Greek and my use case is Modern Greek, so I am not sure my changes apply. I also wrote a bad test to make sure I follow the consensus on syllabification.

Let me know if you want a pull request, or I can just paste them here.

abithyzis commented 6 years ago

A few consonant clusters are missing, so I am copying the whole function:

def is_valid_consonant_cluster(b, c):
    s = base(b).lower() + ("".join(base(b2) for b2 in c)).lower()
    return s.startswith((
        "βδ", "βλ", "βρ",
        "γλ", "γν", "γρ", "γκ",
        "δρ",
        "θλ", "θν", "θρ",
        "κλ", "κν", "κρ", "κτ",
        "μν", "μπ",
        "ντ",
        "πλ", "πν", "πρ", "πτ",
        "σβ", "σθ", "σκ", "σμ", "σπ", "στ", "σφ", "σχ", "στρ",
        "τρ", "τμ", "τσ", "τζ",
        "φθ", "φλ", "φρ", "φτ",
        "χθ", "χλ", "χρ", "χτ",
    ))
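For anyone who wants to try the cluster list without installing the library, here is a minimal standalone sketch of the same prefix check. It assumes that the library's base() only strips diacritics, and replaces it with a hypothetical strip_marks() helper built on unicodedata; that helper is not part of greek-accentuation. The cluster list is the one posted above, minus "στρ", which the "στ" prefix already covers.

```python
import unicodedata

# Cluster list from the comment above. "στρ" is left out because any string
# that starts with "στρ" already starts with "στ".
CLUSTERS = (
    "βδ", "βλ", "βρ",
    "γλ", "γν", "γρ", "γκ",
    "δρ",
    "θλ", "θν", "θρ",
    "κλ", "κν", "κρ", "κτ",
    "μν", "μπ",
    "ντ",
    "πλ", "πν", "πρ", "πτ",
    "σβ", "σθ", "σκ", "σμ", "σπ", "στ", "σφ", "σχ",
    "τρ", "τμ", "τσ", "τζ",
    "φθ", "φλ", "φρ", "φτ",
    "χθ", "χλ", "χρ", "χτ",
)


def strip_marks(ch):
    """Hypothetical stand-in for the library's base(): drop combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def is_valid_consonant_cluster(b, c):
    """b is the first consonant, c is an iterable of the consonants after it."""
    s = (strip_marks(b) + "".join(strip_marks(ch) for ch in c)).lower()
    # str.startswith accepts a tuple and returns True if any prefix matches.
    return s.startswith(CLUSTERS)


print(is_valid_consonant_cluster("σ", ["τ", "ρ"]))  # True (matches "στ")
print(is_valid_consonant_cluster("μ", ["π"]))       # True
print(is_valid_consonant_cluster("ρ", ["τ"]))       # False
```

Because the check is a prefix match, only the first two letters of a longer cluster decide the result.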

Regarding diphthongs, things are less clear (they are now called digraphs, δίψηφα):

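def is_diphthong(chs):
    # Join every character passed in; concatenating only chs[0] and chs[1]
    # could never match the three-letter entries ("αου", "εια", "οια").
    return "".join(base(ch).lower() for ch in chs) in [
        "αι", "ει", "οι", "υι",
        "αυ", "ευ", "ου", "ηυ",
        "αη", "οη",
        "αου",
        "ια", "υα", "εια", "οια",
        "ιο",
    ] and not any(diaeresis(ch) for ch in chs[1:])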
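The same check can be sketched in standalone form. This version joins every character it receives, so the three-letter entries such as "αου" can match, and it treats a diaeresis on any non-initial vowel as breaking the digraph. strip_marks() and has_diaeresis() are hypothetical stand-ins for the library's base() and diaeresis(), written with unicodedata on the assumption that those functions strip and detect diacritics respectively.

```python
import unicodedata

# Entries from the list above (a Modern Greek proposal, not the library's own list).
DIGRAPHS = {
    "αι", "ει", "οι", "υι",
    "αυ", "ευ", "ου", "ηυ",
    "αη", "οη",
    "αου",
    "ια", "υα", "εια", "οια",
    "ιο",
}

COMBINING_DIAERESIS = "\u0308"


def strip_marks(ch):
    """Hypothetical stand-in for base(): drop combining marks."""
    decomposed = unicodedata.normalize("NFD", ch)
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def has_diaeresis(ch):
    """Hypothetical stand-in for diaeresis(): True if ch carries a diaeresis."""
    return COMBINING_DIAERESIS in unicodedata.normalize("NFD", ch)


def is_diphthong(chs):
    """chs is a string or list of two or three vowel characters."""
    letters = "".join(strip_marks(ch).lower() for ch in chs)
    # A diaeresis on a non-initial vowel marks it as a separate syllable.
    return letters in DIGRAPHS and not any(has_diaeresis(ch) for ch in chs[1:])


print(is_diphthong("αι"))         # True
print(is_diphthong(["α", "ϊ"]))   # False, the diaeresis splits the vowels
print(is_diphthong("αου"))        # True, three-letter entry
print(is_diphthong("αε"))         # False, not in the list
```

Whether the three-letter entries are ever reached depends on how many characters the caller passes in, which this sketch does not try to model.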

These are changes for Modern Greek, so I am not sure whether they are relevant.