jtauber / greek-accentuation

Python 3 library for accenting (and analyzing the accentuation of) Ancient Greek words
MIT License
56 stars 10 forks

display_accentuation(get_accentuation('ἣ')) -- eta with rough breathing and grave -- throws an error #17

Open gregorycrane opened 4 years ago

gregorycrane commented 4 years ago

display_accentuation(get_accentuation('ἣ'))
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/greek_accentuation/accentuation.py", line 68, in display_accentuation
    return accentuation.name.lower()
AttributeError: 'NoneType' object has no attribute 'name'

gregorycrane commented 4 years ago

The function is brittle here: it assumes the caller has already checked that the word has an accent, rather than returning "none" or the like, as in the following case.

display_accentuation(get_accentuation('δ’'))
Traceback (most recent call last):
  File "", line 1, in
  File "/Library/Frameworks/Python.framework/Versions/3.8/lib/python3.8/site-packages/greek_accentuation/accentuation.py", line 68, in display_accentuation
    return accentuation.name.lower()
AttributeError: 'NoneType' object has no attribute 'name'
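
Both tracebacks fail on the same line: `accentuation.name.lower()` is called on `None`. A caller-side guard might look like the sketch below. The `Accentuation` enum here is a hypothetical stand-in so the example runs without the library installed, and `safe_display_accentuation` is an illustrative name, not a function the library provides.

```python
from enum import Enum
from typing import Optional


class Accentuation(Enum):
    # Hypothetical stand-in for the library's accentuation enum, so the
    # sketch runs on its own; the member list here is illustrative only.
    OXYTONE = 1
    PAROXYTONE = 2
    PERISPOMENON = 3


def safe_display_accentuation(accentuation: Optional[Enum]) -> str:
    """Return the lower-cased accentuation name, or "none" when it is None."""
    if accentuation is None:
        # This is the case that raises AttributeError in the tracebacks above.
        return "none"
    return accentuation.name.lower()


print(safe_display_accentuation(None))
print(safe_display_accentuation(Accentuation.OXYTONE))
```

Whether "none" is the right return value is a design choice; raising a clearer error for unnormalized input would be another option.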

jtauber commented 4 years ago

I think the underlying issue here is that get_accentuation expects a normalized accentuation. It's not intended to handle graves or words with an additional oxytone because of a following enclitic.

(greek-normalisation handles that normalization step)
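
For readers unfamiliar with what that step involves, the grave-to-acute part can be sketched with the standard library alone. This is an assumption-laden illustration, not the code greek-normalisation actually uses; `grave_to_acute_sketch` is a made-up name.

```python
import unicodedata

GRAVE = "\u0300"  # COMBINING GRAVE ACCENT
ACUTE = "\u0301"  # COMBINING ACUTE ACCENT


def grave_to_acute_sketch(word: str) -> str:
    """Replace every grave accent in `word` with an acute accent.

    The word is decomposed (NFD) so accents become separate combining
    marks, the grave marks are swapped for acutes, and the result is
    recomposed (NFC).
    """
    decomposed = unicodedata.normalize("NFD", word)
    return unicodedata.normalize("NFC", decomposed.replace(GRAVE, ACUTE))


# U+1F23 is eta with rough breathing and grave (the form from this issue);
# U+1F25 is eta with rough breathing and acute (the isolated form).
print(grave_to_acute_sketch("\u1f23") == "\u1f25")
```

Note that NFC recomposition yields the tonos code points rather than the oxia ones for vowels without a breathing mark, which may or may not match the convention used elsewhere in a given pipeline.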

gregorycrane commented 4 years ago

What crashed it was h(\ (ἣ), the standard nominative feminine singular of the relative pronoun, in

οὐλομένην, ἣ μυρίʼ Ἀχαιοῖς ἄλγεʼ ἔθηκε,

or am I missing something?

jtauber commented 4 years ago

It's only ἣ in running text, though. The standalone form is ἥ. The assumption the code makes (which might be debatable, but it's the one I made for my own work) is that when you query for the accentuation type (e.g. perispomenon, paroxytone, or whatever), that type is a property of the isolated accented word, not of the string as it appears in running text.

In my own corpus work, I always use greek-normalisation and generate an isolated form for tokens. I talk about it a bit in this blog post: https://jktauber.com/2018/07/23/normalisation-column-morphgnt/

jtauber commented 4 years ago

If you don't want the full-on greek-normalisation, you can also just copy and paste the code from https://github.com/jtauber/greek-normalisation/blob/master/greek_normalisation/utils.py, which has things like grave_to_acute and strip_last_accent_if_two, as well as its own strip_accents.
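
To illustrate the second kind of normalization mentioned in this thread (a word carrying an additional accent because of a following enclitic), here is a rough standard-library sketch. It is not the code in utils.py; the name `strip_last_accent_if_two_sketch` and the choice of which combining marks count as accents are assumptions made for this example.

```python
import unicodedata

# Combining marks treated as accents in this sketch (an assumption):
# grave, acute, and the Greek perispomeni (circumflex).
ACCENT_MARKS = {"\u0300", "\u0301", "\u0342"}


def strip_last_accent_if_two_sketch(word: str) -> str:
    """Drop the final accent mark when the word carries two or more."""
    decomposed = unicodedata.normalize("NFD", word)
    positions = [i for i, ch in enumerate(decomposed) if ch in ACCENT_MARKS]
    if len(positions) < 2:
        # Zero or one accent: return the input unchanged.
        return word
    last = positions[-1]
    stripped = decomposed[:last] + decomposed[last + 1:]
    return unicodedata.normalize("NFC", stripped)


# ἄνθρωπός (two accents, as before an enclitic), written with escapes.
two_accents = "\u1f04\u03bd\u03b8\u03c1\u03c9\u03c0\u03cc\u03c2"
print(strip_last_accent_if_two_sketch(two_accents))
```

Breathing marks and the diaeresis are left alone because they are not in `ACCENT_MARKS`.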