scaife-viewer / beyond-translation-site

Site used to iterate on translation alignments within the Scaife Viewer ecosystem

Improve implementation of disambiguated lemmas #142

Open jacobwegner opened 1 year ago

jacobwegner commented 1 year ago

See our LSJ entries for ἄωρος in urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.89:

[screenshot]

https://beyond-translation.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.89?mode=dictionary-entries&entryUrn=urn%3Acite2%3Ascafife-viewer%3Adictionary-entries.atlas_v1%3Alsj-18938

jacobwegner commented 1 year ago

@jtauber:

We had introduced a "normalized" version of the entry headword:

[screenshot]

If I use the "display" value instead of the normalized value, things get cluttered:

[screenshot]

I can make use of the "display" version when choosing a "sibling":

[screenshot]
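A minimal sketch of the sibling idea, assuming each entry carries a hypothetical display field alongside the raw headword. Entries are grouped by the normalized headword, which keeps lookups uncluttered, and the display value is used only to label the members of a group. The Entry class, the sample headwords and display strings, and this simplified normalized_no_digits are illustrative assumptions, not the actual scaife-viewer code.

```python
import re
import unicodedata
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    # Hypothetical shape of a dictionary entry (not the real scaife-viewer model).
    lexicon: str
    headword: str  # as provided by the lexicon, possibly with a homograph digit
    display: str   # human-readable form used to tell siblings apart


def normalized_no_digits(headword: str) -> str:
    # Simplified stand-in: drop ASCII digits, trim whitespace, NFC-normalize.
    return unicodedata.normalize("NFC", re.sub(r"[0-9]", "", headword).strip())


def sibling_labels(entries: list[Entry], current: Entry) -> list[str]:
    # All entries (including the current one) that share the current entry's
    # lexicon and normalized headword, labelled by their display value.
    key = normalized_no_digits(current.headword)
    return [
        e.display
        for e in entries
        if e.lexicon == current.lexicon and normalized_no_digits(e.headword) == key
    ]


# Hypothetical sample data.
entries = [
    Entry("lsj", "ἄωρος1", "ἄωρος (A)"),
    Entry("lsj", "ἄωρος2", "ἄωρος (B)"),
    Entry("lsj", "ἄωτος", "ἄωτος"),
]
print(sibling_labels(entries, entries[0]))
```

The lookup key never sees the display string, so adding richer display values does not change which entries a search resolves.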

Any thoughts?

I'll get a deploy done soon so you can play around with this some more...

jacobwegner commented 1 year ago

(Deployed to https://beyond-transl-pr-143.herokuapp.com/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.89?mode=dictionary-entries&entryUrn=urn%3Acite2%3Ascafife-viewer%3Adictionary-entries.atlas_v1%3Acambridge-greek-lexicon-2307 )

jacobwegner commented 1 year ago

@jtauber to investigate δελφίς --> https://beyond-transl-pr-143.herokuapp.com/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:12.96?mode=dictionary-entries&entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.2121

jacobwegner commented 1 year ago

(To review character stripping)

jacobwegner commented 1 year ago

@jtauber: Here is a better explanation of what is going on with δελφῖνάς in Odyssey 12. If you click through to load this query:

https://tinyurl.com/gh-bt-142-sample

You can see that headwordNormalizedStripped for LSJ, Cunliffe and Cambridge is stored as δελφις.

headword is provided directly from each lexicon.

headwordNormalized is computed in normalized_no_digits:

headwordNormalizedStripped is computed in normalize_and_strip_marks:
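To illustrate the difference between the two computed fields, here is a rough sketch under stated assumptions: these are hypothetical re-implementations using only the standard library's unicodedata module, not the actual normalized_no_digits and normalize_and_strip_marks functions. The "normalized" form keeps accents and breathings and only drops digits; the "stripped" form additionally removes every combining mark.

```python
import re
import unicodedata


def normalized_no_digits(headword: str) -> str:
    # Assumed behaviour: drop ASCII digits (e.g. homograph numbers), trim,
    # and NFC-normalize; accents and breathings are kept.
    return unicodedata.normalize("NFC", re.sub(r"[0-9]", "", headword).strip())


def normalize_and_strip_marks(headword: str) -> str:
    # Assumed behaviour: decompose (NFD), drop all combining marks
    # (accents, breathings, iota subscript), then recompose (NFC).
    decomposed = unicodedata.normalize("NFD", normalized_no_digits(headword))
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


print(normalize_and_strip_marks("δελφίς"))
print(normalized_no_digits("θεά") == normalized_no_digits("θέα"))
print(normalize_and_strip_marks("θεά") == normalize_and_strip_marks("θέα"))
```

With this sketch, δελφίς strips to δελφις, and θεά and θέα stay distinct under the normalized form but collapse to the same stripped key.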

Beyond Translation currently uses headwordNormalized for lookups; I believe this was done to avoid exactly the kind of error where a lookup would resolve both θεά and θέα within LSJ.

We apply the same normalization used for headwordNormalized to the search term a user provides on the frontend.
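A minimal sketch of that lookup pattern, assuming an in-memory index (the real lookups go through the backend, not a Python dict). The query is passed through the same key function as the stored headwords, and swapping the key shows why the stripped form would over-match θεά and θέα. The Entry class and both normalization helpers are simplified, hypothetical stand-ins.

```python
import re
import unicodedata
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable

KeyFn = Callable[[str], str]


@dataclass(frozen=True)
class Entry:
    # Hypothetical, minimal entry shape.
    lexicon: str
    headword: str


def normalized_no_digits(s: str) -> str:
    # Stand-in: drop ASCII digits, trim, NFC-normalize (marks are kept).
    return unicodedata.normalize("NFC", re.sub(r"[0-9]", "", s).strip())


def normalize_and_strip_marks(s: str) -> str:
    # Stand-in: same as above, then remove all combining marks.
    decomposed = unicodedata.normalize("NFD", normalized_no_digits(s))
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return unicodedata.normalize("NFC", bare)


def build_index(entries: Iterable[Entry], key: KeyFn) -> dict[str, list[Entry]]:
    # Group entries under the normalized form of their headword.
    index: defaultdict[str, list[Entry]] = defaultdict(list)
    for entry in entries:
        index[key(entry.headword)].append(entry)
    return dict(index)


def lookup(index: dict[str, list[Entry]], query: str, key: KeyFn) -> list[Entry]:
    # The user's query goes through exactly the same normalization as the index.
    return index.get(key(query), [])


lsj = [Entry("lsj", "θεά"), Entry("lsj", "θέα")]
by_normalized = build_index(lsj, normalized_no_digits)
by_stripped = build_index(lsj, normalize_and_strip_marks)
print(len(lookup(by_normalized, "θεά", normalized_no_digits)))
print(len(lookup(by_stripped, "θεά", normalize_and_strip_marks)))
```

The normalized index resolves a single entry for θεά, while the stripped index returns both θεά and θέα, which is the collision described above.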

So, back to δελφῖνάς in Od. 12:

Does that make sense to you? I have some additional things I'd like to document around this, but I think having this new headwordDisplay option will be a big help going forward.

jacobwegner commented 1 year ago

(We should review this for Cambridge and Lexicon Thucydideum, and also consider replicating what the "word study tool" does for lookups: https://www.perseus.tufts.edu/hopper/morph?l=%CF%84%CE%B1%CF%81%CE%AC%CF%83%CF%83%CF%89&la=greek)