Ingest scholia annotations using new commentary widget

jacobwegner commented 2 years ago

[x] Backport CommentaryWidget to scaife-viewer/frontend
[x] Adapt / update ATLAS model so that at ingestion we can resolve from passage reference to lemmas
[x] Update beyond-translation-site to make use of new widget

jacobwegner commented 2 years ago

Another set of content:

https://scaife.perseus.org/library/urn:cts:greekLit:tlg5026/

Would also be good to figure out a pipeline for annotation curators (not just developers)

jacobwegner commented 2 years ago

@gregorycrane Small status update, and we can talk more on Tuesday.

I've gotten the first pass of this deployed to beyond-translation-dev.

Here are the commentary annotations shown on the perseus-grc2 edition:

https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.20?entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.4645&mode=commentaries

and for reference, the msA edition (which also has the older "scholia" widget in place):

https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.20?entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.156&mode=commentaries

--

A few things to note:

1) We're currently trying to match the fragment (called a "lemma" in the HMT data, but not truly a lemma; cf. the TEI spec) with the word tokens.

The HMT annotations aren't 1:1 with the text of the msA:

# the annotation
urn:cts:greekLit:tlg5026.msA.hmt:1.4.lemma#θεά
urn:cts:greekLit:tlg5026.msA.hmt:1.4.comment#οὕτως εἴωθε τὴν Μοῦσαν καλεῖν· ἀμέλει καὶ ἐν Ὀδυσσεία · ⁑
...
urn:cts:greekLit:tlg5026.msA.hmt:1.4#urn:cite2:cite:verbs.v1:commentsOn#urn:cts:greekLit:tlg0012.tlg001.msA:1.1

# the text
1.1#Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος

So we could correct the fragments (to match our versions), or create some other form of standoff annotation for the scholia (around an actual lemma, or using a token offset, e.g. urn:cts:greekLit:tlg5026.msA.hmt:1.4 applies to the 2nd token in 1.1, etc.), but I don't think I can do a whole lot more with the current data set as is.
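To illustrate the mismatch and the token-offset idea: in the example above the scholion fragment θεά carries an acute accent while the msA text has θεὰ with a grave, so a plain string comparison finds nothing. The sketch below is only an assumption about how this could be handled, not the ATLAS ingestion code. It strips diacritics before comparing and records a zero-based token index; the function names and the `standoff` record layout are hypothetical.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose to NFD, drop combining marks (accents, breathings), lower-case."""
    decomposed = unicodedata.normalize("NFD", text)
    bare = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return bare.lower()


def resolve_fragment(fragment: str, tokens: list) -> "int | None":
    """Return the zero-based index of the first token equal to the fragment
    once diacritics are ignored, or None when nothing matches."""
    target = strip_diacritics(fragment)
    for index, token in enumerate(tokens):
        if strip_diacritics(token) == target:
            return index
    return None


# Data copied from the example above; whitespace tokenization is an assumption.
tokens = "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split()
fragment = "θεά"

# Hypothetical standoff record: the scholion points at a token offset
# in the passage instead of quoting a text fragment.
standoff = {
    "scholion": "urn:cts:greekLit:tlg5026.msA.hmt:1.4",
    "passage": "urn:cts:greekLit:tlg0012.tlg001.msA:1.1",
    "token_index": resolve_fragment(fragment, tokens),
}
```

This only covers single-word fragments and ignores punctuation attached to tokens; multi-word fragments would need a span (start and end offsets) rather than a single index.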

2) Speaking of current data sets... Neel and Chris have a newer release of the HMT data that might have some corrections / differences. I can try and circle back and update in the next week or two.

3) There are a few remaining features I'd like to add to the commentary widget:

a. Highlight the range, not just the first word. This is challenging due to "1" above, but I think it is still an improvement
b. Provide attribution information (as we do on the existing Scholia widget)
c. Similar to dictionary entries, support multiple commentaries
d. Document ingestion format
e. Create pipeline to show "hit" rate
f. Multi-lingual support within commentaries
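For item e, a "hit" rate could be as simple as the share of commentary fragments that resolve to at least one token. The following is a rough sketch under that assumption; `hit_rate`, `exact_resolver`, and the in-memory `PASSAGES` mapping are made-up names for illustration, not part of ATLAS.

```python
from typing import Callable, Iterable, Optional, Tuple

# Toy passage store keyed by passage reference (hypothetical layout).
PASSAGES = {"1.1": "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split()}


def exact_resolver(passage_ref: str, fragment: str) -> Optional[int]:
    """Return the index of the token identical to the fragment, else None."""
    tokens = PASSAGES.get(passage_ref, [])
    return tokens.index(fragment) if fragment in tokens else None


def hit_rate(
    fragments: Iterable[Tuple[str, str]],
    resolve: Callable[[str, str], Optional[int]],
) -> float:
    """Fraction of (passage_ref, fragment) pairs that resolve to a token."""
    total = hits = 0
    for passage_ref, fragment in fragments:
        total += 1
        if resolve(passage_ref, fragment) is not None:
            hits += 1
    return hits / total if total else 0.0
```

Passing the resolver in as a function makes it possible to compare strategies (exact versus normalized or fuzzy) over the same set of annotations.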

I also have a few things to tighten up in the code before it is ready for production, but figured getting early feedback would be useful.

jacobwegner commented 2 years ago

@gregorycrane I spent an hour experimenting with fuzzy string matching against the HMT data set. There are more things to tune, but it resulted in a pretty good "coverage" improvement.

"Exact" matching:

"Fuzzy" matching:

https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.20?mode=commentaries&entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.156

By "pretty good", I mean that we can link the commentary fragment to at least one token 100% of the time; there are still some partial-match and boundary issues to resolve, though.
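The thread doesn't say which fuzzy matching library was used, so the following is only a sketch of one way to get that behavior, using `difflib.SequenceMatcher` from the Python standard library. It slides a window the size of the fragment over the passage tokens and keeps the best-scoring span, which is why every fragment ends up linked to at least one token. The names `normalize` and `best_span` are hypothetical.

```python
import unicodedata
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Drop combining marks and lower-case, so accents don't affect the score."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if unicodedata.category(ch) != "Mn"
    ).lower()


def best_span(fragment: str, tokens: list) -> tuple:
    """Return (start, end, score) for the token window most similar to the
    fragment. The window is as wide as the fragment's word count, and end
    is exclusive."""
    if not tokens:
        raise ValueError("passage has no tokens")
    width = min(max(1, len(fragment.split())), len(tokens))
    target = normalize(fragment)
    best = None
    for start in range(len(tokens) - width + 1):
        candidate = normalize(" ".join(tokens[start:start + width]))
        score = SequenceMatcher(None, target, candidate).ratio()
        # Strict comparison keeps the earliest window on ties.
        if best is None or score > best[2]:
            best = (start, start + width, score)
    return best


tokens = "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split()
# "Ἀχιλλῆος" is a made-up spelling variant used only to exercise the matcher.
start, end, score = best_span("Ἀχιλλῆος", tokens)
```

Because the best window is always returned, a low score still produces a link. A minimum-score threshold would trade some coverage for fewer wrong matches, and a fixed window width is one likely source of the boundary issues mentioned above.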

scaife-viewer / beyond-translation-site

Ingest scholia annotations using new commentary widget #52