scaife-viewer / beyond-translation-site

Site used to iterate on translation alignments within the Scaife Viewer ecosystem

Ingest scholia annotations using new commentary widget #52

Open jacobwegner opened 2 years ago

jacobwegner commented 2 years ago

Another set of content:

Would also be good to figure out a pipeline for annotation curators (not just developers)

jacobwegner commented 2 years ago

@gregorycrane Small status update, and we can talk more on Tuesday.

I've gotten the first pass of this deployed to beyond-translation-dev.

Here are the commentary annotations shown on the perseus-grc2 edition:

https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.20?entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.4645&mode=commentaries

and for reference, the msA edition (which also has the older "scholia" widget in place):

https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.msA:1.1-1.20?entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.156&mode=commentaries

--

A few things to note:

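1) We're currently trying to match the fragment (called a lemma in the HMT data, though not truly a lemma; cf. the TEI spec) against the word tokens.

The HMT annotations aren't 1:1 with the text of msA. In the example below, the fragment θεά carries an acute accent, while the text reads θεὰ with a grave: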

# the annotation
urn:cts:greekLit:tlg5026.msA.hmt:1.4.lemma#θεά
urn:cts:greekLit:tlg5026.msA.hmt:1.4.comment#οὕτως εἴωθε τὴν Μοῦσαν καλεῖν· ἀμέλει καὶ ἐν Ὀδυσσεία · ⁑
...
urn:cts:greekLit:tlg5026.msA.hmt:1.4#urn:cite2:cite:verbs.v1:commentsOn#urn:cts:greekLit:tlg0012.tlg001.msA:1.1
# the text
1.1#Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος

So we could correct the fragments (to match our versions) or create some other form of standoff annotation for the scholia (around an actual lemma, or using a token offset, e.g. urn:cts:greekLit:tlg5026.msA.hmt:1.4 applies to the 3rd token in 1.1, etc.), but I don't think I can do a whole lot more with the current data set as is.
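To make the mismatch concrete, here is a minimal sketch of how a token offset could be derived by comparing the fragment and the tokens with diacritics stripped. It is not the code used on the site; `strip_diacritics` and `find_token_offsets` are hypothetical helper names, and whitespace tokenization is an assumption. In the sample above, θεὰ is the third whitespace-separated token of 1.1.

```python
import unicodedata


def strip_diacritics(text):
    """Decompose to NFD, drop combining marks, and casefold (assumed normalization)."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


def find_token_offsets(fragment, line_text):
    """Return 1-based positions of tokens equal to fragment once diacritics are removed."""
    target = strip_diacritics(fragment)
    tokens = line_text.split()  # assumption: tokens are whitespace-separated
    return [
        position
        for position, token in enumerate(tokens, start=1)
        if strip_diacritics(token) == target
    ]


fragment = "θεά"  # from urn:cts:greekLit:tlg5026.msA.hmt:1.4.lemma
line = "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος"  # msA 1.1

# Plain string equality finds nothing because the accents differ.
exact_offsets = [
    position
    for position, token in enumerate(line.split(), start=1)
    if token == fragment
]
normalized_offsets = find_token_offsets(fragment, line)

print(exact_offsets, normalized_offsets)
```

A standoff annotation could then record the scholion URN together with the passage reference and the offset, rather than relying on the fragment string itself.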

2) Speaking of current data sets... Neel and Chris have a newer release of the HMT data that might have some corrections / differences. I can try and circle back and update in the next week or two.

3) There are a couple of remaining functionalities I'd like to do with the commentary widget:

I also have a few things to tighten up in the code before it's ready for production, but figured getting early feedback would be useful.

jacobwegner commented 2 years ago

@gregorycrane I spent an hour experimenting with fuzzy string matching against the HMT data set. There is more to tune, but it resulted in a pretty good "coverage" improvement.

"Exact" matching:

"Fuzzy" matching:

https://beyond-translation-dev.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.perseus-grc2:1.1-1.20?mode=commentaries&entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.156

By "pretty good", I mean that we can link the commentary fragment to at least one token 100% of the time; there are still some partial-match and boundary issues to resolve, though.
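For readers who want a feel for the idea, here is a rough sketch of fuzzy fragment-to-token matching. The thread does not say which library or scoring was used, so this swaps in Python's standard `difflib.SequenceMatcher` as a stand-in; `best_fuzzy_span`, the `threshold` value, and the misspelled sample fragment are all hypothetical.

```python
import difflib
import unicodedata


def normalize(text):
    """Strip combining marks and casefold so accent variants compare equal."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    ).casefold()


def best_fuzzy_span(fragment, tokens, threshold=0.6):
    """Return (start, end, score) for the token window most similar to fragment.

    start is 0-based and end is exclusive. Returns None when no window
    reaches the (arbitrary, assumed) threshold.
    """
    target = normalize(fragment)
    width = max(1, len(fragment.split()))
    best = None
    # Try windows one token shorter and longer to tolerate boundary drift.
    for size in sorted({max(1, width - 1), width, width + 1}):
        for start in range(0, len(tokens) - size + 1):
            candidate = normalize(" ".join(tokens[start:start + size]))
            score = difflib.SequenceMatcher(None, target, candidate).ratio()
            if best is None or score > best[2]:
                best = (start, start + size, score)
    if best is not None and best[2] >= threshold:
        return best
    return None


tokens = "Μῆνιν ἄειδε θεὰ Πηληϊάδεω Ἀχιλῆος".split()
# Hypothetical fragment with a variant spelling (double lambda).
span = best_fuzzy_span("Πηληιάδεω Ἀχιλλῆος", tokens)
print(span)
```

Trying windows of neighbouring sizes is one way to probe the boundary issues mentioned above: a window that scores well but is one token too wide or too narrow is a partial match worth reviewing by hand.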