scaife-viewer / beyond-translation-site

Site used to iterate on translation alignments within the Scaife Viewer ecosystem

Add token annotations for Odyssey #45

Open jacobwegner opened 2 years ago

jacobwegner commented 2 years ago

@jtauber as we'd discussed on a call, it'd be great to get your help on importing token annotations for Odyssey from the treebank.

I'm still working through a bit of "branch hygiene", but targeting a format similar to what we've done for Iliad would be great!

https://github.com/scaife-viewer/beyond-translation-site/tree/cfe2e33bdf10bfce4b16b44120aadb993a6caa81/backend/data/annotations/token-annotations/iliad-crane-shamsian

Suggested structure:

annotations/
├─ token-annotations/
│  ├─ odyssey-treebank/
│  │  ├─ metadata.yml
│  │  ├─ tlg0012.tlg002.perseus-grc2.csv

Feedback on the structure, metadata, or our use of ve_ref is welcome (for example, if we want to move from ve_ref to the subreference-ish scheme used on scaife.perseus.org).

jacobwegner commented 2 years ago

@jtauber I'm going to take a crack at this with something within the project that mirrors https://morph.perseus.org and can eventually be incorporated into https://github.com/jtauber/postag-convert.

I'll ping you on the data PR to see if what I'm doing makes sense.

jacobwegner commented 2 years ago

@jtauber I may want some help with this after all; maybe we can sync up on Friday?

jacobwegner commented 2 years ago

I might be able to cheat for now by extracting from the XML files and not the TSVs.

Otherwise I have to do some special-case handling in the TSVs for things like subrefs or virtual exemplars.
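For anyone following along, here is a rough sketch of what extracting from the XML could look like. It assumes the treebank files use `<word>` elements carrying `form`, `lemma`, `postag`, and `cite` attributes, and that a ve_ref is the passage reference plus `.t` and a 1-based token position (for example `1.1.t1`). The sample XML, its tag values, the `extract_rows` helper, and the CSV column names are all made up for illustration and may not match the Iliad data.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample in the assumed treebank layout; the attribute
# values are illustrative, not copied from the real Odyssey treebank.
SAMPLE_XML = """\
<treebank>
  <sentence id="1">
    <word id="1" form="ἄνδρα" lemma="ἀνήρ" postag="n-s---ma-" cite="urn:cts:greekLit:tlg0012.tlg002:1.1"/>
    <word id="2" form="μοι" lemma="ἐγώ" postag="p-s---md-" cite="urn:cts:greekLit:tlg0012.tlg002:1.1"/>
    <word id="3" form="ἔννεπε" lemma="ἐνέπω" postag="v2spma---" cite="urn:cts:greekLit:tlg0012.tlg002:1.1"/>
    <word id="4" form="πλάγχθη" lemma="πλάζω" postag="v3saip---" cite="urn:cts:greekLit:tlg0012.tlg002:1.2"/>
  </sentence>
</treebank>
"""

# Assumed column names; the real token-annotation CSVs may differ.
FIELDNAMES = ["ve_ref", "word_value", "lemma", "tag"]


def extract_rows(xml_text):
    """Return one dict per <word>, keyed by an assumed ve_ref scheme."""
    root = ET.fromstring(xml_text)
    positions = defaultdict(int)  # passage ref -> tokens seen so far
    rows = []
    for word in root.iter("word"):
        cite = word.get("cite", "")
        if not cite:
            # This sketch skips words that carry no passage citation.
            continue
        ref = cite.rsplit(":", 1)[-1]  # "urn:...:1.1" -> "1.1"
        positions[ref] += 1
        rows.append(
            {
                "ve_ref": f"{ref}.t{positions[ref]}",
                "word_value": word.get("form", ""),
                "lemma": word.get("lemma", ""),
                "tag": word.get("postag", ""),
            }
        )
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text using the assumed column names."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDNAMES)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = extract_rows(SAMPLE_XML)
print(rows_to_csv(rows))
```

Counting positions per `cite` value keeps the whitespace-word assumption intact; anything that splits or merges tokens would still need the special-case handling mentioned above.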

jacobwegner commented 2 years ago

Subrefs are still the way to handle this longer term, I think, but the token model in ATLAS is still pretty "white space word" specific.

We'll need some good assertion tests for when we convert away from the ve_ref approach.
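As a starting point, the kind of check I have in mind might look like the sketch below. It assumes a ve_ref has the shape `<passage ref>.t<position>` (for example `1.1.t1`) and that positions within a passage should run from 1 to n with no gaps or duplicates; `check_ve_refs` is a hypothetical helper, not something that exists in ATLAS.

```python
import re
from collections import defaultdict

# Assumed ve_ref shape: passage reference, then ".t", then a 1-based position.
VE_REF_PATTERN = re.compile(r"^(?P<ref>.+)\.t(?P<position>\d+)$")


def check_ve_refs(ve_refs):
    """Assert that token positions are contiguous within each passage.

    Returns a mapping of passage ref -> token count, which can be compared
    before and after a conversion away from ve_ref.
    """
    positions = defaultdict(list)
    for ve_ref in ve_refs:
        match = VE_REF_PATTERN.match(ve_ref)
        assert match, f"malformed ve_ref: {ve_ref!r}"
        positions[match["ref"]].append(int(match["position"]))

    for ref, seen in positions.items():
        expected = list(range(1, len(seen) + 1))
        assert sorted(seen) == expected, f"gap or duplicate in {ref}: {sorted(seen)}"

    return {ref: len(seen) for ref, seen in positions.items()}


counts = check_ve_refs(["1.1.t1", "1.1.t2", "1.2.t1"])
print(counts)
```

Comparing the returned per-passage counts before and after a migration would give a cheap regression check, independent of which reference scheme the data ends up using.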

jacobwegner commented 2 years ago

The XML files are working better; they may still need some slight cleaning, but we're probably about 98% there at this point.

I am hoping to deploy the Odyssey annotations later tonight.

jacobwegner commented 2 years ago

@gregorycrane I've made a pass that brings (most of) the lemmas and postags over to the site for Odyssey, which should enable the use of the "traversal" widget:

https://beyond-translation-gagdt-dev.herokuapp.com/reader/urn:cts:greekLit:tlg0012.tlg002.perseus-grc2:1.1?mode=syntax-trees&entryUrn=urn%3Acite2%3AexploreHomer%3Aentries.atlas_v1%3A1.7939