Support alternate translation alignment formats

scaife-viewer / sv-mini-atlas

ATLAS implementation for the Scaife "SV Mini" prototype

https://scaife-viewer.org/

MIT License


Support alternate translation alignment formats #24

Open jacobwegner opened 4 years ago

jacobwegner commented 4 years ago

(Copied from #21)

Data sources:

Discussion when we get to grooming:

Moving away from "left/right" to mapping queries to `passageReferences`
How to preserve "orders" in AlignmentSet. E.g., for Digital Sira, we actually duplicated each set in full
Pipeline for Ugarit + TEI XML over to ATLAS

jacobwegner commented 4 years ago

It doesn't look like Ugarit offers an export API, but individual annotators can export to XML or CSV (see https://gist.github.com/jacobwegner/f7c7d29d7d320c3b430c883e4820f049).

The XML export is similar to https://github.com/perseids-project/alignment-prototypes/blob/master/hafez/data/align.3298.1.xml

It seems like it may be better to just work from the alignment-prototypes repository for now, and circle back later to further Ugarit --> Ducat --> ATLAS --> Scaife Viewer workflows.

jacobwegner commented 4 years ago

Also worth noting that https://eumaeus.github.io/uva_cex_ducat/cite-1.15.0.html?urn=urn:cts:greekLit:tlg0031.tlg004.wh_fu:3.16 (linked from https://eumaeus.github.io/uva_cex_ducat/) uses a .tok exemplar for tokenized versions of a text. The tokenizer used there separates punctuation into its own tokens (where traditionally we've "just" favored a whitespace tokenizer).
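
To make the tokenizer difference concrete, here is a minimal sketch in Python. Neither function is taken from Ducat or ATLAS; the names, the regular expression, and the English sample sentence are assumptions chosen only to show how token counts (and therefore token-level alignment indexes) diverge between the two approaches.

```python
import re


def whitespace_tokens(text):
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def punctuation_aware_tokens(text):
    """Emit words and punctuation marks as separate tokens.

    Illustrative only: this is not the tokenizer Ducat uses.
    """
    # \w+ matches a run of word characters; [^\w\s] matches a single
    # character that is neither a word character nor whitespace.
    return re.findall(r"\w+|[^\w\s]", text)


# Hypothetical sample text, not drawn from the aligned editions.
sample = "For God so loved the world, that he gave his only Son."

ws = whitespace_tokens(sample)
pa = punctuation_aware_tokens(sample)

print(len(ws), ws[5])    # "world," is one token
print(len(pa), pa[5:7])  # "world" and "," are two tokens
```

Because the second tokenizer yields more tokens for the same passage, token positions recorded against one tokenization can't be reused against the other without a mapping step.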

jacobwegner commented 4 years ago

And as another follow-up, the version of the text used in alignment-prototypes doesn't match 1:1, in whitespace / punctuation, what we currently display on scaife.perseus.org. For the purposes of demonstrating compatibility, I think I'll start with a subset of that repo and load things in Ducat, as the "sentence-level" alignments to the Greek text will be quite interesting as well.

jacobwegner commented 4 years ago

I was hoping to use ThomasK81/TEItoCEX to extract CEX files, but I think we'd have to add cases to cover the Divan Hafez refsDecls / xpath selectors:

https://github.com/ThomasK81/TEItoCEX/blob/5bc22a3ffd7bf006cc19edbdbc6c3df2dfd8b69d/CTSExtract.go#L510

So I'm instead going to work with some custom scripts that might merge into scaife-viewer/cite-tools at some point.
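
As a rough idea of what such a script's output step could look like (this is not the actual script), the sketch below writes passages into a CEX `#!ctsdata` block. It assumes the common CEX convention of one `urn#text` pair per line with `#` as the delimiter; the function name and the URN in the usage example are hypothetical, and a complete CEX file would need further blocks (e.g. a catalog) that are not shown.

```python
def to_ctsdata_block(passages):
    """Render (urn, text) pairs as a CEX ctsdata block.

    Assumes "#" is the column delimiter, so it may not appear in the
    URN or the text. Hypothetical helper, not part of cite-tools.
    """
    lines = ["#!ctsdata"]
    for urn, text in passages:
        if "#" in urn or "#" in text:
            raise ValueError(f"delimiter '#' found in passage {urn!r}")
        # Collapse internal whitespace (including newlines left over from
        # the TEI source) so each passage stays on a single line.
        lines.append(f"{urn}#{' '.join(text.split())}")
    return "\n".join(lines) + "\n"


# Hypothetical URN and text, for illustration only.
block = to_ctsdata_block(
    [("urn:cts:example:group.work.edition:1.1", "first  line\nof text")]
)
print(block)
```

Rejecting `#` outright keeps the sketch simple; a real pipeline would need to decide how to handle texts that contain the delimiter.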

jacobwegner commented 4 years ago

Nothing to share for the script, but I have a resulting CEX file that can be opened in Ducat:

https://explorehomer-spike-duca-8mwdty.herokuapp.com/ducat/

https://gist.github.com/jacobwegner/b52a15a30fb15986e5e4b18509bedde2