Open jacobwegner opened 4 years ago
It doesn't look like Ugarit offers an export API, but individual annotators can export to XML or CSV (see https://gist.github.com/jacobwegner/f7c7d29d7d320c3b430c883e4820f049).
The XML export is similar to https://github.com/perseids-project/alignment-prototypes/blob/master/hafez/data/align.3298.1.xml
It seems like it may be better to just work from the alignment-prototypes repository for now, and circle back later to the fuller Ugarit --> Ducat --> ATLAS --> Scaife Viewer workflow.
Also worth noting that https://eumaeus.github.io/uva_cex_ducat/cite-1.15.0.html?urn=urn:cts:greekLit:tlg0031.tlg004.wh_fu:3.16 (linked from https://eumaeus.github.io/uva_cex_ducat/) uses a .tok exemplar for tokenized versions of a text, and that its tokenizer splits punctuation into separate tokens (where traditionally we've "just" favored a whitespace tokenizer).
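To make the difference concrete, here is a minimal sketch of the two tokenization styles. The function names and the regular expression are assumptions for illustration only; this is not the tokenizer Ducat or the .tok exemplar actually uses.

```python
import re


def whitespace_tokens(text):
    """Split on runs of whitespace; punctuation stays attached to words."""
    return text.split()


def punctuation_tokens(text):
    """Emit each run of word characters and each punctuation mark as its own token.

    Assumption: a simple regex stands in for a real punctuation-aware
    tokenizer. It would also split apostrophes and hyphens inside words.
    """
    return re.findall(r"\w+|[^\w\s]", text)


sample = "Sing, goddess, the wrath."
print(whitespace_tokens(sample))
print(punctuation_tokens(sample))
```

The two schemes produce different token counts for the same passage (4 vs. 7 here), so token-level alignments made against one will not line up by index with the other.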
And as another follow-up, the version used in alignment-prototypes doesn't match 1:1 (in whitespace / punctuation) with what we currently display on scaife.perseus.org. For the purposes of demonstrating compatibility, I think I'll start with a subset of that repo and load things in Ducat, as the "sentence-level" alignments to the Greek text will be quite interesting as well.
I was hoping to use ThomasK81/TEItoCEX to extract CEX files, but I think we'd have to add cases to cover the Divan Hafez refsDecls / xpath selectors:
So I'm instead going to work with some custom scripts that might merge into scaife-viewer/cite-tools at some point.
Nothing to share for the script yet, but I have a resulting CEX file that can be opened by Ducat:
https://explorehomer-spike-duca-8mwdty.herokuapp.com/ducat/
https://gist.github.com/jacobwegner/b52a15a30fb15986e5e4b18509bedde2
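For readers unfamiliar with CEX, the sketch below shows roughly what writing the text portion of such a file involves. It is not the script mentioned above. The function name `to_ctsdata` and the example URN are hypothetical, and it assumes only the `#!ctsdata` block, where each line is a passage URN and its text separated by `#`. A complete CEX file that Ducat can open would likely need further blocks (for example a catalog) that are not shown here.

```python
def to_ctsdata(work_urn, passages):
    """Build a CEX `#!ctsdata` block from (reference, text) pairs.

    Assumptions: `work_urn` ends with a colon, references are already
    in citation order, and passage text contains no '#' characters
    (that character is the column delimiter).
    """
    lines = ["#!ctsdata"]
    for ref, text in passages:
        # Collapse internal whitespace so each passage stays on one line.
        clean = " ".join(text.split())
        lines.append(f"{work_urn}{ref}#{clean}")
    return "\n".join(lines) + "\n"


# Hypothetical URN and placeholder passage text, for illustration only.
block = to_ctsdata(
    "urn:cts:example:group.work.edition:",
    [("1.1", "first  line\n"), ("1.2", "second line")],
)
print(block)
```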
(Copied from #21)
Similar to http://divan-hafez.com/?fn=32
Data sources:
Discussion when we get to grooming:
AlignmentSet. E.g., for Digital Sira, we actually duplicated each set in full