scaife-viewer / beyond-translation-site

Site used to iterate on translation alignments within the Scaife Viewer ecosystem

Document and improve support for UD treebanks #151

Open jacobwegner opened 1 year ago

jacobwegner commented 1 year ago

Most of the currently ingested treebanks are encoded from the Perseids Treebank template and conform to the AGDT v2 guidelines.

As part of helping @jchill-git bring in additional Arabic data, I'd like to revisit some work from 2021 that dealt with the UD / CoNLL-U format.

In 2021, I was experimenting with a pipeline that would:

I think it'd be great to have tighter integration with CoNLL-U / spaCy for loading treebanks. I hope I can spend some time on this before @jchill-git is at the point where he wants to load syntax trees into Beyond Translation.

jacobwegner commented 1 year ago

(TLDR: there is a lot of round-tripping / converting that needs to happen to load UD treebanks; provided we can map the syntax trees back to tokens in the internal data model, it would be great if we could just read from the CoNLL-U directly)
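
For illustration only, here is a minimal sketch of what reading CoNLL-U directly could look like, using just the Python standard library. The names `Token` and `parse_conllu` and the sample sentence are hypothetical and are not part of Beyond Translation or Scaife Viewer. The sketch assumes the standard ten tab-separated CoNLL-U columns and skips multiword-token ranges and empty nodes. Mapping the parsed words back to tokens in the internal data model is not shown.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Token:
    """One syntactic word from a CoNLL-U sentence (hypothetical structure)."""
    id: int
    form: str
    lemma: str
    upos: str
    head: Optional[int]  # 0 means the word is the root of the tree
    deprel: str


def parse_conllu(text: str) -> List[List[Token]]:
    """Parse CoNLL-U text into a list of sentences, each a list of Tokens."""
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):
            # Sentence-level comments (e.g. "# text = ...") are ignored here.
            continue
        cols = line.split("\t")
        if len(cols) != 10:
            raise ValueError(f"expected 10 columns, got {len(cols)}: {line!r}")
        if "-" in cols[0] or "." in cols[0]:
            # Skip multiword-token ranges (e.g. "1-2") and empty nodes (e.g. "1.1").
            continue
        head = int(cols[6]) if cols[6] != "_" else None
        current.append(Token(int(cols[0]), cols[1], cols[2], cols[3], head, cols[7]))
    if current:
        # Handle input that lacks a trailing blank line.
        sentences.append(current)
    return sentences


# Made-up sample sentence, for demonstration only.
SAMPLE = (
    "# text = They buy books.\n"
    "1\tThey\tthey\tPRON\tPRP\t_\t2\tnsubj\t_\t_\n"
    "2\tbuy\tbuy\tVERB\tVBP\t_\t0\troot\t_\t_\n"
    "3\tbooks\tbook\tNOUN\tNNS\t_\t2\tobj\t_\t_\n"
    "4\t.\t.\tPUNCT\t.\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    for sentence in parse_conllu(SAMPLE):
        for token in sentence:
            print(token.id, token.form, token.head, token.deprel)
```

Each word keeps its `id` and `head` from the file, so the dependency edges can be rebuilt per sentence once the words are matched to tokens in the text.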

jacobwegner commented 1 year ago

(And https://beyond-translation.perseus.org/reader/urn:cts:greekLit:tlg0012.tlg001.parrish-eng1-trees:402?mode=syntax-trees isn't loading on top of that)

jacobwegner commented 6 months ago

https://github.com/gregorycrane/Daphne is another repo to ingest