perseids-project / perseids-client-apps

A simple flask application to serve input forms.

strip header, extra attributes from published treebank files #19

Open balmas opened 8 years ago

balmas commented 8 years ago

From https://github.com/alpheios-project/arethusa/issues/748:

When I try to import a treebank file from the Perseus Latin repository into Arethusa (e.g. phi1221.phi007.perseus-lat1.tb.xml from https://github.com/PerseusDL/treebank_data/blob/master/v2.1/Latin/texts/phi1221.phi007.perseus-lat1.tb.xml) using the "Upload Base XML Treebank / from file" button, I get the message: ERROR!! CHANGES NOT SAVED! errorunexpected attribute "oldId". When I change the file, removing the header element (with all its children) and body (I put in the annotator element from one of my exported treebank annotations), the file is read and displayed OK. Users often want to review or change already annotated trees from the "gold standard", and Arethusa is the logical environment in which to do so. Perhaps this could be achieved with an XSL or XQuery stylesheet layer that, on import, strips everything above the sentence element from the base XML treebank file and adds a treebank element that is acceptable to Arethusa.
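To make the proposed stripping step concrete, here is a rough sketch in Python rather than XSL or XQuery (Python chosen only because this repository is a Flask application). It is not an existing part of either project. The function name `strip_treebank` and the `UNWANTED_ATTRIBUTES` set are hypothetical, the sketch assumes the file's elements are not namespaced, and it assumes a bare `treebank` wrapper is enough; what Arethusa actually requires on that element is not stated in the report.

```python
import xml.etree.ElementTree as ET

# Assumption: only "oldId" is named in the error report; extend as needed.
UNWANTED_ATTRIBUTES = {"oldId"}


def strip_treebank(xml_text, unwanted=UNWANTED_ATTRIBUTES):
    """Return a treebank document that keeps only the sentence elements.

    Everything above the sentences (header, body wrapper, and so on) is
    dropped, and the attributes listed in `unwanted` are removed from every
    sentence and its descendants.
    """
    source = ET.fromstring(xml_text)

    # Hypothetical minimal wrapper; the real requirements are an open question.
    target = ET.Element("treebank")

    # iter() finds sentence elements at any depth, so it does not matter
    # whether they sit directly under the root or inside a body element.
    for sentence in source.iter("sentence"):
        for element in sentence.iter():
            for name in unwanted:
                element.attrib.pop(name, None)
        target.append(sentence)

    return ET.tostring(target, encoding="unicode")
```

A caller would read the published file, pass its text to `strip_treebank`, and hand the result to the upload step. The same logic could be expressed as an XSLT identity transform with templates that suppress the header and the unwanted attributes, which may fit better if the import layer is already stylesheet-based.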