This project provides an importer and an exporter that support the TreeTagger format in the linguistic converter framework Pepper (see http://corpus-tools.org/pepper/). TreeTagger is a natural language processing tool that annotates text with part-of-speech and lemma annotations. A detailed description of the importer can be found in the section TreeTaggerImporter, and a description of the exporter in the section TreeTaggerExporter.
Metadata containing XML escapes is not unescaped #20
It's not possible to have metadata values like:
<meta URL="<a href='X'>bla</a>">
even though technically angle brackets don't need to be escaped inside attribute values. Escaping the angle brackets (as &lt; and &gt;) makes the value import fine, but it then stays escaped in the relANNIS output.
I think it is fine to encode values like this (with escapes) in TT files, but the correct behavior is for the Salt model to then contain the unescaped values (with real '<' etc.). This would result in correct ANNIS output, and each other module would be responsible for escaping properly in its own metadata writer.
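To make the requested behavior concrete, here is a minimal sketch in Python. It is not the Pepper or Salt API. The function names `read_meta` and `write_meta` and the attribute regex are hypothetical, and the only library calls used are `escape` and `unescape` from the standard `xml.sax.saxutils` module. The idea is that the reader unescapes values once on import, so the in-memory model holds real '<' characters, and each writer escapes again for its own output format.

```python
import re
from xml.sax.saxutils import escape, unescape

# Hypothetical pattern for name="value" pairs in a TreeTagger-style <meta ...> line.
ATTR = re.compile(r'(\w+)="([^"]*)"')

# saxutils handles &amp;, &lt; and &gt; by default; quotes are passed explicitly.
UNESCAPE_QUOTES = {"&quot;": '"', "&apos;": "'"}
ESCAPE_QUOTES = {'"': "&quot;"}


def read_meta(line):
    """Parse a meta line and return a dict of unescaped attribute values."""
    return {name: unescape(value, UNESCAPE_QUOTES)
            for name, value in ATTR.findall(line)}


def write_meta(meta):
    """Serialize a dict back into a meta line, escaping each value."""
    attrs = " ".join('{}="{}"'.format(name, escape(value, ESCAPE_QUOTES))
                     for name, value in meta.items())
    return "<meta {}>".format(attrs)


line = "<meta URL=\"&lt;a href='X'&gt;bla&lt;/a&gt;\">"
meta = read_meta(line)        # the model holds the real angle brackets
round_trip = write_meta(meta) # the writer re-escapes for TT output
```

With this split, a relANNIS writer would receive the unescaped value from the model and could apply whatever escaping its own format requires.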