noklesta / The-Oslo-Bergen-Tagger

Morphosyntactic tagger for Norwegian bokmål and nynorsk
http://www.tekstlab.uio.no/obt-ny/

Separate choice of input and output format #8

Closed: wanthalf closed this issue 2 years ago

wanthalf commented 3 years ago

We need to be able to choose the input format and the output format separately. At the moment, the -wxml option changes the output format so that the original string is enclosed in a <word> element, but it also makes the tagger expect the input to be some kind of XML, which means that any XML-like content in the text is simply ignored and not analysed. Without the -wxml option, the input is treated as plain text (which is what we want), but the output no longer contains the original string, only some lower-cased version of it. Could we somehow get both?
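To make the request concrete, here is a minimal Python sketch of what decoupled options could look like. It is not the tagger's actual code or interface: the flags `--input-format` and `--output-format`, the helper functions, and the simplified output layout are all hypothetical and only illustrate the idea that the two choices should not depend on each other.

```python
import argparse
import re
from xml.sax.saxutils import escape


def build_parser():
    # Hypothetical flags: the real tagger only has -wxml, which couples both choices.
    parser = argparse.ArgumentParser(
        description="Sketch: choose input and output format independently"
    )
    parser.add_argument("--input-format", choices=["plain", "xml"], default="plain")
    parser.add_argument("--output-format", choices=["plain", "wxml"], default="plain")
    return parser


def tokenize(text, input_format):
    # In XML mode, markup is skipped; in plain mode, everything is analysed,
    # including strings that merely look like tags.
    if input_format == "xml":
        text = re.sub(r"<[^>]*>", " ", text)
    return text.split()


def render(tokens, output_format):
    # Simplified stand-in for the tagger output: one lower-cased form per token,
    # optionally preceded by the original string in a <word> element.
    lines = []
    for token in tokens:
        if output_format == "wxml":
            lines.append(f"<word>{escape(token)}</word>")
        lines.append(token.lower())
    return "\n".join(lines)


if __name__ == "__main__":
    args = build_parser().parse_args()
    sample = "Hei <b> Verden"
    print(render(tokenize(sample, args.input_format), args.output_format))
```

The combination asked for here would correspond to `--input-format plain --output-format wxml` in this sketch: XML-like strings in the input are still analysed, and the original string is preserved in the output.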

wanthalf commented 2 years ago

Sorry, moving to the mtag repository: https://github.com/textlab/mtag/issues/6