Separate choice of input and output format

We need to separate the choice of input format from the choice of output format. At the moment, the -wxml option changes the output format so that it contains the original string enclosed in a <word> element, but it also makes the tagger expect the input to be some kind of XML. As a result, any XML-like content in the text is simply ignored and not analysed. Without the -wxml option, the input is treated as plain text (which is what we want), but the output no longer contains the original string, only a lower-cased version of it. Could we somehow get both?
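To illustrate the combination being asked for, here is a minimal sketch of plain-text input with XML output that keeps the original string. It is not the tagger's code: the regex tokenizer, the `to_xml` function and the `lower` attribute are all hypothetical. The point is that XML-like text in the input (such as `<b>`) is tokenized and escaped like any other text instead of being skipped, while the lower-cased form used for analysis sits next to the original spelling.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def tokenize(text):
    # Hypothetical naive tokenizer: runs of word characters, or single punctuation marks.
    # The input is treated as plain text, so "<" and ">" are just ordinary tokens.
    return re.findall(r"\w+|[^\w\s]", text)


def to_xml(text):
    out = []
    for token in tokenize(text):
        # The lower-cased form (what analysis would work on) goes in a hypothetical attribute;
        # the original string is kept, XML-escaped, as the element content.
        out.append(f"<word lower={quoteattr(token.lower())}>{escape(token)}</word>")
    return "\n".join(out)


print(to_xml("Oslo <b>"))
```

Decoupling the two choices this way means a single option controls how the input is read and another controls how the output is written, rather than one flag switching both.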

Issue #8 in noklesta/The-Oslo-Bergen-Tagger