When I modify the `Makefile.start_server` script
https://github.com/rug-compling/Alpino/blob/7a2ea6e2d8f7ae320a021b9e4ed1131a69d5a5a5/Makefile.start_server#L10
and change `assume_input_is_tokenized=off` to `assume_input_is_tokenized=on`, the output becomes malformed.
For example:
Keeping `assume_input_is_tokenized` set to `off` does give a correctly formatted sentence item: `<sentence sentid="127.0.0.1">hallo wereld .</sentence>`.

I have to implement a work-around here anyway to support older Alpino versions, so this isn't an issue for me. But I was wondering if there might be some setting I'm missing here to prevent this from happening? I couldn't figure out where in the Alpino code this goes wrong.
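For the client-side work-around, one option is to check each returned sentence item for well-formedness before using it. The sketch below is a minimal illustration, not part of Alpino: it assumes the client already has the `<sentence>` element as a string, and `parse_sentence_item` is a hypothetical helper name. It uses Python's standard `xml.etree.ElementTree` to parse the item and returns `None` when the XML is malformed or is not a `<sentence>` element with a `sentid` attribute.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_sentence_item(item: str) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: return (sentid, text) for a well-formed
    <sentence sentid="..."> element, or None if the item is malformed."""
    try:
        elem = ET.fromstring(item)
    except ET.ParseError:
        # Not well-formed XML, e.g. an unclosed or mangled tag.
        return None
    # Reject anything that is not a <sentence> element carrying a sentid.
    if elem.tag != "sentence" or "sentid" not in elem.attrib:
        return None
    return elem.attrib["sentid"], elem.text or ""


if __name__ == "__main__":
    good = '<sentence sentid="127.0.0.1">hallo wereld .</sentence>'
    print(parse_sentence_item(good))  # ('127.0.0.1', 'hallo wereld .')
    print(parse_sentence_item('<sentence sentid="127.0.0.1">hallo wereld .'))  # None
```

Returning `None` rather than raising lets the caller decide whether to fall back to another strategy (for example, re-requesting with `assume_input_is_tokenized=off`) when an item cannot be parsed.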