nlplab / nersuite

http://nersuite.nlplab.org/

nersuite tag adds extra newline to end of input #21

Closed spyysalo closed 11 years ago

spyysalo commented 11 years ago

Per title.

I think the issue is here: https://github.com/nlplab/nersuite/blob/master/src/nersuite/nersuite.cpp#L226

while (! is.eof() ) {
    // 2.1. Read a sentence (or comments)
    get_sent(is, one_sent, multidoc_separator, separator_read);
    [...]
}

where the loop includes an (indirect) call to Suite::output_result_conll, which outputs an empty line. Since get_sent consumes an empty line before returning, this would otherwise be OK. However, is.eof() only evaluates to true after an attempt to read past the last byte, not after the last byte itself has been read. The loop therefore runs one more time after the final sentence, and since get_sent doesn't differentiate between a single newline and immediate EOF, there's no way to suppress the "extra" newline before terminating the loop.

(Sorry, that's a bit confusing.) Anyway, I recommend adding if (is.peek() == EOF) break; there; that should do it.
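
To make the behaviour easier to see, here is a minimal standalone sketch of the same loop shape. It is not the nersuite code: read_sentence is a hypothetical stand-in for get_sent, copy_sentences is a hypothetical stand-in for the tagging loop, and the placement of the peek() check at the end of the loop body is an assumption.

```cpp
#include <cstdio>     // EOF
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for get_sent: reads lines up to and including the
// blank line that terminates a sentence, or up to EOF, whichever comes first.
static void read_sentence(std::istream& is, std::vector<std::string>& sent) {
    sent.clear();
    std::string line;
    while (std::getline(is, line) && !line.empty()) {
        sent.push_back(line);
    }
}

// Hypothetical stand-in for the tagging loop: echoes each sentence followed
// by a blank line, the way a CoNLL-style writer would.
static std::string copy_sentences(const std::string& input, bool check_peek) {
    std::istringstream is(input);
    std::ostringstream os;
    std::vector<std::string> sent;
    while (!is.eof()) {
        read_sentence(is, sent);
        for (const std::string& l : sent) {
            os << l << '\n';
        }
        os << '\n';  // sentence separator, written on every pass
        // Proposed fix: stop as soon as nothing is left to read, instead of
        // waiting for a failed read to set the eof flag.
        if (check_peek && is.peek() == EOF) {
            break;
        }
    }
    return os.str();
}
```

In this sketch, copy_sentences("a\nb\n\n", false) returns the input with one extra trailing newline, because the loop runs once more on an exhausted stream. With check_peek set to true, the output matches the input exactly.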