kuhumcst / cstlemma

Lemmatiser for Danish, Dutch, English, German, Polish, Romanian, Russian and tens of other languages, that uses affix rules (affix: prefix, infix, suffix, circumfix). Rules are obtained by supervised learning from a full form - lemma list.
GNU General Public License v2.0

Failed to match stdin with input format #5

jpiitula closed this issue 6 years ago

jpiitula commented 6 years ago

I compiled cstlemma (all fresh clones from GitHub) with STREAM set to 1, but it fails to match my input format (word TAB tag NL) when reading from standard input. Please advise. A compilation log of sorts is attached.

Failing to read from stdin:

$ ./cstlemma -I'$w\t$t\n' -f empty < foobar 2> /dev/null
ERROR: When reaching the end of the input file, 2 parts of the input format specification string are left unmatched.

(It fails, yet it exits with a success status.)

Successfully reading the same input as a file with the same format:

$ ./cstlemma -I'$w\t$t\n' -f empty -i foobar 2> /dev/null
foo foo

Without a format, even standard input is read successfully, but that is not what I want:

$ ./cstlemma -f empty < foobar 2> /dev/null
foo foo
bar bar

report.log

BartJongejan commented 6 years ago

Thank you for reporting this issue. It is now fixed. Instead of a terminating \n you can specify \s. Both will work. \s matches not only the newline but also any stray whitespace characters immediately preceding it.
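
To illustrate the difference between the two terminators, here is a minimal Python sketch. It is not cstlemma's code: the regular expressions, the `parse` helper, and the tag value `N` are hypothetical, and treating "stray whitespace" as spaces and tabs only is an assumption.

```python
import re

# Hypothetical model of a "word TAB tag" line, not cstlemma's implementation.
# Terminator "\n": the tag must be followed directly by the newline.
STRICT = re.compile(r"(?P<word>\S+)\t(?P<tag>\S+)\n")
# Terminator "\s": spaces or tabs right before the newline are also absorbed
# (which characters count as stray whitespace is an assumption here).
LENIENT = re.compile(r"(?P<word>\S+)\t(?P<tag>\S+)[ \t]*\n")


def parse(line, pattern):
    """Return (word, tag) if the whole line matches the pattern, else None."""
    m = pattern.fullmatch(line)
    return (m["word"], m["tag"]) if m else None


if __name__ == "__main__":
    clean = "foo\tN\n"    # well-formed line
    messy = "foo\tN \n"   # stray space before the newline
    print(parse(clean, STRICT), parse(messy, STRICT))
    print(parse(clean, LENIENT), parse(messy, LENIENT))
```

In this model a line with trailing blanks is rejected by the strict pattern but accepted by the lenient one, which is the behaviour described for \s above.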