ufal / treex

Treex NLP framework

treat any sentence that contains only whitespace as empty #56

Closed ptakopysk closed 8 years ago

ptakopysk commented 8 years ago

With Read::Sentences skip_empty=1, a sentence that contains only whitespace should be treated as empty too, because the tokenizer still considers such a sentence empty and dies on it. This may change some behaviour somewhere, so I am submitting this as a pull request so as not to break anything.
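To illustrate the intended behaviour, here is a minimal sketch in Python. It is not the Treex (Perl) code: the names `is_empty_sentence` and `read_sentences` are hypothetical, and the one-sentence-per-line input format is an assumption made for the example.

```python
def is_empty_sentence(sentence: str) -> bool:
    """Return True if the sentence has no non-whitespace characters."""
    # str.strip() with no arguments removes leading and trailing whitespace,
    # so a whitespace-only sentence collapses to the empty string.
    return sentence.strip() == ""


def read_sentences(lines, skip_empty=False):
    """Yield one sentence per input line (hypothetical reader, not Treex's).

    With skip_empty=True, lines that are empty or contain only whitespace
    are skipped, so later steps such as a tokenizer never see them.
    """
    for line in lines:
        sentence = line.rstrip("\n")
        if skip_empty and is_empty_sentence(sentence):
            continue
        yield sentence
```

With `skip_empty=True`, a line holding only spaces or tabs is dropped together with truly empty lines. With the default `skip_empty=False`, every line is passed through unchanged.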

martinpopel commented 8 years ago

Thanks. I agree it is reasonable to expect that after Read::Sentences skip_empty=1 all zones have a non-empty $zone->sentence (that is, one with some tokens, not just whitespace).