haifengl / smile

Statistical Machine Intelligence & Learning Engine
https://haifengl.github.io

Sentence doesn't break without punctuation #664

Closed Yuhanlolo closed 3 years ago

Yuhanlolo commented 3 years ago

Describe the bug I'm processing text without punctuation (from speech input) using SimpleSentenceSplitter.getInstance().split(text), but it just returns the original text unchanged instead of splitting it into sentences.

Expected behavior "I went hiking today in the morning at 9 a.m. the weather was nice it was fun I feel happy and relaxed" --> "I went hiking today in the morning at 9 a.m." "the weather was nice" "it was fun I feel happy and relaxed"

Actual behavior "I went hiking today in the morning at 9 a.m. the weather was nice it was fun I feel happy and relaxed" --> "I went hiking today in the morning at 9 a.m. the weather was nice it was fun I feel happy and relaxed"

Input data See above

haifengl commented 3 years ago

There is no guarantee of 100% accuracy in machine learning. Besides, the sentence is not grammatically correct. If it is "I went hiking today in the morning at 9 am. The weather was nice it was fun I feel happy and relaxed", it will be split as expected.

Yuhanlolo commented 3 years ago

I use speech input, so there is no punctuation at all and the grammar is not always correct :) Plus, the speech recognizer adds periods to am and pm (as in a.m. and p.m.).

But thanks for the clarification! I'll try to find other solutions.
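
One possible workaround, following the maintainer's remark that "9 am. The weather was nice ..." splits as expected, is to normalize the recognizer's "a.m."/"p.m." before calling the splitter. The sketch below is not part of Smile: the class name TimeAbbreviationNormalizer and its normalize method are hypothetical, and it assumes that every "a.m."/"p.m." followed by another word ends a sentence. That assumption fails for input such as "at 9 a.m. on Monday", so treat it as a starting point only.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Smile.
class TimeAbbreviationNormalizer {
    // "a.m." or "p.m." in any case, optionally followed by whitespace
    // and the first letter of the next word.
    private static final Pattern TIME_ABBREVIATION = Pattern.compile(
            "\\b([ap])\\.m\\.(?:(\\s+)(\\p{L}))?", Pattern.CASE_INSENSITIVE);

    // Rewrites "a.m. the" as "am. The": the inner periods are dropped and
    // the next word is capitalized.
    // ASSUMPTION: the abbreviation always ends a sentence when another word follows.
    static String normalize(String text) {
        Matcher m = TIME_ABBREVIATION.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            StringBuilder r = new StringBuilder();
            r.append(m.group(1).toLowerCase(Locale.ROOT)).append("m.");
            if (m.group(2) != null) {
                // Keep the original whitespace, capitalize the next letter.
                r.append(m.group(2)).append(m.group(3).toUpperCase(Locale.ROOT));
            }
            m.appendReplacement(out, Matcher.quoteReplacement(r.toString()));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "I went hiking today in the morning at 9 a.m. "
                + "the weather was nice it was fun I feel happy and relaxed";
        System.out.println(normalize(text));
    }
}
```

For the text in this issue, normalize produces exactly the string the maintainer quoted, which could then be passed to SimpleSentenceSplitter.getInstance().split(...). It does nothing for the remaining unpunctuated clauses ("the weather was nice it was fun ..."), which still need a different approach.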