knighton / splitta

Automatically exported from code.google.com/p/splitta

Question marks break sentence detection? #12

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
With the following text as input, sentences ending in a question mark are not 
split from the sentence that follows them.

input = "This is the tale of Mr. Morton. Who is Mr. Morton? He is the subject 
of our tale, and the predicate tells what Mr. Morton must do. Here's a short 
sentence. Mister Morton is who?\nHere's another short sentence."

The resulting split lines are the following:

This is the tale of Mr. Morton.
Who is Mr. Morton? He is the subject of our tale, and the predicate tells what Mr. Morton must do.
Here's a short sentence.
Mister Morton is who? Here's another short sentence.

I would expect the sentences to split after both of the question marks.
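
As a possible stopgap, a post-processing pass over the split lines could re-split them at question marks. The following is only a sketch under stated assumptions: it is not part of Splitta, the helper name resplit_on_question_marks is hypothetical, and it assumes a new sentence starts with an uppercase letter or a quote character, so it will miss sentences that begin otherwise.

```python
import re

# Hypothetical helper, not part of Splitta. Assumption: a sentence boundary
# is a "?" followed by whitespace and then an uppercase letter or a quote.
_QUESTION_BOUNDARY = re.compile(r'(?<=\?)\s+(?=[A-Z"\'])')


def resplit_on_question_marks(segments):
    """Further split already-segmented lines at question marks."""
    result = []
    for segment in segments:
        for piece in _QUESTION_BOUNDARY.split(segment):
            piece = piece.strip()
            if piece:
                result.append(piece)
    return result


# The split lines reported above, one list item per detected sentence.
splitta_output = [
    "This is the tale of Mr. Morton.",
    "Who is Mr. Morton? He is the subject of our tale, "
    "and the predicate tells what Mr. Morton must do.",
    "Here's a short sentence.",
    "Mister Morton is who? Here's another short sentence.",
]

for sentence in resplit_on_question_marks(splitta_output):
    print(sentence)
```

This only patches the output after the fact; it does not explain why the bundled models skip these boundaries.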

This problem occurs with Splitta versions 1.03 and svn r21, under Linux and OS 
X 10.8.4, with Python 2.7.2.

Any help with this problem would be enormously appreciated, as we are 
attempting to use Splitta as a crucial component in an NLP pipeline for a 
summer camp at JHU that is underway: 
http://hltcoe.jhu.edu/research/scale-workshops/

Thank you!

Original issue reported on code.google.com by orl...@gmail.com on 11 Jun 2013 at 2:16

GoogleCodeExporter commented 9 years ago
I forgot to mention that this happens with both of the bundled models, NB and 
SVM.

Can anyone else confirm the same results?

Original comment by orl...@gmail.com on 11 Jun 2013 at 2:23