uhjish / entropy-calculator

Automatically exported from code.google.com/p/entropy-calculator

improve sentence splitter #5

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

The current regex SplitSentences = re.compile(u'[.!?]') splits on every '.', '!' or '?', including decimal points, so an expression like
'4.5 $US' is split into ['4', '5 $US'].

split_sentences = re.compile(r'[.!?]\s+')

is more precise because it only splits when whitespace follows the sentence-ending mark, so the decimal point in '4.5' no longer ends a sentence.
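A minimal sketch comparing the two patterns side by side. The names OLD_SPLIT, NEW_SPLIT and the sample text are illustrative only and are not taken from the project's code.

```python
import re

# Illustrative names. The original pattern splits on every '.', '!' or '?',
# so a decimal point is treated as a sentence end.
OLD_SPLIT = re.compile(r'[.!?]')
# Proposed pattern: split only where the mark is followed by whitespace;
# the whitespace is consumed by the match.
NEW_SPLIT = re.compile(r'[.!?]\s+')

text = "It costs 4.5 $US. Is that cheap? Yes!"

print(OLD_SPLIT.split(text))  # ['It costs 4', '5 $US', ' Is that cheap', ' Yes', '']
print(NEW_SPLIT.split(text))  # ['It costs 4.5 $US', 'Is that cheap', 'Yes!']
```

Note that with the proposed pattern the final '!' stays attached to the last sentence, because no whitespace follows it at the end of the string.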

Original issue reported on code.google.com by christian.ledermann on 1 Nov 2013 at 8:21