miniJs / miniCount

Character / Word / Sentence Count jQuery Plugin written in CoffeeScript
http://minijs.com/plugins/5/count

Too basic sentence segmentation #1

Open efi opened 12 years ago

efi commented 12 years ago

The sentence counter is not really usable in real-world scenarios and should at least support the inclusion of a common abbreviation list for the current language (or multiple languages?).

See http://en.wikipedia.org/wiki/Text_segmentation#Sentence_segmentation for an example.

While people write whole dissertations about this topic, your plugin should of course not go this deep but at least provide some basic options to prevent "false positives" for sentence boundaries.

Thanks!

matthieua commented 12 years ago

That's an excellent suggestion. I'm definitely considering adding a list of the most common exceptions. However, I'm not planning to support every language, since I want to keep the plugin as simple as possible. The plugin would also give you the option to add your own list of exceptions when initialising it. Would that make sense to you?

That feature is planned for version 1.1.
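To make the proposal concrete, here is a minimal sketch of abbreviation-aware sentence counting, written in TypeScript for illustration. It is not miniCount's code or API: `countSentences`, `DEFAULT_ABBREVIATIONS`, and the `extraAbbreviations` parameter are hypothetical names, and the built-in list is a small English-only sample. The assumed rule is that a single period does not end a sentence when the token right before it is a known abbreviation.

```typescript
// Hypothetical sketch, not the plugin's implementation.
// A small sample of English abbreviations, stored lowercase without the final period.
const DEFAULT_ABBREVIATIONS: string[] = [
  "mr", "mrs", "ms", "dr", "prof", "e.g", "i.e", "etc", "vs",
];

// Lowercase an abbreviation and drop one trailing period, so "P.M." becomes "p.m".
function normalizeAbbreviation(abbreviation: string): string {
  return abbreviation.toLowerCase().replace(/\.$/, "");
}

// Count sentence boundaries, skipping periods that follow a known abbreviation.
// extraAbbreviations stands in for the user-defined exception list (assumed option).
function countSentences(text: string, extraAbbreviations: string[] = []): number {
  const abbreviations = DEFAULT_ABBREVIATIONS
    .concat(extraAbbreviations)
    .map(normalizeAbbreviation);

  // Candidate boundary: a run of ., ! or ? followed by whitespace or end of text.
  const boundary = /([.!?]+)(?=\s|$)/g;
  let count = 0;
  let match: RegExpExecArray | null;

  while ((match = boundary.exec(text)) !== null) {
    if (match[1] === ".") {
      // Take the token immediately before the period, without leading punctuation.
      const tokenMatch = text.slice(0, match.index).match(/\S+$/);
      const token = (tokenMatch ? tokenMatch[0] : "")
        .toLowerCase()
        .replace(/^[^a-z0-9]+/, "");
      if (abbreviations.indexOf(token) !== -1) {
        continue; // abbreviation period, not a sentence boundary
      }
    }
    count++;
  }
  return count;
}

const sample = "Dr. Smith arrived. He met Prof. Jones at 3 p.m. today!";
console.log(countSentences(sample));           // 3 ("p.m." is not in the sample list)
console.log(countSentences(sample, ["p.m."])); // 2
```

This sketch has known gaps: an abbreviation that really does end a sentence (for example a trailing "etc.") is not counted, and text without a closing terminator is ignored. A user-supplied list keeps the core small while leaving such domain-specific decisions to the caller.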

efi commented 12 years ago

Hi. I think that would totally suffice and keep the plugin concise while allowing for very sophisticated (and maybe domain-specific) user-defined abbreviation lists.

matthieua commented 12 years ago

Thanks for your feedback and feel free to contribute :)