spencermountain / compromise

modest natural-language processing
http://compromise.cool
MIT License

It might be helpful to parse abbreviations of common words #128

Open jashsayani opened 8 years ago

jashsayani commented 8 years ago

English is an amazing language, and NLP has reached a point where it is very good at breaking up English sentences to understand context. But humans have transformed English from that amazing language into a bunch of abbreviations tacked together. So it would be more useful if the NLP library understood common abbreviations, since that's what users would actually type.

Example: "I cannot come to dinner because I am at a meeting." would be written as "Can't come to dinner bec i m at meeting" or "Can't come to dinner coz at meeting".

spencermountain commented 8 years ago

hi Jash, I completely agree! ;) this is a great idea, and would fit nicely with the concept of text-matching by meaning/grammar - nlp.text('coz im at a meeting').match('~because~ i am') should probably be a good match, right? love it

jashsayani commented 8 years ago

> nlp.text('coz im at a meeting').match('~because~ i am') should probably be a good match, right?

Yes, essentially understanding the most commonly used abbreviations for words.

flesler commented 6 years ago

Somewhat related to #505. I wouldn't call these abbreviations "synonyms", but the functionality seems similar: match them as if they were the same word, maybe normalize them to one form, or actually paraphrase.
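
To make the "normalize to one" idea concrete, here is a minimal sketch in plain JavaScript. It does not use compromise at all; the lookup table, `expandAbbreviations`, and `matchesNormalized` are hypothetical names made up for illustration, and the table only covers a few spellings from this thread. It assumes a simple whitespace tokenizer, so a split form like "i m" is not handled.

```javascript
// Hypothetical lookup table (not part of compromise): informal spelling -> expanded form.
const ABBREVIATIONS = {
  coz: 'because',
  bec: 'because',
  cuz: 'because',
  im: 'i am',
  "can't": 'cannot',
};

// Lowercase the text, split on whitespace, and expand any token found in the table.
function expandAbbreviations(text) {
  return text
    .toLowerCase()
    .split(/\s+/)
    .filter(Boolean)
    .map((token) =>
      Object.prototype.hasOwnProperty.call(ABBREVIATIONS, token)
        ? ABBREVIATIONS[token]
        : token
    )
    .join(' ');
}

// Hypothetical helper: true if the normalized text contains the normalized phrase
// as a run of whole words (padding with spaces avoids partial-word hits).
function matchesNormalized(text, phrase) {
  const haystack = ` ${expandAbbreviations(text)} `;
  const needle = ` ${expandAbbreviations(phrase)} `;
  return haystack.includes(needle);
}

console.log(expandAbbreviations('coz im at a meeting'));
console.log(matchesNormalized('coz im at a meeting', 'because i am'));
```

With this approach both sides are normalized before comparing, so 'coz im' and 'because i am' end up as the same string. A real implementation would probably want to hook into the library's own tokenizer and tagging rather than a flat table.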