neomatrix369 / nlp_profiler

A simple NLP library that allows profiling datasets with one or more text columns. When given a dataset and the name of a column containing text data, NLP Profiler will return either high-level insights or low-level/granular statistical information about the text in that column.

Sentences (in general) are getting an incorrect sentence_count value #14

Closed neomatrix369 closed 3 years ago

neomatrix369 commented 3 years ago

During the presentation, it was observed that sentences containing emojis could end up with an incorrect sentence count. This could be due to the punctuation characters that make up the emojis.

Note: there may be other edge cases involving the period (.), which is the primary indicator of the end of a sentence in English (and in many other Latin and Germanic languages).
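To illustrate the kind of miscount described above, here is a minimal sketch. It is not NLP Profiler's actual implementation: the function names `naive_sentence_count` and `boundary_sentence_count` are hypothetical, and the emoticon `o.O` and the decimal `3.50` are assumed examples of punctuation that is not a sentence end. The first function treats every terminator character as a sentence end; the second only counts runs of terminators that are followed by whitespace or the end of the text.

```python
import re


def naive_sentence_count(text: str) -> int:
    """Hypothetical naive counter: every '.', '!' or '?' ends a sentence."""
    return len(re.findall(r"[.!?]", text))


def boundary_sentence_count(text: str) -> int:
    """Hypothetical stricter counter (an assumption, not the library's logic).

    A run of terminators counts once, and only when it is followed by
    whitespace or the end of the text.
    """
    text = text.strip()
    if not text:
        return 0
    boundaries = re.findall(r"[.!?]+(?=\s|$)", text)
    # Non-empty text without any terminal punctuation still counts as one sentence.
    return max(len(boundaries), 1)


samples = [
    "I did not expect that o.O at all.",   # emoticon containing a period
    "Is it ready?! It costs 3.50 today.",  # stacked terminators and a decimal
]
for sample in samples:
    print(sample, naive_sentence_count(sample), boundary_sentence_count(sample))
```

For the first sample the naive counter reports 2 and the stricter one reports 1; for the second they report 4 and 2. The stricter heuristic still miscounts abbreviations such as "e.g. this", so a trained sentence tokenizer is likely a better fit for real data.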

Related

After checking multiple examples, it seems that many sentences were getting an incorrect sentence count.

ritikjain51 commented 3 years ago

Added an Emoji Decoder

neomatrix369 commented 3 years ago

> Added an Emoji Decoder

If you look at the library, it already has one. What does your Emoji Decoder do? Is it part of the current PR you are working on?

neomatrix369 commented 3 years ago

The problem detected isn't specific to emojis; that was just one example. Other types of sentences had this issue too.

neomatrix369 commented 3 years ago

Fixed by #17, hence closing.