Handle Unicode escape sequences for emojis and symbols

mlomb / chat-analytics

Generate interactive, beautiful and insightful chat analysis reports

https://chatanalytics.app

GNU Affero General Public License v3.0

Handle Unicode escape sequences for emojis and symbols #115

Open ShortTimeNoSee opened 3 months ago

ShortTimeNoSee commented 3 months ago

Currently, when a data export contains emojis written as Unicode escape sequences (e.g., \u00f0\u009f\u0098\u00ad), they are not decoded correctly during analysis, which leads to incorrect emoji statistics.
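For reference, the example sequence looks like the four UTF-8 bytes of 😭 (F0 9F 98 AD), with each byte written out as its own \u00XX escape. Below is a minimal sketch of one way to reverse that, assuming the exporter escaped every UTF-8 byte as a separate code point. `fixMojibake` is a hypothetical name for illustration, not an existing chat-analytics function, and this is not necessarily how the project should fix it.

```typescript
// Hypothetical helper (not part of chat-analytics): repairs strings whose
// UTF-8 bytes were each stored as a separate code point in the range 0x00-0xFF.
function fixMojibake(text: string): string {
    let percentEncoded = "";
    for (let i = 0; i < text.length; i++) {
        const code = text.charCodeAt(i);
        // A code unit above 0xFF cannot be a single byte, so the string is
        // assumed to be correctly decoded already and is left untouched.
        if (code > 0xff) return text;
        percentEncoded += "%" + (code < 16 ? "0" : "") + code.toString(16);
    }
    try {
        // decodeURIComponent interprets the %XX sequence as UTF-8 bytes
        // and throws a URIError if they are not valid UTF-8.
        return decodeURIComponent(percentEncoded);
    } catch {
        // Not valid UTF-8 (e.g. genuine Latin-1 text): keep the original.
        return text;
    }
}

console.log(fixMojibake("\u00f0\u009f\u0098\u00ad")); // 😭
console.log(fixMojibake("don\u00e2\u0080\u0099t"));   // don’t
```

The fallback paths matter: text that is already correct, or that only happens to contain characters below U+0100, is returned unchanged instead of being corrupted further.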

ShortTimeNoSee commented 3 months ago

There is also an issue with symbols.

This comes out as "donâ" when it's supposed to be "don't" (except the original apostrophe is the curly kind, so "don’t"). So we get stuff like this as a result lol: