hlaueriksson / GEmojiSharp

:octocat: GitHub Emoji for C#, ASP.NET Core and Blazor, dotnet tool for the terminal and PowerToys Run plugin
https://hlaueriksson.github.io/GEmojiSharp/
MIT License

GEmojiSharp Version 2.0.0 missing some symbols #18

Closed. zydjohnHotmail closed this issue 2 years ago

zydjohnHotmail commented 2 years ago

Hello: I am using this repo to remove all the emoji I don't want. I am currently on version 2.0.0, and some emoji seem to be missing. I found the following:

  1. Unicode Character “🗣” (U+1F5E3)
  2. Unicode Character “❗” (U+2757)

Please check! Thanks

hlaueriksson commented 2 years ago

These two emojis can be found via the aliases:

  1. :speaking_head:
  2. :exclamation:

I updated a test for this: c7ddef7

The source of the data is: https://raw.githubusercontent.com/github/gemoji/master/db/emoji.json

zydjohnHotmail commented 2 years ago

Hello: Thanks for your reply. I think your repo is rather complete, but how can I use it without missing anything? Some Telegram public channels use quite a lot of emoji. I want to extract the text from them and simply discard all the emoji. My C# code looks like this:

string pure_text = Regex.Replace(raw_text, Emoji.RegexPattern, string.Empty);

I noticed the missing emoji because, when I ran the above code on about 200 public chats, 2 emoji were not removed. I may have to parse 20K such public chats; will more emoji be missed by this code? Here is one such Telegram public channel: https://t.me/censor_net

Thanks,
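One way to find out which emoji slip through on a larger batch is to scan the stripped text for leftover symbol characters. The following is only a rough sketch: `EmojiAudit`, `Strip`, and `LeftoverSymbols` are hypothetical names, not part of GEmojiSharp, and the pattern is passed in as a parameter (for example `Emoji.RegexPattern` from the code above). It assumes .NET Core 3.0 or later for `System.Text.Rune`, and it uses the Unicode category OtherSymbol (So) as a heuristic: most emoji fall into it, but so do some non-emoji symbols, and a few emoji components do not.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;
using System.Text.RegularExpressions;

public static class EmojiAudit
{
    // Removes every match of `pattern` from `rawText`.
    // `pattern` is supplied by the caller, e.g. GEmojiSharp's Emoji.RegexPattern.
    public static string Strip(string rawText, string pattern) =>
        Regex.Replace(rawText, pattern, string.Empty);

    // Heuristic check: returns the distinct code points in `text` whose
    // Unicode general category is OtherSymbol (So), formatted like "U+1F5E3".
    // Enumerating runes (not chars) keeps surrogate pairs together.
    public static SortedSet<string> LeftoverSymbols(string text)
    {
        var found = new SortedSet<string>(StringComparer.Ordinal);
        foreach (Rune rune in text.EnumerateRunes())
        {
            if (Rune.GetUnicodeCategory(rune) == UnicodeCategory.OtherSymbol)
            {
                found.Add($"U+{rune.Value:X4}");
            }
        }
        return found;
    }
}
```

Calling `EmojiAudit.LeftoverSymbols(pure_text)` on each stripped chat and collecting the results gives a list of code points to compare against the emoji data, instead of spotting misses by eye.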

hlaueriksson commented 2 years ago

The emojis in this project are based on what's supported on GitHub. It's the https://github.com/github/gemoji repo that has the master data.

The latest release of gemoji, 4.0.0.rc2, is from Jan 31, 2020, so it is not up to date with Emoji v14.0.

I don't know what emojis Telegram supports.

zydjohnHotmail commented 2 years ago

Hello: You can download Telegram Desktop for Windows 10 at this URL: https://telegram.org/dl/desktop/win64 Then join this public channel: https://t.me/censor_net You can then export some chats from this channel in JSON format; I think 3 to 5 days will be enough. In the exported JSON file (result.json) you will see that Telegram uses a lot of emoji.

hlaueriksson commented 2 years ago

If GEmojiSharp is not working in your use case, you can try the regex from the Stack Overflow answer here: https://stackoverflow.com/a/48148218
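For illustration, a range-based fallback could look like the sketch below. This is not the regex from the linked Stack Overflow answer; the ranges are hand-picked assumptions and `EmojiFallback` is a hypothetical name. .NET regexes operate on UTF-16 code units, so code points above U+FFFF are matched as surrogate pairs. The pattern trades precision for coverage: it also removes non-emoji symbols in those blocks, and it strips the zero-width joiner, which some scripts use legitimately.

```csharp
using System.Text.RegularExpressions;

public static class EmojiFallback
{
    // Assumed ranges, not taken from GEmojiSharp or gemoji:
    //   [\uD83C-\uD83E][\uDC00-\uDFFF]  surrogate pairs for U+1F000..U+1FBFF
    //   [\u2600-\u27BF]                 Miscellaneous Symbols and Dingbats
    //   [\uFE0F\u200D\u20E3]            variation selector-16, zero-width joiner, keycap
    private static readonly Regex EmojiLike = new Regex(
        @"[\uD83C-\uD83E][\uDC00-\uDFFF]|[\u2600-\u27BF]|[\uFE0F\u200D\u20E3]",
        RegexOptions.Compiled);

    // Returns `text` with every match of the pattern removed.
    public static string Strip(string text) => EmojiLike.Replace(text, string.Empty);
}
```

Because it matches whole blocks rather than a fixed list, this kind of pattern does not depend on how recent the gemoji data is, at the cost of occasionally removing characters that are not emoji.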