Fix #155: Prosemirror - Emoji Support Broken

luke- commented 7 months ago

@serh-mosk Thanks for the fix.

Can you please give some information about the fix, e.g. it would be interesting for me why we need emojilib-legacy.json. Do we need this for backward compatibility? Is there a reason why new smilies also need the legacy json?

Maybe it makes sense to migrate all smilies instead of using a legacy layer.

serh-mosk commented 7 months ago

@luke- Yes, this is necessary for backwards compatibility. Since the old emoticon names have been used for a long time, it will probably be difficult to find and rename them in all posts. In the new collection of emoticons, 1135 emoticons have a different name. So I added the old collection and adapted it to the new one.

luke- commented 6 months ago

@serh-mosk Unfortunately I am an expert on the emoji topic, so here some more questions:

Do I understand it correctly that with Twemoji e.g. from version jump 14 to 13, the names of some emojis have changed. Is that why both lists have to be included?
How does it actually work with the emoji names, do they come from Twemoji or are they somehow independent (standardized)?
If standardized, would it probably be good to do the conversion work in the database? So we can say, we're using the latest spec?
Doesn't it make more sense to use the Unicode key of the emojis instead of the name?
For the current issue, a quick fix would be important to me, but I would still like to have a stable, future-proof version for the emojis.

serh-mosk commented 6 months ago

@luke- Ok, this is my answers:

Not really, not only twemoji has been updated, but also emojilib, which is used to display a list of emoticons and insert them into the content. The second list of emoticons (emojilib-legacy.json), generated by me, is a mapping of the old emoticon names in emojilib with the new ones, while maintaining the new data structure.
In emojilib, the data structure of the module was changed, what in version 2.4.0 was the key of the emoticon object and was inserted into the content as :name:, in version 3.0.0 it moved into the object itself as a parameter of the name or slug (see changelog and migration steps).
Yes, for standardization, it is possible to replace in the content all the old emoticon names (these are 1135 names) with new ones from the comparison list I added (emojilib-legacy.json). In this case, this PR should not be merged.
Yes, this is how it works under the hood in the twemoji module. But this twemoji module requires an emoji library - emojilib.
My solution is fast and stable, you can test it locally e.g. Both old and new emoticon names are supported.

luke- commented 6 months ago

@serh-mosk Ok, thanks for the explanations. Then let's use your solution. At least for now.

Markdown content with emojis is not only output via Javascript, but also rendered on the server side, e.g. in emails / notifications.

PHP Renderer:

https://github.com/humhub/humhub/blob/master/protected/humhub/libs/EmojiMap.php
Which means we may also need to add new Emojis to the PHP Server Side Map.

But in general, what are the reasons for not saving the Unicode in the database, e.g. 🕶️ instead of :sunglasses:?

I see the following advantages:

We don't necessarily need an emoji renderer like EmojiLib or PHP (https://github.com/humhub/humhub/blob/master/protected/humhub/modules/content/widgets/richtext/extensions/emoji/RichTextEmojiExtension.php)
But we would probably still be able to replace such "cheap" black&white emojis with icons from Twemoji - but it's optional
It is not English text stored in the database, but language neutral
There are no migration problems like now with Emojilib 2 -> 3

Disadvantages:

If, for example, an emoji is used that cannot be displayed natively and no renderer is used, only 🌛 is displayed instead of :first_quarter_moon_face:, for example. But this only affects outputs without a JS renderer and few emojis.

What do you think? Should we refactor this in future?

serh-mosk commented 6 months ago

@luke- I think the reason is that when someone writes a post, he doesn't know which unicode corresponds to this or that emotion, but he can definitely write it down in a short code (the name of an emoticon). Also, it would not be clear to us when, if it was not possible to display a smiley, we would see its unicode instead of its short name. We also need to keep in mind the possibility of supporting skin color types for some emoticons. I think this was one of the main reasons for changing the data structure of emojilib. Yes, we will most likely have to refactor the emojis when we add support for skin colors.

luke- commented 6 months ago

@luke- I think the reason is that when someone writes a post, he doesn't know which unicode corresponds to this or that emotion, but he can definitely write it down in a short code (the name of an emoticon).

We can still use the Emoji Chooser, the Emojilib and its codes. Just insert the Unicode instead of the Text Code.

Also, it would not be clear to us when, if it was not possible to display a smiley, we would see its unicode instead of its short name. We also need to keep in mind the possibility of supporting skin color types for some emoticons. I think this was one of the main reasons for changing the data structure of emojilib. Yes, we will most likely have to refactor the emojis when we add support for skin colors.

Okay, that's right. Perhaps we can make a change in relation of this expansion. I'll create a new issue for this and merge the current one.

Thank you

humhub / humhub-prosemirror

Fix #155: Prosemirror - Emoji Support Broken #140