joypixels / emojione

[Archived] The world's largest independent emoji font. Maintained at https://github.com/joypixels/emoji-toolkit.
https://www.joypixels.com
Other
4.46k stars 535 forks source link

HTML end tag seems to break ASCII conversion #54

Open divad opened 9 years ago

divad commented 9 years ago

When using the ASCII conversion (on say shortcodes to images) the ascii codes, e.g. <3 are converted to :heart:. However, this only works if a space precedes the ascii smiley like so:

<p>text <3</p>

It doesn't work if there is no space and/or it follows a HTML bracket:

<p><3</p>

This remains as the ascii rather than an image.

mjau-mjau commented 8 years ago

This bug still persists. ASCII conversion does not work for smilies immediately succeeding or preceding a html tag. Strange that it works for mapping emoji short names, but not for ascii. I made a dirty hack for it, since I had to map additional PHPBB smilies anyway. Here is the concept if anyone is interesting in the general idea:

// create a map of ascii smiley conversions
var emoji_map = {
    ':oops:':':flushed:',
    ':D':':smiley:',
    ';)':':wink:',
    ':)':':slight_smile:',
    ':(':':disappointed:',
    ':o':':open_mouth:',
    ':shock:':':astonished:',
    ':?:':':grey_question:',
    ':!:':':grey_exclamation:',
    ':?':':confused:',
    '8)':':sunglasses:',
    ':lol:':':laughing:',
    ':x':':no_mouth:',
    ':P':':stuck_out_tongue:',
    ':evil:':':imp:',
    ':twisted:':':smiling_imp:',
    ':roll:':':rolling_eyes:',
    ':idea:':':bulb:',
    ':arrow:':':arrow_right:',
    ':mrgreen:':':alien:',
    ':|':':expressionless:'
};

// function to replace ascii with shortnames from the emoji_map
function match_smilies(str){
    return str.replace(/:oops:|:D|;\)|:\)|:\(|:o|:shock:|:\?:|:!:|:\?|8\)|:lol:|:x|:P|:evil:|:twisted:|:roll:|:idea:|:arrow:|:mrgreen:|:\|/g, function(matched){
      return emoji_map[matched];
    });
}

// apply emoji and mapped ascii
function map_smilies(){
    // loop html elements where to apply emoji
    $(".postbody").each(function() {
    $(this).html(emojione.shortnameToImage(match_smilies($(this).html())));
    });
}

// page load
$(function(){
    // not really necessary, as I am mapping chars already, but this will map additional ascii smilies that were not part of my phpbb
    emojione.ascii = true;

    // apply emoji and mapped ascii after page load
    map_smilies();
});

PS! The above will only apply fix for the items in the emoji_map object. I needed a fix for this for an existing forum, because they were part of the existing 'smilies' interface. If you need to apply the fix for all items in the ascii-smileys list, you will have to populate the object 😑

caseyahenson commented 8 years ago

This issue requires a node-oriented solution to the replacement strategy, and we'll be working that into a future update.

@mjau-mjau Your solution is certainly effective, thank you for providing that! The root of the issue is in the limitations of the methodology used to identify ASCII strings. Unfortunately, element tags are difficult to distinguish from ASCII when doing regex so for that reason the regex was built to search only for strings that exist after the space character like so:

ns.regAscii = new RegExp("<object[^>]>.?<\/object>|<span[^>]>.?<\/span>|<(?:object|embed|svg|img|div|span|p|a)[^>]*>|((\s|^)"+ns.asciiRegexp+"(?=\s|$|[!,.?]))", "g");

That could be replaced with the following regex to consider all matches, not just those with a leading space:

ns.regAscii = new RegExp("<object[^>]>.?<\/object>|<span[^>]>.?<\/span>|<(?:object|embed|svg|img|div|span|p|a)[^>]*>|("+ns.asciiRegexp+"(?=\s|$|[!,.?<]))", "g");

You'll almost certainly run into the root issue here when doing this. Instead, our upcoming solution of looping through nodes on the dom will allow for the clear distinction between tags and text.

SirCumz commented 7 years ago

still not fixed :(

caseyahenson commented 7 years ago

A solution for this has been published in the 3.0.2 release. The object property riskyMatchAscii can be set to true, allowing all ASCII chars to be replaced regardless of whether they're space-char adjacent. While this isn't a perfect solution since it'll still break strings like 'c://', it should be a good starting point. Hopefully with feedback we can continue to improve this.