`Encoding::JS.unescape` cannot parse Unicode surrogate pairs

Currently, `Encoding::JS.unescape` raises an `EncodingError` when it tries to parse an escaped Unicode surrogate pair. Such pairs often occur in JavaScript strings containing emoji characters. The StringScanner algorithm should be adjusted to identify when the first escaped code unit is a high surrogate (starting with \uD8.., \uD9.., \uDA.., or \uDB..) and the second escaped code unit is a low surrogate (starting with \uDC.., \uDD.., \uDE.., or \uDF..), and then to combine the two into a single codepoint.

Example

"\uD83D\uDE80"

aka '🚀'
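
To make the mapping concrete, here is the arithmetic for that pair worked through in plain Ruby. The variable names are only for illustration. The last line is a sanity check that goes the other way, from the character back to its UTF-16 code units.

```ruby
# Worked example: combine the surrogate pair \uD83D\uDE80 into one codepoint.
hi = 0xD83D  # high surrogate (range U+D800..U+DBFF)
lo = 0xDE80  # low surrogate  (range U+DC00..U+DFFF)

# Each surrogate carries 10 bits of the codepoint, offset by 0x10000.
codepoint = 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

format('U+%X', codepoint)       # => "U+1F680"
codepoint.chr(Encoding::UTF_8)  # => "🚀"

# Reverse check: encode the character as UTF-16 and read back the code units.
"\u{1F680}".encode(Encoding::UTF_16BE).unpack('n*')  # => [55357, 56960], i.e. [0xD83D, 0xDE80]
```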

Example Solution

'"hello world! \\ud83d\\ude01"'
  .gsub(/
    \\u(d[89ab]\h\h)   # high surrogate: D800..DBFF
    \\u(d[cdef]\h\h)   # low surrogate:  DC00..DFFF
  /ix) {
    hi, lo = $1, $2
    (0x1_0000 +
      (Integer(hi, 16) - 0xd800) * 0x400 +
      (Integer(lo, 16) - 0xdc00))
    .chr("UTF-8")
  }
# => "\"hello world! 😁\""
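
Since the issue talks about the StringScanner algorithm, here is a rough sketch of the same pairing logic written with `StringScanner`. This is not the actual ronin-support implementation. `unescape_js_unicode` is a hypothetical helper that only handles `\uXXXX` escapes, and it assumes UTF-8 input.

```ruby
require 'strscan'

# Hypothetical helper for illustration only (not ronin-support's code).
# Decodes \uXXXX escapes, combining surrogate pairs into one character.
def unescape_js_unicode(string)
  scanner = StringScanner.new(string)
  result  = String.new(encoding: Encoding::UTF_8)

  until scanner.eos?
    if scanner.scan(/\\u(d[89ab]\h\h)\\u(d[c-f]\h\h)/i)
      # High surrogate immediately followed by a low surrogate.
      hi = scanner[1].to_i(16)
      lo = scanner[2].to_i(16)
      result << (0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00))
    elsif scanner.scan(/\\u(\h{4})/)
      # Ordinary BMP escape. A lone surrogate here raises RangeError,
      # because it is not a valid UTF-8 codepoint.
      result << scanner[1].to_i(16)
    else
      # Any other character is copied through unchanged.
      result << scanner.getch
    end
  end

  result
end

unescape_js_unicode('hello \ud83d\ude80')  # => "hello 🚀"
```

Trying the pair pattern before the single-escape pattern is the important part: otherwise the high surrogate gets consumed on its own and fails to encode.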

https://ruby.social/@nick_evans/112776837324476279

ronin-rb / ronin-support

`Encoding::JS.unescape` cannot parse Unicode surrogate pairs #519
