When the user enters invalid Unicode characters (such as U+DFFF), the function throws an exception with the following message:
string contained an illegal UTF-16 sequence
Taking the programmatic approach to discover the answer, I found that the only range that turned up any problems was \ud800-\udfff, the range for high and low surrogates.
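The programmatic check described above might look something like the following. This is only a sketch: it assumes the throwing function is `encodeURIComponent` (the text does not name it), and `findThrowingCodeUnits` is a hypothetical helper introduced for illustration. It feeds every single UTF-16 code unit to the encoder and records the contiguous ranges that throw.

```javascript
// Hypothetical probe. Assumption: the function that throws is encodeURIComponent.
function findThrowingCodeUnits(encode) {
  const ranges = [];
  let start = -1;
  for (let code = 0; code <= 0xFFFF; code++) {
    let throws = false;
    try {
      encode(String.fromCharCode(code)); // a one-code-unit string
    } catch (e) {
      throws = true;
    }
    if (throws && start === -1) start = code; // a failing run begins
    if (!throws && start !== -1) {            // a failing run ends
      ranges.push([start, code - 1]);
      start = -1;
    }
  }
  if (start !== -1) ranges.push([start, 0xFFFF]);
  return ranges;
}

const hex = (n) => n.toString(16).padStart(4, '0');
const problemRanges = findThrowingCodeUnits(encodeURIComponent)
  .map(([lo, hi]) => `\\u${hex(lo)}-\\u${hex(hi)}`);

// In a spec-conforming engine this is the single range \ud800-\udfff.
console.log(problemRanges);
```

Note that this only tests code units in isolation, which is exactly the case where a surrogate is unmatched. A high surrogate followed by a low surrogate forms a valid pair and does not throw.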
So, if you want to take the easy route and block all surrogates, it is just a matter of:
urlPart = urlPart.replace(/[\ud800-\udfff]/g, '');
If you want to strip out unmatched (invalid) surrogates while allowing surrogate pairs (which are legitimate sequences, though the characters they encode are rarely needed), you can do the following:
function stripUnmatchedSurrogates (str) {
    return str
        // Drop high surrogates that are not followed by a low surrogate.
        .replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])/g, '')
        // Reverse so the lookahead acts as a lookbehind, drop low
        // surrogates not preceded by a high surrogate, then reverse back.
        .split('').reverse().join('')
        .replace(/[\uDC00-\uDFFF](?![\uD800-\uDBFF])/g, '')
        .split('').reverse().join('');
}
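To compare the two options on a concrete input, here is a small sketch. The names `stripAllSurrogates` and `stripUnmatchedSurrogatesLookbehind` are hypothetical, and the second one is a swapped-in variant rather than the function above: it assumes an engine with regex lookbehind support (ES2018) and replaces the reverse-twice trick with a single pass. It also assumes, as before, that the string is headed for `encodeURIComponent`.

```javascript
// Easy route: drop every surrogate code unit, paired or not.
function stripAllSurrogates(str) {
  return str.replace(/[\ud800-\udfff]/g, '');
}

// Assumed alternative to reversing the string twice: a lookbehind handles
// low surrogates that are not preceded by a high surrogate.
function stripUnmatchedSurrogatesLookbehind(str) {
  return str.replace(
    /[\uD800-\uDBFF](?![\uDC00-\uDFFF])|(?<![\uD800-\uDBFF])[\uDC00-\uDFFF]/g,
    ''
  );
}

const pair = '\uD83D\uDE00'; // U+1F600 encoded as a valid surrogate pair
const input = 'a\uD800b' + pair + 'c\uDFFF'; // two lone surrogates, one pair

const blocked = stripAllSurrogates(input);                  // 'abc'
const repaired = stripUnmatchedSurrogatesLookbehind(input); // 'ab' + pair + 'c'

console.log(encodeURIComponent(blocked));  // 'abc'
console.log(encodeURIComponent(repaired)); // 'ab%F0%9F%98%80c'
```

Both results can be encoded without an exception. The difference is that the easy route also discards the valid pair, while the second keeps it.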