jingxinxin / tiankeng

A record of the pitfalls encountered during development, along with their solutions.

encodeURIComponent throws an exception #15

Open jingxinxin opened 5 years ago

jingxinxin commented 5 years ago

When the user enters invalid Unicode characters (such as the lone surrogate U+DFFF), encodeURIComponent throws an exception. For example, the message can read:

string contained an illegal UTF-16 sequence
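To reproduce the failure without crashing the caller, the call can be wrapped in a try/catch. The following is a minimal sketch; `tryEncode` is a hypothetical helper name, not something from the original report. The exact error message differs between JavaScript engines, so the sketch checks only the error type (`URIError`).

```javascript
// Hypothetical helper: wraps encodeURIComponent and reports failure
// instead of throwing. The error message text is engine-specific,
// so only the error type is inspected.
function tryEncode(value) {
  try {
    return { ok: true, encoded: encodeURIComponent(value) };
  } catch (err) {
    if (err instanceof URIError) {
      return { ok: false, encoded: null };
    }
    throw err; // anything else is unexpected, so rethrow it
  }
}

const good = tryEncode('a b');   // well-formed input
const bad = tryEncode('\uDFFF'); // lone low surrogate
console.log(good.ok, good.encoded);
console.log(bad.ok);
```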

Testing every UTF-16 code unit programmatically shows that the only range that causes problems is \ud800-\udfff, the range for high and low surrogates.
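One way such a programmatic check might look is sketched below. It is not the original author's script. `findThrowingCodeUnits` is a hypothetical name, and the sketch tests each code unit only as a single-character string.

```javascript
// Hypothetical helper: try every UTF-16 code unit on its own and
// collect the contiguous ranges for which encodeURIComponent throws.
function findThrowingCodeUnits() {
  const ranges = [];
  let start = -1; // first code unit of the current failing run, or -1

  for (let code = 0; code <= 0xFFFF; code++) {
    let throws = false;
    try {
      encodeURIComponent(String.fromCharCode(code));
    } catch (err) {
      throws = true;
    }

    if (throws && start === -1) {
      start = code;                   // a failing run begins
    } else if (!throws && start !== -1) {
      ranges.push([start, code - 1]); // a failing run ends
      start = -1;
    }
  }
  if (start !== -1) ranges.push([start, 0xFFFF]);
  return ranges;
}

// Print each failing range as hexadecimal bounds.
for (const [from, to] of findThrowingCodeUnits()) {
  console.log(from.toString(16) + '-' + to.toString(16));
}
```

A lone code unit in this range fails, but a high surrogate immediately followed by a low surrogate forms a valid pair and encodes without error.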

So, if you want to take the easy route and block all surrogates (which also removes valid surrogate pairs), it is just a matter of:

urlPart = urlPart.replace(/[\ud800-\udfff]/g, '');

If you want to strip out unmatched (invalid) surrogates while allowing surrogate pairs (which are legitimate sequences, used for characters outside the Basic Multilingual Plane such as emoji), you can do the following:

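function stripUnmatchedSurrogates (str) {
    return str
        // Drop high surrogates that are not followed by a low surrogate.
        .replace(/[\uD800-\uDBFF](?![\uDC00-\uDFFF])/g, '')
        // Reverse, drop low surrogates not preceded by a high surrogate, reverse back.
        .split('').reverse().join('')
        .replace(/[\uDC00-\uDFFF](?![\uD800-\uDBFF])/g, '')
        .split('').reverse().join('');
}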
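As an alternative sketch that avoids reversing the string twice, a single regular expression can match valid pairs first and lone surrogates second, keeping the former and dropping the latter. This swaps in a different technique from the function above. `stripLoneSurrogates` is a hypothetical name, and the sketch assumes the regex runs without the `u` flag so that it operates on UTF-16 code units.

```javascript
// Hypothetical alternative: one pass, no string reversal.
// The first alternative matches a valid high+low pair (2 code units);
// the second matches any remaining surrogate, which must be unmatched.
function stripLoneSurrogates(str) {
  return str.replace(
    /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
    (match) => (match.length === 2 ? match : '')
  );
}

// A valid pair (U+1F600) survives; the lone surrogates are removed.
const cleaned = stripLoneSurrogates('a\uD800b\uD83D\uDE00c\uDFFF');
console.log(encodeURIComponent(cleaned)); // no longer throws
```

Unlike removing every code unit in \ud800-\udfff, this keeps characters such as emoji intact while still making the string safe to pass to encodeURIComponent.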
