Closed kgryte closed 1 year ago
Try something like this:
function reverse(str) {
var result = "";
var smx = str.length;
var sdx = 0;
/*
Can't start at the end because each segment depends on the
previous one. So build up a list of lengths first instead.
*/
var counts = [];
for (;;) {
var skip = str.codePointAt(sdx) <= 0xffff ? 1 : 2;
counts.push(skip);
sdx += skip;
if (sdx >= smx) break;
}
var cmx = counts.length;
while (cmx--) {
var skip = counts[cmx];
/*
Now we know how much to hop backward for the next grapheme.
Write the first segment, then the second (if present).
*/
var hop = skip;
while (hop) result += str[smx - hop--];
/*
Move on to the next glyph. (Bounds-checking won't be needed
here since the counts array holds an accurate list of widths.)
*/
smx -= skip;
}
return result;
}
@gardhr Thanks for the suggestion! Feel free to submit a PR along with tests. :)
Not sure how well your proposal accounts for all graphemes, as it has been a while since I last looked into it.
Sorry, I just don't have the time at the moment. It has worked well for me over the years, I would be rather surprised if it choked on anything at all. If it does however, just give me a set of failing samples and I'll be glad to fix it.
Wish I could do more to help. Maybe once I get passed some of this backlog?
var print = console.log;
var text =
"外国語の学習と教授Language Learning and TeachingИзучение и обучение иностранных языков語文教學・语文教学Enseñanza y estudio de idiomasИзучаване и Преподаване на Чужди Езициქართული ენის შესწავლა და სწავლება'læŋɡwidʒ 'lɘr:niŋ ænd 'ti:tʃiŋLus kawm thaib qhiaNgôn Ngữ, Sự học,말배우기와 가르치기Nauka języków obcychΓλωσσική Εκμὰθηση και Διδασκαλίαเรียนและสอนภาษา語文教學 😃, ქართული 😭, Εκμὰθηση 😈 иностранных𠀋𠀾𠁎𠁨☃★♲\u{2F804}";
var reversed = reverse(text);
print(text);
print();
print(reversed);
print();
print("Test:", reverse(reversed) == text ? "Pass" : "FAIL!");
Output:
外国語の学習と教授Language Learning and TeachingИзучение и обучение иностранных языков語文教學・语文教学Enseñanza y estudio de idiomasИзучаване и Преподаване на Чужди Езициქართული ენის შესწავლა და სწავლება'læŋɡwidʒ 'lɘr:niŋ ænd 'ti:tʃiŋLus kawm thaib qhiaNgôn Ngữ, Sự học,말배우기와 가르치기Nauka języków obcychΓλωσσική Εκμὰθηση και Διδασκαλίαเรียนและสอนภาษา語文教學 😃, ქართული 😭, Εκμὰθηση 😈 иностранных𠀋𠀾𠁎𠁨☃★♲你
你♲★☃𠁨𠁎𠀾𠀋хыннартсони 😈 ησηθὰμκΕ ,😭 ილუთრაქ ,😃 學教文語าษาภนอสะลแนยีรเαίλακσαδιΔ ιακ ησηθὰμκΕ ήκισσωλΓhcycbo wókyzęj akuaN기치르가 와기우배말,cọh ựS ,ữgN nôgNaihq biaht mwak suLŋiʃt:it' dnæ ŋin:rɘl' ʒdiwɡŋæl'აბელვაწს ად ალვაწსეშ სინე ილუთრაქицизЕ иджуЧ ан енавадоперП и енавачузИsamoidi ed oidutse y aznañesnE学教文语・學教文語вокызя хыннартсони еинечубо и еинечузИgnihcaeT dna gninraeL egaugnaL授教と習学の語国外
Test: Pass
@gardhr Thanks for the demo!
Yep, you're welcome.
This was resolved in https://github.com/stdlib-js/stdlib/pull/1082.
Checklist
Description
Encountered an error when attempting to use
@stdlib/string/reverse
to reverse a string which includes extended grapheme clusters. While the implementation does account for Unicode surrogate pairs and combining marks, extended grapheme clusters do not appear supported.Addressing this issue may require using a Unicode segmentation algorithm, as discussed in the Unicode standard. One can find reference implementations in Rust and elsewhere.
Related Issues
Related issues #295.
Questions
No.
Other
N/A
Demo
N/A
Reproduction
In order to reproduce this bug, do the following:
Expected Results
The reversed string to account for extended grapheme clusters.
Actual Results
N/A
Environments
The following environments are affected: