jacobslusser / ScintillaNET

A Windows Forms control, wrapper, and bindings for the Scintilla text editor.
MIT License

Question Regarding Combined Characters and Regex #525

Open AlanBurkhart opened 2 years ago

AlanBurkhart commented 2 years ago

I have my own Regex find-replace dialog that has always worked pretty well, except when a text document contains characters with more than one Unicode code point. Each such combined character throws off the index of the match by one character position. In this case I wasn't searching for the offending character itself, but for specific text that came after it. For example:

🕜 &#.128348; &#.x1F55C; Clock Face One-thirty

Searching for the ampersand matches the # sign. If I paste another clock face character into the line, it matches the "1" instead. Is there a practical method for dealing with this? (Dots are inserted above so that the entities display instead of the characters.)
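
The symptom described above is consistent with two different ways of counting positions. .NET strings are UTF-16, so a Regex match index counts UTF-16 code units, and the clock face (U+1F55C) takes two of them. If the editor position is counted in code points, which is an assumption here and not something confirmed in this thread, each such character shifts the match by one. The following Python sketch only illustrates the arithmetic. The helper names `utf16_length` and `utf16_to_codepoint_index` are made up for this example and are not part of ScintillaNET or .NET.

```python
def utf16_length(text: str) -> int:
    """Number of UTF-16 code units needed to encode text."""
    return len(text.encode("utf-16-le")) // 2


def utf16_to_codepoint_index(text: str, utf16_index: int) -> int:
    """Convert an offset counted in UTF-16 code units to one counted in code points.

    An offset that lands in the middle of a surrogate pair is rounded up
    to the code point that follows the pair.
    """
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= utf16_index:
            return cp_index
        # Code points above U+FFFF need a surrogate pair (two code units).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


line = "\U0001F55C &#128348; &#x1F55C; Clock Face One-thirty"

# Python indexes strings by code point, so this is the "true" position.
cp_index = line.index("&")

# The offset a UTF-16 based regex engine would report for the same match.
reported = utf16_length(line[:cp_index])

print(line[reported])                                  # '#': the uncorrected index is off by one
print(line[utf16_to_codepoint_index(line, reported)])  # '&': the corrected index

# A second clock face shifts the uncorrected index by one more position.
line2 = "\U0001F55C" + line
reported2 = utf16_length(line2[:line2.index("&")])
print(line2[reported2])                                # '1'
```

In C#, the same count could presumably be made by walking the string up to the match index and subtracting one for every `char.IsHighSurrogate` hit, then passing the adjusted value to the control. Whether that is the right correction depends on how the control actually counts positions, so it is worth verifying against a line with and without such characters.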