Invalid unicode characters removed from datagrid

jupyterlab / lumino

Lumino is a library for building interactive web applications

https://lumino.readthedocs.io/

Invalid unicode characters removed from datagrid #578

Closed nicojapas closed 1 year ago

nicojapas commented 1 year ago

Fixes #456

When text containing astral symbols (code points above U+FFFF) is elided, datagrid generates invalid Unicode characters. The cause is substring(), which indexes UTF-16 code units and can therefore cut a surrogate pair in half.

before

With the regular expression /[\u{D800}-\u{DFFF}]/gu we match any code point in the surrogate range. This covers both high surrogates (0xD800 to 0xDBFF) and low surrogates (0xDC00 to 0xDFFF). Because the u flag makes the engine read a valid surrogate pair as a single astral code point, only unpaired surrogates match. So any invalid Unicode character resulting from splitting a surrogate pair is removed with replace(), while intact astral symbols are left alone.

after
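
To make the description above concrete, here is a minimal sketch of the problem and of the regex-based cleanup. The sample string and variable names are illustrative assumptions and are not taken from the datagrid source; only the regular expression is quoted from this PR.

```typescript
// "a😀b" is 3 code points but 4 UTF-16 code units: the emoji is a surrogate pair.
const text = "a\u{1F600}b";

// substring() counts code units, so cutting at index 2 keeps only the
// high surrogate of the emoji and yields an invalid string.
const cut = text.substring(0, 2);

// Pattern quoted from the PR description. With the `u` flag a valid pair is
// read as one astral code point, so only unpaired surrogates match.
const loneSurrogates = /[\u{D800}-\u{DFFF}]/gu;

// The dangling high surrogate is dropped from the truncated string...
const cleaned = cut.replace(loneSurrogates, "");
// ...while the intact emoji in the original string is left untouched.
const untouched = text.replace(loneSurrogates, "");

console.log(cut.length);         // 2
console.log(cleaned);            // "a"
console.log(untouched === text); // true
```

Note that this approach discards the half-cut symbol rather than keeping it whole, so the elided text can end up one character shorter than intended.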

welcome[bot] commented 1 year ago

Thanks for submitting your first pull request! You are awesome! :hugs:
If you haven't done so already, check out Jupyter's Code of Conduct. Also, please make sure you followed the pull request template, as this will help us review your contribution more quickly. You can meet the other Jovyans by joining our Discourse forum. There is also an intro thread there where you can stop by and say Hi! :wave:
Welcome to the Jupyter community! :tada:

krassowski commented 1 year ago

Thank you for opening the PR! On a conceptual level, what if someone has a table with all Unicode code points, or if those are mapped to something else in a font? Would it be better to rewrite eliding to use a Unicode-aware slice by first converting the string to an array, as in https://stackoverflow.com/questions/62341685/javascript-unicode-aware-string-slice/62341816#62341816 ?

nicojapas commented 1 year ago

Hi! Yes, I think that is a better approach. I just committed a new solution.
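
For readers following along, the array-based approach suggested in the review might look roughly like the sketch below. The helper name `elideByCodePoint`, its parameters, and the default ellipsis are hypothetical; this is not the code that was committed to datagrid.

```typescript
// Hypothetical helper illustrating a code-point-aware elide.
function elideByCodePoint(
  text: string,
  maxCodePoints: number,
  ellipsis: string = "\u2026"
): string {
  // Array.from iterates a string by code point, so each surrogate pair
  // becomes a single array element and can never be split by slice().
  const codePoints = Array.from(text);
  if (codePoints.length <= maxCodePoints) {
    return text;
  }
  return codePoints.slice(0, maxCodePoints).join("") + ellipsis;
}

console.log(elideByCodePoint("a\u{1F600}b", 2)); // "a😀…"
console.log(elideByCodePoint("abc", 5));         // "abc"
```

Slicing by code point keeps astral symbols whole, but sequences made of several code points (combining marks, emoji joined with zero-width joiners) can still be cut apart.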

welcome[bot] commented 1 year ago

Congrats on your first merged pull request in this project! :tada: Thank you for contributing, we are very proud of you! :heart: