[Bug] Reader.getSelectedText returns altered version of actual text

ImperialSquid commented 3 months ago

Probably not an issue for you to fix, but just to let you know: the current version of Reader.getSelectedText() returns text whose Unicode code points differ from those of the original selection.

First reported on my plugin here

When selecting text with diacritics (e.g. "Å"), Reader.getSelectedText() returns a string of two Unicode code points (U+0041 (LATIN CAPITAL LETTER A) followed by U+030A (COMBINING RING ABOVE)) instead of the single code point in the original text.

Whereas getting the selected text with something like

Zotero.Reader.getByTabID(Zotero_Tabs.selectedID)
  ._iframeWindow
  .getSelection()
  .getRangeAt(0)
  .toString()

correctly returns a one-character string (U+00C5 (LATIN CAPITAL LETTER A WITH RING ABOVE)).
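
To make the difference visible, here is a minimal sketch in plain JavaScript. The helper name codePoints is hypothetical (it is not part of Zotero or zotero-plugin-toolkit), and the two string literals are stand-ins for the two selections rather than real Reader output.

```javascript
// Hypothetical helper (not a Zotero or toolkit API):
// list the code points of a string as "U+XXXX" labels.
function codePoints(text) {
  // Array.from iterates a string by code point, not by UTF-16 unit.
  return Array.from(text, (ch) =>
    "U+" + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, "0")
  );
}

// Stand-ins for the two results described above (assumed values).
const composed = "\u00C5";    // "Å" as a single code point
const decomposed = "A\u030A"; // "A" followed by COMBINING RING ABOVE

console.log(codePoints(composed));    // one entry: U+00C5
console.log(codePoints(decomposed));  // two entries: U+0041, U+030A
console.log(composed === decomposed); // false: the strings are not equal
```

Both strings render as the same glyph, so the mismatch only shows up in a strict comparison or when inspecting the code points like this.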

Also passed on to Zotero here

Since this is seemingly caused by Zotero itself and your toolkit doesn't do any form of re-encoding/etc, it's probably best fixed by them. But just in case it doesn't get fixed for some reason, I thought I'd let you know.

windingwind commented 3 months ago

Thanks for letting me know. The problem you mentioned is an issue with Zotero and is not related to this package. I'm afraid there is nothing I can do.

ImperialSquid commented 3 months ago

Turns out, I jumped the gun and didn't do my research, whoops...

Unicode has several normalisation forms (available through String.prototype.normalize()). Zotero consistently normalises to NFD (canonically decomposed into a base character plus combining marks), whereas what I needed was NFC (decomposed and then canonically recomposed, which is also the default for normalize()).
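
For reference, a small sketch of how the two forms behave with String.prototype.normalize(): NFC yields the single composed code point and NFD yields the base letter plus the combining mark. The variable fromReader is a stand-in for a decomposed selection, and sameText is a hypothetical helper, not something Zotero or the toolkit provides.

```javascript
// Stand-in for a decomposed selection (assumed value, not real Reader output).
const fromReader = "A\u030A";

// NFC: canonical decomposition followed by canonical composition.
const nfc = fromReader.normalize("NFC");
// NFD: canonical decomposition only.
const nfd = "\u00C5".normalize("NFD");

console.log(nfc === "\u00C5");               // true: recomposed to U+00C5
console.log(nfd === "A\u030A");              // true: split into U+0041 + U+030A
console.log(fromReader.normalize() === nfc); // true: NFC is the default form

// Hypothetical helper: compare two strings regardless of which
// normalisation form each one arrived in.
function sameText(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(sameText(fromReader, "\u00C5")); // true
```

Normalising both sides to the same form before comparing avoids depending on which form any particular API happens to return.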

So if anything, the two-character version, while not true to the original text, is consistent within Zotero, and my _iframeWindow.getSelection() version is a hacky workaround that skips Zotero's normalisation...

Sorry for the false alarm lol, I'll be sure to be more careful in the future 😅

windingwind / zotero-plugin-toolkit

[Bug] Reader.getSelectedText returns altered version of actual text #58