unescapeHTML doesn't work on the translation server

zotero / translation-server

A Node.js-based server to run Zotero translators

unescapeHTML doesn't work on the translation server #88

Closed shadeMe closed 5 years ago

shadeMe commented 5 years ago

We are currently experiencing inconsistent behaviour with the ZU.unescapeHTML function. When called by the Zotero client (v5.0.60), it correctly decodes the title string, in which the curly quotation marks are written as HTML entities, to The Trouble with “Evolution of Religion”. However, when the translator is invoked by the server, the function returns the encoded string without any modifications.

The translator in question can be found here, along with the URL for reproduction.

mrtcode commented 5 years ago

For me, the title seems to be decoded correctly. You can check it on https://zbib.org/ (it uses the latest translation-server). I do see unescaped HTML entities in the Abstract field, but they appear the same way in the Zotero client too.

I tested with https://journals.equinoxpub.com/JCSR/article/viewArticle/35722, because the URL you provided contains just a frame and is saved as a web page.

Is translation-server up to date?

dstillman commented 5 years ago

@shadeMe is using a custom translator that calls unescapeHTML. In Node, unescapeHTML currently only cleans tags and won't convert most entities:

https://github.com/zotero/zotero/blob/ad27e0c5faed8eaa939ab6e40dfce95c1eefb2ad/chrome/content/zotero/xpcom/utilities.js#L548-L560

There's some ancient code commented out that uses JSDOM (from Simon's original Node translation-server attempt, I assume). I'd be surprised if it worked on current JSDOM, but we can probably get something working with the current version.
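
To illustrate the gap described above, here is a minimal sketch of entity decoding in plain Node.js, without JSDOM. It is not Zotero's implementation and not the fix that was eventually made: the function name decodeEntities and the small NAMED_ENTITIES table are hypothetical, and a real decoder would need the full HTML named-entity list (or a DOM parser) rather than the handful of names shown here.

```javascript
// Hypothetical helper for illustration only; not Zotero's actual code.
// Only a few named entities are listed. A complete decoder would need
// the full HTML named character reference table.
const NAMED_ENTITIES = {
  amp: "&",
  lt: "<",
  gt: ">",
  quot: '"',
  apos: "'",
  nbsp: "\u00A0",
  ldquo: "\u201C",
  rdquo: "\u201D",
  lsquo: "\u2018",
  rsquo: "\u2019",
};

function decodeEntities(str) {
  // Match &name;, &#123; or &#x1F; in a single pass, so decoded output is
  // never scanned again ("&amp;lt;" becomes "&lt;", not "<").
  const pattern = /&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);/g;
  return str.replace(pattern, (match, body) => {
    if (body[0] === "#") {
      const isHex = body[1] === "x" || body[1] === "X";
      const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
      // Leave out-of-range numeric references untouched instead of throwing.
      if (codePoint > 0x10ffff) return match;
      return String.fromCodePoint(codePoint);
    }
    // Names missing from the table are returned unchanged.
    return Object.prototype.hasOwnProperty.call(NAMED_ENTITIES, body)
      ? NAMED_ENTITIES[body]
      : match;
  });
}

// The entity spellings below are an assumption; the thread does not show
// which form the source page actually used.
console.log(decodeEntities("The Trouble with &ldquo;Evolution of Religion&rdquo;"));
```

The single-pass replace is a deliberate choice: chaining one replace call per entity would decode already-decoded text a second time. Tag stripping, which the Node code path reportedly already handles, is left out of this sketch.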

mrtcode commented 5 years ago

Right, I will try to make it work.