Fix html entities - Githubissues

pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus

https://pybliometrics.readthedocs.io/en/stable/

Other

407 stars, 127 forks

Fix html entities #263

Closed raffaem closed 2 years ago

raffaem commented 2 years ago

Scopus may return HTML entities for non-ASCII characters. This converts them to proper Unicode characters.
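A minimal sketch of the conversion described here, assuming it relies on the standard library's `html.unescape` (the function discussed in the review in this thread). The sample string is the one reported in this thread; the variable names are only illustrative.

```python
from html import unescape

# Name as reported in this thread, containing a numeric character reference.
raw = "J&#246;rgen H.M."

# unescape() replaces numeric and named references with Unicode characters.
clean = unescape(raw)
print(clean)

# Strings without entities pass through unchanged.
plain = unescape("Michael")
```

Calling `unescape` on text that contains no entities is harmless, so it can be applied to every returned string without checking first.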

raffaem commented 2 years ago

Fix

>>> from pybliometrics.scopus import AuthorRetrieval
>>> auth = AuthorRetrieval("6506328678")
>>> print(auth.given_name)
J&#246;rgen H.M.

Michael-E-Rose commented 2 years ago

Looks good, although I didn't know this was a problem. Since you use nothing but unescape from the module, would you please write only from html import unescape? This will save many characters in the code and keep the lines short. And since the module is part of any Python installation, please put the import in the upper import block.

raffaem commented 2 years ago

Arg ... doesn't work if the argument is None. Wait before merging
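The failure mentioned here can be sketched as follows: `html.unescape` expects a string, so passing `None` raises a `TypeError`. One possible guard is shown below. The helper name `unescape_or_none` is hypothetical and is not taken from the pybliometrics code base; the actual fix in the pull request may look different.

```python
from html import unescape
from typing import Optional


def unescape_or_none(text: Optional[str]) -> Optional[str]:
    """Hypothetical helper: unescape HTML entities, passing None through."""
    if text is None:
        # Missing fields stay None instead of raising a TypeError.
        return None
    return unescape(text)


print(unescape_or_none("J&#246;rgen H.M."))
print(unescape_or_none(None))
```

Returning `None` unchanged keeps missing fields distinguishable from empty strings, which is one reasonable design choice among several.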

raffaem commented 2 years ago

Should be fine now :)

Michael-E-Rose commented 2 years ago

Three remarks about newlines in the review. Would you mind addressing them as well?

Michael-E-Rose commented 2 years ago

Relatedly, did the problem ever occur with affiliation names?

raffaem commented 2 years ago

> Relatedly, did the problem ever occur with affiliation names?

Not for me.