titipata / pubmed_parser

A Python Parser for PubMed Open-Access XML Subset and MEDLINE XML Dataset
http://titipata.github.io/pubmed_parser/
MIT License

Handling of non-ASCII / unicode characters #140

Open Michael-E-Rose opened 4 months ago

Michael-E-Rose commented 4 months ago

Sometimes pubmed_parser returns text as is, and sometimes it passes the text through unidecode() first.

First, I think the behavior should be consistent throughout the package. Second, I'm not sure unidecode() is still needed or even desired: users should be able to work with the original text.
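
To illustrate the two behaviors, here is a minimal sketch, not pubmed_parser's actual code. The names `clean_text` and `ascii_only` are hypothetical, and `to_ascii` uses the standard library's `unicodedata` only as a rough stand-in for `unidecode()`, which transliterates many more characters than this does.

```python
import unicodedata


def to_ascii(text):
    """Rough stdlib stand-in for unidecode(): decompose, then drop non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def clean_text(text, ascii_only=False):
    """Hypothetical helper: keep the original text unless ASCII is requested."""
    return to_ascii(text) if ascii_only else text


name = "José Müller"
print(clean_text(name))                   # original text, unchanged
print(clean_text(name, ascii_only=True))  # accents stripped: Jose Muller
```

With a single opt-in switch like this, every parser function could return the original text by default, and anyone who needs ASCII could still ask for it explicitly.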

What do you think, @titipata?