zhiyzuo / python-scopus

PyScopus
http://zhiyzuo.github.io/python-scopus/
MIT License

UnicodeEncodeError in retrieve_author() with non-ascii names #3

Closed Michael-E-Rose closed 7 years ago

Michael-E-Rose commented 7 years ago

I was searching for "Agustín Carstens" with Scopus ID 6603722641:

>>> info = scopus.retrieve_author("6603722641")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "build/bdist.linux-x86_64/egg/pyscopus/pyscopus.py", line 101, in retrieve_author
UnicodeEncodeError: 'ascii' codec can't encode character u'\xed' in position 2793: ordinal not in range(128)
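
For context, this is the error Python raises whenever text containing a non-ASCII character is encoded with the ASCII codec. A minimal reproduction follows, written for Python 3 rather than the Python 2.7 used in the report. It is only a sketch of the error class and is not taken from the pyscopus code path; the name string is simply the one from this report.

```python
# Minimal reproduction of the error class; NOT the pyscopus code path.
name = "Agust\u00edn Carstens"  # U+00ED (i with acute accent) is outside ASCII

failed_char = None
failed_pos = None
try:
    name.encode("ascii")
except UnicodeEncodeError as exc:
    # The exception records where encoding failed.
    failed_pos = exc.start
    failed_char = name[exc.start:exc.end]
    print("cannot encode %r at position %d" % (failed_char, failed_pos))

# Encoding to UTF-8 instead keeps the character intact.
utf8_bytes = name.encode("utf-8")
```

Encoding output explicitly as UTF-8 (or keeping text as Unicode until the last moment) is the usual way to avoid this class of error.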

I'm on Ubuntu 14.04 with Python 2.7. However, I don't know which version of pyscopus I am on, as __version__ is not defined.
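
When a package does not define __version__, the installed version can often still be read from the packaging metadata. The sketch below assumes Python 3.8 or later (for importlib.metadata) and assumes the distribution is registered under the name "pyscopus"; the helper name installed_version is made up for illustration.

```python
from importlib import metadata  # standard library since Python 3.8


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed, or was installed without metadata.
        return None


# "pyscopus" as the distribution name is an assumption.
print(installed_version("pyscopus"))
```

On Python 2.7, as in this report, running pip show pyscopus from the shell may give the same information, provided the package was installed in a way that recorded its metadata.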

zhiyzuo commented 7 years ago

The updated version fixes this issue.

Michael-E-Rose commented 7 years ago

This fix is only partial. There is no longer an error, but the name is wrong because the non-ASCII character is simply removed.

>>> info = scopus.retrieve_author("6603722641")
Name: Agustn G. Carstens
Affiliation: International Monetary Fund (n/a,Washington,DC,n/a,United States)
>>> info
{'current-affiliation': [{'address': 'n/a,Washington,DC,n/a,United States', 'id': '60021406', 'name': 'International Monetary Fund'}], 'citation-count': 30, 'last-name': 'Carstens', 'subject-areas': ['Development', 'Accounting', 'Finance', 'Economics and Econometrics', 'Geography, Planning and Development'], 'affiliation-history': [{'address': 'n/a,Washington,DC,n/a,United States', 'id': '60021406', 'name': 'International Monetary Fund'}, {'address': 'Central Bank of,n/a,n/a,n/a,Mexico', 'id': '101232294', 'name': 'IMF'}, {'address': 'Avenida 5 de Mayo 2, Colonia Centro,Delegacion Cuauhtemoc,DF,6059,Mexico', 'id': '60092432', 'name': 'Banco de Mexico'}], 'cited-by-count': 29, 'first-name': 'Agustn G.', 'journal-history': ['Journal of Asian Economics', 'American Economic Review', 'IMF Staff Papers', 'Pakistan Development Review', 'American Economic Review', 'Columbia Journal of World Business', 'Trimestre Economico', 'Finance and Development'], 'document-count': 7}

However, the person is named Carstens, Agustín G., as shown at https://www.scopus.com/authid/detail.uri?authorId=6603722641
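
The output above is consistent with non-ASCII characters being dropped while encoding to ASCII. Whether pyscopus did exactly this internally is an assumption; the sketch below (Python 3) only shows that the standard "ignore" error handler produces the same result for this name.

```python
# Illustration only: drop every character the ASCII codec cannot represent.
name = "Agust\u00edn G."

# errors="ignore" silently discards characters that cannot be encoded.
stripped = name.encode("ascii", "ignore").decode("ascii")
print(stripped)  # Agustn G.
```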

zhiyzuo commented 7 years ago

I understand what you want, but for my own tasks I simply remove non-ASCII characters to make preprocessing for text mining easier. That's why I just drop characters outside ASCII.
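
One way to serve both needs is to keep the original Unicode value and derive a separate ASCII-folded copy for text mining. The following is a minimal sketch in Python 3, not the approach pyscopus adopted; the helper ascii_fold and the key "first-name-ascii" are hypothetical names introduced for illustration.

```python
import unicodedata


def ascii_fold(text):
    """Return an ASCII approximation of text (hypothetical helper)."""
    # NFKD splits accented letters into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # The combining marks are not ASCII, so "ignore" drops them and keeps the base letter.
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Keep the original value and store the folded copy alongside it.
record = {"first-name": "Agust\u00edn G.", "last-name": "Carstens"}
record["first-name-ascii"] = ascii_fold(record["first-name"])

print(record["first-name-ascii"])  # Agustin G.
```

Characters with no decomposition into an ASCII base letter are still dropped by this approach, so the original value should remain the authoritative one.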

If you really need this, I can work on this later today or tomorrow.

Michael-E-Rose commented 7 years ago

Well it's not about me, I am just reporting problems/issues related to an open-source software project ;)

On 4 October 2016 at 17:07, Zhiya Zuo notifications@github.com wrote:

Reopened #3 https://github.com/zhiyzuo/python-scopus/issues/3.


Michael E. Rose / PhD student

University of Cape Town | African Institute of Financial Markets and Risk Management & School of Economics michael-e-ro.se/

zhiyzuo commented 7 years ago

I got it fixed now, and the version has been upgraded to 0.7.2.post4.

Thank you for bringing this up and helping me make this project better. Let me know if the modified version works for you now.

Michael-E-Rose commented 7 years ago

Looks perfect, thank you!