pybliometrics-dev / pybliometrics

Python-based API-Wrapper to access Scopus
https://pybliometrics.readthedocs.io/en/stable/

UnicodeEncodeError on examples #1

Closed Michael-E-Rose closed 8 years ago

Michael-E-Rose commented 8 years ago

I was trying to replicate the examples given in README.org using Python 2.7.

from scopus.scopus_api import ScopusAbstract

ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)

results in


Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_api.py", line 133, in __init__
    results = ET.fromstring(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

What's happening here? Is this a bug?

jkitchin commented 8 years ago

It is a bug for python 2.7 I think. It works in Python 3 ok.

#+BEGIN_SRC python :results output org drawer
from scopus.scopus_api import ScopusAbstract

ab = ScopusAbstract("2-s2.0-84930616647", refresh=True)
print(ab)
#+END_SRC

#+RESULTS:
:RESULTS:
[[http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=84930616647&origin=inward][2-s2.0-84930616647]] John R. Kitchin, Examples of effective data sharing in scientific publishing, ACS Catalysis, 5(6), p. 3894-3899, (2015). http://dx.doi.org/10.1021/acscatal.5b00538, http://www.scopus.com/inward/record.url?partnerID=HzOxMe3b&scp=84930616647&origin=inward, cited 0 times (Scopus).
Affiliations: id:60027950 Carnegie Mellon University
:END:

What is the output of this command:

cat ~/.scopus/xml/2-s2.0-84930616647

I updated this to be Py3 compatible somewhat recently, and I suspect there are unicode issues now with Py2.7

When I run this on Py2.7 I get a different error, a codec encoding error.

jkitchin commented 8 years ago

Try a new pull, I think I added an encode on the file write that might fix it.
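The kind of change described here, encoding explicitly on the file write, might look like the following minimal sketch. It is not the actual commit: the helper names `write_xml` and `read_xml` and the sample payload are hypothetical. It assumes the response text is a unicode string and uses `io.open`, which accepts an `encoding` argument on both Python 2.7 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile


def write_xml(path, text):
    """Write unicode text to `path` as UTF-8 (hypothetical helper)."""
    # io.open encodes on write, so no implicit ASCII conversion happens.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(text)


def read_xml(path):
    """Read the file back as unicode text (hypothetical helper)."""
    with io.open(path, "r", encoding="utf-8") as f:
        return f.read()


# Made-up payload containing a non-ASCII character (the copyright sign).
sample = u"<abstract>\u00a9 2015 American Chemical Society</abstract>"
path = os.path.join(tempfile.mkdtemp(), "sample.xml")
write_xml(path, sample)
round_trip = read_xml(path)
```

Calling `text.encode("utf-8")` and writing to a file opened in binary mode would be an equivalent approach.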

Michael-E-Rose commented 8 years ago

Hmm, no change, but I remember getting a decoding error too. So, strangely, the resulting error is not consistent.

cat ~/.scopus/xml/2-s2.0-84930616647 prints nothing, however. Does this mean the file is not present?

jkitchin commented 8 years ago

It probably means the file is empty, because of a file write error, which is why you get the error about no element found. I made another push that works on my py2.7 linux box. It only fixes the abstract example though.
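The link between an empty cache file and the parse error can be reproduced without Scopus at all. The sketch below feeds an empty string to `ElementTree` and captures the error message; `parse_cached` is a hypothetical guard, not part of the library, that treats an empty file as if it were missing.

```python
import xml.etree.ElementTree as ET


def parse_cached(text):
    """Return the parsed root, or None when the cached text is empty.

    Hypothetical helper: a caller could re-download on None instead of
    failing inside the XML parser.
    """
    if not text.strip():
        return None
    return ET.fromstring(text)


# Parsing empty input raises ParseError, as in the traceback above.
try:
    ET.fromstring("")
    message = None
except ET.ParseError as err:
    message = str(err)
```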

jkitchin commented 8 years ago

How are you running this where you don't see the errors?

Michael-E-Rose commented 8 years ago

Still unchanged, but I can reproduce the errors better:

If the xml file doesn't exist:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_api.py", line 143, in __init__
    f.write(self.xml)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xa9' in position 2001: ordinal not in range(128)

Despite the error an empty file is being created in ~/.scopus/xml/. Trying to do the abstract command again (even with refresh=True):

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_api.py", line 133, in __init__
    results = ET.fromstring(text)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1301, in XML
    return parser.close()
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1654, in close
    self._raiseerror(v)
  File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
    raise err
xml.etree.ElementTree.ParseError: no element found: line 1, column 0

jkitchin commented 8 years ago

This makes me think you have not pulled the latest code.

Check line 130 of scopus_api.py. It should match:

https://github.com/jkitchin/scopus/blob/master/scopus/scopus_api.py#L130
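To confirm which installed file Python actually imports before comparing it with the repository, something like the following sketch can help. The helper `module_file` is hypothetical, and `json` is only a stand-in module so the snippet runs anywhere; in this thread the name of interest would be `scopus.scopus_api`.

```python
import importlib


def module_file(name):
    """Return the path of the file a module is loaded from (hypothetical helper)."""
    module = importlib.import_module(name)
    # Built-in modules have no __file__, hence the default of None.
    return getattr(module, "__file__", None)


# Stand-in module; replace with "scopus.scopus_api" to check the installed copy.
path = module_file("json")
```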

Michael-E-Rose commented 8 years ago

Indeed, I was on an old version (I forgot the --upgrade flag). I tested every example with the new version.

While most examples work, a few problems remain:

scopus_search

from scopus.scopus_search import ScopusSearch

s = ScopusSearch('FIRSTAUTH ( kitchin  j.r. )', refresh=True)
print(s.org_summary)

yields

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_search.py", line 114, in org_summary
    s += '{0}. {1}\n'.format(i + 1, abstract)
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_api.py", line 260, in __str__
    for a in self.authors[0:-1]])
UnicodeEncodeError: 'ascii' codec can't encode character u'\xf8' in position 1: ordinal not in range(128)
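On Python 2.7, formatting non-ASCII unicode values into a byte-string template triggers an implicit ASCII encode, which is the failure pattern in this traceback. A minimal sketch of the usual remedy, a unicode template, follows. The names are made up for illustration, and this is not necessarily how the library was fixed.

```python
# -*- coding: utf-8 -*-

# Hypothetical author names containing non-ASCII characters.
given_name, surname = u"\u00c1ngel", u"N\u00f8rskov"

# The u"" prefix keeps the template and the result as text on Python 2.7;
# on Python 3 every str is already unicode, so the prefix is harmless.
coauthor_name = u"{0} {1}".format(given_name, surname)

# Encode explicitly only at the boundary (file, terminal, network).
encoded = coauthor_name.encode("utf-8")
```

Adding `from __future__ import unicode_literals` at the top of a module has a similar effect for all of its string literals.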

scopus_author

from scopus.scopus_author import ScopusAuthor

au = ScopusAuthor(7004212771)
print([a.name for a in au.get_coauthors()])

yields

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_author.py", line 274, in get_coauthors
    coauthor_name = '{0} {1}'.format(given_name, surname)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xc1' in position 0: ordinal not in range(128)

from scopus.scopus_author import ScopusAuthor

au = ScopusAuthor(7004212771)
print(au)

yields

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_author.py", line 418, in __str__
    for aff in self.affiliation_history:
AttributeError: 'ScopusAuthor' object has no attribute 'affiliation_history'
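The thread does not say why `affiliation_history` was missing. One common cause of this kind of error is an attribute that is only assigned on some code paths. The sketch below uses a hypothetical stand-in class (not the library's `ScopusAuthor`) to show the defensive pattern of always assigning the attribute in `__init__`, so `__str__` can iterate over it safely.

```python
class Author(object):
    """Hypothetical stand-in class for illustration only."""

    def __init__(self, name, affiliation_history=None):
        self.name = name
        # Always assign the attribute, even when no data is available.
        self.affiliation_history = list(affiliation_history or [])

    def __str__(self):
        lines = [self.name]
        for aff in self.affiliation_history:
            lines.append("  " + aff)
        return "\n".join(lines)


# With no affiliation data, str() still works and shows only the name.
summary = str(Author("J. Kitchin"))
```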

scopus_report

from scopus.scopus_search import ScopusSearch
from scopus.scopus_reports import report

s = ScopusSearch('FIRSTAUTH ( kitchin  j.r. )')
report(s, 'Kitchin - first author')

yields (omitting the working output)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_reports.py", line 91, in report
    for cat in ScopusAuthor(scopus_id).categories[0:3]])
  File "/usr/local/lib/python2.7/dist-packages/scopus/scopus_author.py", line 136, in __init__
    f.write(resp.text)
UnicodeEncodeError: 'ascii' codec can't encode character u'\xae' in position 18711: ordinal not in range(128)

jkitchin commented 8 years ago

Thanks for the notes. I have pushed a few updates this morning that seem to have addressed these.

Michael-E-Rose commented 8 years ago

Every example now works, great! Thanks @jkitchin!