jonathansick / ads_bibdesk

(Unmaintained) Mac OS X service for frictionless import of NASA ADS and arXiv publications into BibDesk.
GNU General Public License v3.0

UnicodeDecodeError with recent MNRAS articles #29

Closed will-henney closed 11 years ago

will-henney commented 11 years ago

Hi

I get errors like the following with some 2013 MNRAS papers:

$ adsbibdesk -d 2013MNRAS.428..307F
Starting ADS to BibDesk
ADS to BibDesk version 3.1.1
Python: 2.7.3 | 64-bit | (default, Mar 25 2013, 15:52:02) 
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
process_token found article token 2013MNRAS.428..307F
ADSConnector found bibcode/DOI 2013MNRAS.428..307F
ADSHTMLParser found links: {u'ar': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=AR&db_key=AST',
 u'article': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=ARTICLE&db_key=AST&high=',
 u'bibtex': u'http://adsabs.harvard.edu/cgi-bin/nph-bib_query?bibcode=2013MNRAS.428..307F&data_type=BIBTEX&db_key=AST&nocookieset=1',
 u'citations': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=CITATIONS&db_key=AST',
 u'custom': u'http://adsabs.harvard.edu/cgi-bin/nph-abs_connect?bibcode=2013MNRAS.428..307F&data_type=Custom&format=%5c%5cbibitem%5b%25%5c2m%25%28y%29%5d%25%7bR%7d%20%25%5c5.3l%20%25%5cY,%25%5cj,%25%5cV,%25%5cp%5cn&return_fmt=LONG&db_key=AST&nocookieset=1',
 u'ejournal': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=EJOURNAL&db_key=AST&high=',
 u'preprint': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=PREPRINT&db_key=AST',
 u'refcit': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=REFCIT&db_key=AST&high=',
 u'references': u'http://adsabs.harvard.edu/cgi-bin/nph-data_query?bibcode=2013MNRAS.428..307F&link_type=REFERENCES&db_key=AST&high='}
Traceback (most recent call last):
  File "/Users/will/Library/Enthought/Canopy_64bit/User/bin/adsbibdesk", line 9, in <module>
    load_entry_point('adsbibdesk==3.1.1', 'console_scripts', 'adsbibdesk')()
  File "/Users/will/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/adsbibdesk.py", line 175, in main
    process_articles(args, prefs)
  File "/Users/will/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/adsbibdesk.py", line 192, in process_articles
    process_token(articleToken, prefs, bibdesk)
  File "/Users/will/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/adsbibdesk.py", line 245, in process_token
    pdf = ads.getPDF()
  File "/Users/will/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/adsbibdesk.py", line 1066, in getPDF
    parser.parse(url)
  File "/Users/will/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/adsbibdesk.py", line 1275, in parse
    self.feed(urllib2.urlopen(url).read())
  File "/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/HTMLParser.py", line 114, in feed
    self.goahead(0)
  File "/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/HTMLParser.py", line 158, in goahead
    k = self.parse_starttag(i)
  File "/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/HTMLParser.py", line 305, in parse_starttag
    attrvalue = self.unescape(attrvalue)
  File "/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/HTMLParser.py", line 472, in unescape
    return re.sub(r"&(#?[xX]?(?:[0-9a-fA-F]+|\w{1,8}));", replaceEntities, s)
  File "/Applications/Canopy.app/appdata/canopy-1.0.0.1160.macosx-x86_64/Canopy.app/Contents/lib/python2.7/re.py", line 151, in sub
    return _compile(pattern, flags).sub(repl, string, count)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 28: ordinal not in range(128)

Any idea what the problem is?

Cheers

Will

jonathansick commented 11 years ago

You're right. MNRAS support is broken. I'll work on this this afternoon. Thanks for the report.
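
A note on the likely cause, judging only from the traceback above: `parse` passes the raw bytes from `urllib2.urlopen(url).read()` straight to `HTMLParser.feed()`. In Python 2, when an attribute value contains both a non-ASCII byte (0xc3 is the lead byte of a two-byte UTF-8 sequence, as in accented author names) and an entity reference, `unescape()` ends up combining byte strings with unicode strings, which triggers an implicit ASCII decode and the error shown. The usual remedy is to decode the response to text before parsing. The following is a minimal sketch of that idea, written for Python 3's `html.parser`; `LinkCollector` and `decode_html` are hypothetical names for illustration, not part of adsbibdesk, and this is not necessarily the fix the maintainer applied.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Hypothetical stand-in for the script's parser: collects href values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "href" and value:
                self.links.append(value)


def decode_html(raw, charset=None):
    """Turn raw response bytes into text before handing them to the parser.

    Tries the declared charset (if any), then UTF-8, then Latin-1.
    """
    for encoding in (charset, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (LookupError, UnicodeDecodeError):
            continue
    # Latin-1 maps every byte to a code point, so this never raises.
    return raw.decode("latin-1")


# Made-up page fragment: an entity in the href plus a non-ASCII title,
# the combination that appears to trip the Python 2 parser on raw bytes.
raw = '<a href="/pdf?a=1&amp;b=2" title="Fern\u00e1ndez">PDF</a>'.encode("utf-8")

parser = LinkCollector()
parser.feed(decode_html(raw))
print(parser.links)
```

In a real fetch, the `charset` argument would come from the response's `Content-Type` header when the server declares one; the fallback chain is an assumption that the publisher pages are UTF-8 or Latin-1.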