gwu-libraries / launchpad

A Django-based system that provides a stable URL for every item in the library's catalog. Various discovery services link to these URLs, and the page for each item in turn links out to other resources that provide ways to access the item's content.
MIT License

Unicode Error #1210

Open kerchner opened 6 years ago

kerchner commented 6 years ago

It's not clear whether this is an error in the data for a particular item or whether a certain query triggered it, but it warrants looking into. Initial checking showed that the query referenced in the error appears to involve "colección" (shown as colecci\xf3n in the error below, with "ó" stored as the single byte 0xf3).
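For context, 0xf3 is how "ó" is encoded in single-byte encodings such as Latin-1 (ISO-8859-1) and Windows-1252, while UTF-8 encodes the same character as two bytes. A lone 0xf3 in the data therefore suggests the text was stored in a single-byte encoding; which one the source records actually used is an assumption here. A quick Python 3 check:

```python
# "ó" (U+00F3) as stored by a single-byte encoding vs. by UTF-8.
latin1 = "ó".encode("latin-1")  # one byte:  b'\xf3'
utf8 = "ó".encode("utf-8")      # two bytes: b'\xc3\xb3'

print(latin1, utf8)
```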

Internal Server Error: /item/12344364.json
Traceback (most recent call last):
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/core/handlers/base.py", line 132, in get_response
    response = wrapped_callback(request, *callback_args, **callback_kwargs)
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/utils/decorators.py", line 110, in _wrapped_view
    response = view_func(request, *args, **kwargs)
  File "/launchpad/current/launchpad/lp/ui/views.py", line 135, in item_json
    bib_data = voyager.get_bib_data(bibid)
  File "/launchpad/current/launchpad/lp/ui/voyager.py", line 199, in get_bib_data
    bib.get('TITLE', ''))
  File "/launchpad/current/launchpad/lp/ui/voyager.py", line 344, in get_related_bibids
    results = _make_dict(cursor)
  File "/launchpad/current/launchpad/lp/ui/voyager.py", line 33, in _make_dict
    for row in cursor.fetchall()
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/db/utils.py", line 105, in inner
    return func(*args, **kwargs)
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/db/backends/oracle/base.py", line 517, in fetchall
    return tuple(_rowfactory(r, self.cursor) for r in self.cursor.fetchall())
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/db/backends/oracle/base.py", line 517, in <genexpr>
    return tuple(_rowfactory(r, self.cursor) for r in self.cursor.fetchall())
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/db/backends/oracle/base.py", line 599, in _rowfactory
    value = to_unicode(value)
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/db/backends/oracle/base.py", line 610, in to_unicode
    return force_text(s)
  File "/launchpad/current/launchpad/ENV/lib/python2.7/site-packages/django/utils/encoding.py", line 102, in force_text
    raise DjangoUnicodeDecodeError(s, *e.args)
DjangoUnicodeDecodeError: 'utf8' codec can't decode byte 0xf3 in position 19: unexpected end of data. You passed in '9562440710 (colecci\xf3n)' (<type 'str'>)

Request repr():
<WSGIRequest
path:/item/12344364.json,
GET:<QueryDict: {}>,
POST:<QueryDict: {}>,
COOKIES:{},
META:{'CONTEXT_DOCUMENT_ROOT': '/var/www',
 'CONTEXT_PREFIX': '',
 u'CSRF_COOKIE': u'f0G9gMwAp0tVlBuYpzlx7y1gy9JOY7Cs',
 'DOCUMENT_ROOT': '/var/www',
 'GATEWAY_INTERFACE': 'CGI/1.1',
 'HTTP_ACCEPT': '*/*',
 'HTTP_ACCEPT_ENCODING': 'gzip,deflate',
 'HTTP_CONNECTION': 'Keep-Alive',
 'HTTP_FROM': 'support@search.yandex.ru',
 'HTTP_HOST': 'findit.library.gwu.edu',
 'HTTP_USER_AGENT': 'Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)',
 'PATH_INFO': u'/item/12344364.json',
 'PATH_TRANSLATED': '/launchpad/current/launchpad/lp/lp/wsgi.py/item/12344364.json',
 'QUERY_STRING': '',
 'REMOTE_ADDR': '5.255.250.63',
 'REMOTE_PORT': '36230',
 'REQUEST_METHOD': 'GET',
 'REQUEST_SCHEME': 'https',
 'REQUEST_URI': '/item/12344364.json',
 'SCRIPT_FILENAME': '/launchpad/current/launchpad/lp/lp/wsgi.py',
 'SCRIPT_NAME': u'',
 'SERVER_ADDR': '192.245.136.25',
 'SERVER_ADMIN': 'gwlib-root@groups.gwu.eduu',
 'SERVER_NAME': 'findit.library.gwu.edu',
 'SERVER_PORT': '443',
 'SERVER_PROTOCOL': 'HTTP/1.1',
 'SERVER_SIGNATURE': '<address>Apache/2.4.7 (Ubuntu) Server at findit.library.gwu.edu Port 443</address>\n',
 'SERVER_SOFTWARE': 'Apache/2.4.7 (Ubuntu)',
 'SSL_TLS_SNI': 'findit.library.gwu.edu',
 'force-proxy-request-1.0': '1',
 'mod_wsgi.application_group': 'findit.library.gwu.edu|',
 'mod_wsgi.callable_object': 'application',
 'mod_wsgi.enable_sendfile': '0',
 'mod_wsgi.handler_script': '',
 'mod_wsgi.input_chunked': '0',
 'mod_wsgi.listener_host': '',
 'mod_wsgi.listener_port': '443',
 'mod_wsgi.process_group': 'findit.library.gwu.edu',
 'mod_wsgi.queue_start': '1511504665230282',
 'mod_wsgi.request_handler': 'wsgi-script',
 'mod_wsgi.script_reloading': '1',
 'mod_wsgi.version': (3, 4),
 'proxy-nokeepalive': '1',
 'wsgi.errors': <mod_wsgi.Log object at 0x7f0d131969b0>,
 'wsgi.file_wrapper': <built-in method file_wrapper of mod_wsgi.Adapter object at 0x7f0d13280378>,
 'wsgi.input': <mod_wsgi.Input object at 0x7f0d20047670>,
 'wsgi.multiprocess': True,
 'wsgi.multithread': True,
 'wsgi.run_once': False,
 'wsgi.url_scheme': 'https',
 'wsgi.version': (1, 0)}>
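The failure can be reproduced outside Django with the value from the error message. This is a minimal Python 3 sketch, not project code: a strict UTF-8 decode (what force_text does by default) stops at offset 19, the 0xf3 byte, while reading the same bytes as Latin-1 recovers the intended text. The exact reason string Python 3 reports may differ from the 2.7 message in the log.

```python
# Raw bytes of the value shown in the traceback (Python 3 bytes literal).
raw = b"9562440710 (colecci\xf3n)"

bad_offset = None
try:
    raw.decode("utf-8")  # strict decode, like force_text's default errors="strict"
except UnicodeDecodeError as e:
    # e.start is the offset of the first byte that is not valid UTF-8.
    bad_offset = e.start

# Interpreting the same bytes as Latin-1 yields the intended text.
recovered = raw.decode("latin-1")

print(bad_offset, hex(raw[bad_offset]), recovered)
```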

lwrubel commented 6 years ago

These errors are usually caused by bad Unicode (character encoding) in Georgetown records that were imported into WRLC Voyager. The problem is often visible in the WRLC Catalog as well.

The records can be corrected manually (Mike would occasionally do this), but it's a longstanding issue. It's possible that, with the exports they're having to do from Sierra for Alma, they've found better options for avoiding this, but I'm not sure.
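If the bad bytes can't all be fixed at the source, one defensive option is to decode column values tolerantly and log the offending record for manual correction. The sketch below is hypothetical, not launchpad code: decode_field and make_dict are illustrative names, Windows-1252 (cp1252) as the fallback encoding is an assumption about how the legacy records were encoded, and it is written for Python 3. It also assumes the helper receives undecoded bytes, for example from a raw database cursor; with Django's wrapped Oracle cursor, as the traceback shows, decoding already happens in the backend's _rowfactory before application code such as _make_dict sees the row.

```python
import logging

logger = logging.getLogger(__name__)


def decode_field(value, record_id=None):
    """Hypothetical helper: decode a raw column value, tolerating legacy text.

    Tries strict UTF-8 first; on failure, logs the record so it can be fixed
    by hand and falls back to cp1252 (an assumed legacy encoding).
    """
    if not isinstance(value, bytes):
        return value  # already text, a number, None, etc.
    try:
        return value.decode("utf-8")
    except UnicodeDecodeError:
        logger.warning("Non-UTF-8 value in record %s: %r", record_id, value)
        # cp1252 leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined;
        # errors="replace" maps those to U+FFFD instead of raising.
        return value.decode("cp1252", errors="replace")


def make_dict(cursor, record_id=None):
    """Hypothetical cursor-to-dict helper; assumes a DB-API cursor returning bytes."""
    columns = [col[0] for col in cursor.description]
    return [
        {name: decode_field(val, record_id) for name, val in zip(columns, row)}
        for row in cursor.fetchall()
    ]
```

Because the fallback never raises, a bad record degrades to possibly imperfect text plus a log line instead of a 500 error, and the log doubles as a list of records to correct in Voyager.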