inspirehep / refextract

Extract bibliographic references from (High-Energy Physics) articles.
GNU General Public License v2.0

non-ascii journal titles #25

Closed by kaplun 7 years ago

kaplun commented 7 years ago

https://sentry.cern.ch/inspire-sentry/inspire-labs/group/820653/

michamos commented 7 years ago

This is not a refextract bug. The default format string was changed to unicode in #16, but inspire-next uses a different one, which should be fixed as part of inspirehep/inspire-next#2133.

michamos commented 7 years ago

@chris-asl the problem is here: https://github.com/inspirehep/inspire-next/blob/master/inspirehep/modules/refextract/tasks.py#L90

kaplun commented 7 years ago

Are you sure? The exception is triggered deep down in the refextract code base.

michamos commented 7 years ago

Yes, the format string gets passed to refextract and consumed deep inside (one of refextract's great design decisions).
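
For readers following along, here is a minimal sketch of how a caller-supplied format string can raise far from the call site. It is not refextract's actual code: `extract_reference` and `_render_journal` are hypothetical names, and the byte-string branch only emulates, as an assumption, the implicit ASCII encoding that a non-unicode format string would trigger.

```python
def _render_journal(title, volume, fmt):
    """Hypothetical helper deep inside the library that consumes the format string."""
    if isinstance(fmt, bytes):
        # Assumption: emulate a byte-string format, where the text fields
        # are implicitly encoded as ASCII while the string is filled in.
        return fmt.decode("ascii").format(title=title, volume=volume).encode("ascii")
    # A unicode (text) format string handles non-ASCII titles without trouble.
    return fmt.format(title=title, volume=volume)


def extract_reference(title, volume, fmt="{title},{volume}"):
    """Hypothetical public entry point: the caller may override the default format."""
    return _render_journal(title, volume, fmt)


if __name__ == "__main__":
    title = "Ann. Inst. Henri Poincaré"
    # Default (text) format string: works.
    print(extract_reference(title, "A1"))
    # Caller-supplied byte-string format: fails inside _render_journal,
    # although the real problem is the argument passed by the caller.
    try:
        extract_reference(title, "A1", fmt=b"{title},{volume}")
    except UnicodeEncodeError as exc:
        print("raised deep inside the library:", type(exc).__name__)
```

The traceback in such a case points at the inner helper, which would be consistent with the exception appearing to come from the library even though the fix belongs in the caller.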

kaplun commented 7 years ago

:scream: