rufuspollock-okfn / bibserver

BibServer is open-source software that makes it easy to publish, manage and find bibliographies. BibServer is RESTful and web-friendly.
MIT License

index error in BibTexParser.getnames #172

Closed: pitman closed this issue 12 years ago

pitman commented 12 years ago

Line 295 of BibTexParser raises an IndexError for an empty name string. Replacing

    last = namesplit.pop()

with

    try:
        last = namesplit.pop()
    except IndexError:
        last = ''

seems to eliminate the bug.
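
An equivalent guard without exception handling might look like the sketch below. It assumes the name is split on whitespace and the last token is taken as the surname. `split_name` is a hypothetical helper for illustration, not the actual `getnames` code in BibTexParser.

```python
def split_name(name):
    """Split a personal name into (first, last). Hypothetical helper."""
    namesplit = name.split()
    # Only pop when there is at least one token; an empty or
    # whitespace-only name falls back to an empty surname.
    last = namesplit.pop() if namesplit else ''
    first = ' '.join(namesplit)
    return first, last
```

With this guard, an empty name yields a pair of empty strings instead of raising.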

epoz commented 12 years ago

Can you please post a minimal BibTeX example that triggers the error? I tried the following, and it did not throw an exception:

@misc{arXiv:0807.3308,
    title = {Visibility to infinity in the hyperbolic plane, despite obstacles},
    author = {},
}

(which I presume is what you meant by an empty name string?)

pitman commented 12 years ago

Etienne Posthumus wrote:

Can you please post a minimal BibTeX example that triggers the error?

@article{srising66,
    author = "H. M. Srivastava and ",
    title = "{zzz}",
    journal = "zzz",
    volume = zzz,
    pages = "zzz",
    year = zzz
}
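
A small sketch of why this record might produce an empty name. It assumes the parser separates authors on the literal " and " and then splits each name on whitespace; both steps are assumptions about the parser, and `split_authors` and `pop_fails` are hypothetical names.

```python
def split_authors(author_field):
    # Assumption: authors are separated by the literal " and ".
    return [name.strip() for name in author_field.split(" and ")]


def pop_fails(name):
    """Return True if taking the last token of `name` raises IndexError."""
    try:
        name.split().pop()
    except IndexError:
        return True
    return False


names = split_authors("H. M. Srivastava and ")
# The trailing "and " leaves an empty second name in the list.
failing = [name for name in names if pop_fails(name)]
```

Under these assumptions the trailing "and " yields a second, empty author, and popping from its empty token list is what raises.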

Related to finding such bugs, but also for other purposes, could you please modify the BibTeX parser so that it returns a key-value pair with key "bibtex" and a value as close as possible to the input BibTeX, modulo character conversions? In particular, accents should be left BibTeX-encoded. This is useful for many purposes and was a feature of the original BibTeX parser I wrote, but it has since been lost.

I suggest this as a general rule for parsers: always preserve the original record as faithfully as possible as a value in the JSON output, or at least have a simple option to do that. This should greatly reduce time wasted finding records which throw errors in parsing, and help users to easily correct such records.
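
As a rough illustration of this rule, a wrapper could attach the untouched source text to whatever the parser returns. `parse_record` and the `parse_fields` callable are hypothetical stand-ins, not part of the existing BibTexParser.

```python
def parse_record(raw, parse_fields):
    """Parse one raw BibTeX record and keep the source text with the result.

    `parse_fields` is a hypothetical callable that takes the raw record
    string and returns a dict of parsed fields.
    """
    record = dict(parse_fields(raw))
    # Store the input verbatim, so accents stay BibTeX-encoded.
    record["bibtex"] = raw
    return record
```

Because the raw text is stored after parsing, a parsed field that happens to be named "bibtex" would be overwritten; that choice is part of the sketch, not a requirement.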

A suggestion about error handling in parsers: comments about errors found during parsing should also be put into the JSON in a systematic way, in a field called e.g. "parser_error" or similar. This seems like better practice than sending errors to another outlet, which may not be readily available.
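
One possible shape for this, as a sketch only: catch the exception per record and store a short message under "parser_error". `safe_parse` and `parse_fields` are hypothetical names, and the message format is an assumption.

```python
def safe_parse(raw, parse_fields):
    """Parse one record; on failure, report the error inside the output.

    `parse_fields` is a hypothetical callable returning a dict of fields.
    """
    # Keep the raw text so the failing record can be located and corrected.
    record = {"bibtex": raw}
    try:
        record.update(parse_fields(raw))
    except Exception as exc:
        # Record the error type and message instead of logging it elsewhere.
        record["parser_error"] = "%s: %s" % (type(exc).__name__, exc)
    return record
```

A record that parses cleanly simply has no "parser_error" key, so downstream code can filter on its presence.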

I think we need some sort of collaborative document space for working on these parsers. I have a lot of practical experience from running parsers on dirty data, which I would like to express and see incorporated into our parser collection; I am just not sure how best to do that.

many thanks

--Jim