msprev / fzf-bibtex

a BibTeX source for fzf
BSD 3-Clause "New" or "Revised" License

Parsing Error when @ Symbol Found Inside An Item's Fields #3

Closed mdko closed 5 years ago

mdko commented 5 years ago

When I run bibtex-ls ref.bib, I get the following:

panic: runtime error: index out of range

goroutine 1 [running]:
github.com/msprev/fzf-bibtex/bibtex.parseEntry(0xc000438240, 0x1a1, 0xc000438240)
    /Users/michael/Go/src/github.com/msprev/fzf-bibtex/bibtex/bibtex.go:30 +0x449
github.com/msprev/fzf-bibtex/bibtex.Parse(0xc000078d68, 0x7ffeefbff832, 0x39, 0x11156c0, 0x11156f0)
    /Users/michael/Go/src/github.com/msprev/fzf-bibtex/bibtex/bibtex.go:17 +0x12f
github.com/msprev/fzf-bibtex/cache.RefreshAndDo(0xc000010047, 0x31, 0x7ffeefbff832, 0x39, 0x110d693, 0x3, 0x11156c0, 0x11156f0)
    /Users/michael/Go/src/github.com/msprev/fzf-bibtex/cache/cache.go:75 +0x188
github.com/msprev/fzf-bibtex/cache.ReadAndDo(0xc000010047, 0x31, 0x7ffeefbff832, 0x39, 0x110d693, 0x3, 0x11156c0, 0x11156f0)
    /Users/michael/Go/src/github.com/msprev/fzf-bibtex/cache/cache.go:87 +0x497
main.ls(0xc000010047, 0x31, 0x7ffeefbff832, 0x39)
    /Users/michael/Go/src/github.com/msprev/fzf-bibtex/cmd/bibtex-ls/main.go:31 +0x76
main.main()
    /Users/michael/Go/src/github.com/msprev/fzf-bibtex/cmd/bibtex-ls/main.go:23 +0xa5

Looking at bibtex.go it fails on line 30:

m["key"] = sl[1][:len(sl[1])-1] // remove last character ','

because sl is a slice of length 1. This happens because line 15 splits my entire ref.bib file on the @ symbol, which prefixes each top-level reference item:

sl := strings.Split(bibtexStr, "@")[1:]

Error Cause

The error occurs when the file contains the @ symbol anywhere other than in the top-level "@referencetype{key," position. Every such @ produces an extra fragment that is not a real entry, so the file is parsed incorrectly. For example, I have a reference that contains the @ symbol in its title.

I know the chances of having the @ symbol somewhere other than at the start of each item are probably low, but I wanted to report this error in case someone else encounters a similar problem.

msprev commented 5 years ago

Thanks for the great diagnosis! I've pushed a fix now that splits records only when @ comes after a newline character. Let me know if there are any further issues.

As general background, as you can tell from the code, the bibtex file is parsed based on (a) bibtool's reformatting of a user's .bib file into bibtex data with a canonical layout; (b) heuristics to extract the relevant data from that canonical layout. This is an intentional design decision. fzf-bibtex prioritises speed and responsiveness even if it means choking on rare .bib files. Full-blown bibtex parsing is slow and hard to make 100% reliable. Using heuristics allows fzf-bibtex to chew through enormous .bib files very quickly, so users are not kept waiting. (By way of contrast, unite-bibtex did full bibtex parsing but was slow with large files.)

I'm happy to tweak the heuristics to deal with common errors in parsing the data (as in this case). Thank you for this.

mdko commented 5 years ago

Thanks so much for this tool! And the quick response! It definitely solves it.

Yeah, it makes sense to avoid full-blown parsing of rare .bib files. My .bib file is 25000 lines long and only had the @ symbol outside of the top-level in one single place, so it's definitely rare.

msprev commented 5 years ago

Thanks! My .bib file is 100k lines long and growing... There are some optimisations regarding concurrency I could experiment with to boost speed further, but it's pretty much instantaneous now even on a slow machine, so I haven't felt the need yet.

dloewenstein commented 5 years ago

Hi, I get this same error. Something in the part of a bibtex entry below is causing it, but I can't figure out what. After I delete enough authors it works again.

author = {Brignole, Michele and Auricchio, Angelo and Baron-Esquivias, Gonzalo and Bordachar, Pierre and Boriani, Giuseppe and Breithardt, Ole-A and Cleland, John and Deharo, Jean-Claude and Delgado, Victoria and Elliott, Perry M. and Gorenek, Bulent and Israel, Carsten W. and Leclercq, Christophe and Linde, Cecilia and Mont, Llu\'is and Padeletti, Luigi and Sutton, Richard and Vardas, Panos E. and Zamorano, Jose Luis and Achenbach, Stephan and Baumgartner, Helmut and Bax, Jeroen J. and Bueno, H\'ector and Dean, Veronica and Deaton, Christi and Erol, Cetin and Fagard, Robert and Ferrari, Roberto and Hasdai, David and Hoes, Arno W. and Kirchhof, Paulus and Knuuti, Juhani and Kolh, Philippe and Lancellotti, Patrizio and Linhart, Ales and Nihoyannopoulos, Petros and Piepoli, Massimo F. and Ponikowski, Piotr and Sirnes, Per Anton and Tamargo, Juan Luis and Tendera, Michal and Torbicki, Adam and Wijns, William and Windecker, Stephan and Kirchhof, Paulus and Blomstrom-Lundqvist, Carina and Badano, Luigi P. and Aliyev, Farid and B\"ansch, Dietmar and Baumgartner, Helmut and Bsata, Walid and Buser, Peter and Charron, Philippe and Daubert, Jean-Claude and Dobreanu, Dan and Faerestrand, Svein and Hasdai, David and Hoes, Arno W. and Le Heuzey, Jean-Yves and Mavrakis, Hercules and McDonagh, Theresa and Merino, Jose Luis and Nawar, Mostapha M. and Nielsen, Jens Cosedis and Pieske, Burkert and Poposka, Lidija and Ruschitzka, Frank and Tendera, Michal and Van Gelder, Isabelle C. and Wilson, Carol M.},

msprev commented 5 years ago

Can you post the full BibTeX entry and the error message as a new issue so I can test?