joostkremers / parsebib

Elisp library for reading .bib files
BSD 3-Clause "New" or "Revised" License

Unmatched full-width braces lead to "Unbalanced parentheses" errors #32

Open gudzpoz opened 8 months ago

gudzpoz commented 8 months ago

(I am using Spacemacs and have tons of customization, but I assume this is irrelevant. If the following is not enough to reproduce the issue, I will try again in a fresh install / vanilla Emacs.)

Steps to reproduce

% Failing.bib
@article{Title,
  title = {{Title}},
  author = {Author},
  year = {1970},
  journal = {Journal},
  abstract = {（} % <-- Culprit
}

% Passing.bib
@article{Title,
  title = {{Title}},
  author = {Author},
  year = {1970},
  journal = {Journal},
  abstract = {（）} % <-- No error
}

Evaluation results:

(parsebib-parse "/tmp/Passing.bib")
#s(hash-table size 65 test equal rehash-size 1.5 rehash-threshold 0.8125 data ("Title" (("abstract" . "（）<--Noerror") ("journal" . "Journal") ("year" . "1970") ("author" . "Author") ("title" . "Title") ("=type=" . "article") ("=key=" . "Title"))))

(parsebib-parse "/tmp/Failing.bib")
Debugger entered--Lisp error: (scan-error "Unbalanced parentheses" 23 145)
  scan-sexps(23 1)
  forward-sexp(1)
  parsebib--match-brace-forward()
  parsebib--match-paren-forward()
  parsebib-read-entry("article" nil #<hash-table equal 0/65 0x15659cd2c74d> nil t)
  parsebib-parse-bib-buffer(:entries #<hash-table equal 0/65 0x15659cd2c72d> :strings #<hash-table equal 0/65 0x15659cd2c74d> :expand-strings t :inheritance t :fields nil :replace-TeX t)
  #f(compiled-function (file) #<bytecode -0x424eb6921cb642f>)("/tmp/Failing.bib")
  parsebib-parse("/tmp/Failing.bib")
  (progn (parsebib-parse "/tmp/Failing.bib"))
  elisp--eval-last-sexp(t)
  #<subr eval-last-sexp>(t)
  #f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>)()
  eval-sexp-fu-flash-doit-simple(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
  eval-sexp-fu-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>))
  esf-flash-doit(#f(compiled-function (&rest _it) #<bytecode 0x19dc700cc452>) #f(compiled-function (&rest args2) #<bytecode 0x68db0840b9799c7>) #f(compiled-function (&rest args2) #<bytecode 0x6821c372fab19c7>) #f(compiled-function (&rest args2) #<bytecode 0xa28255960f219d0>))
  ad-Advice-eval-last-sexp(#<subr eval-last-sexp> t)
  apply(ad-Advice-eval-last-sexp #<subr eval-last-sexp> t)
  eval-last-sexp(t)
  eval-print-last-sexp(nil)
  funcall-interactively(eval-print-last-sexp nil)
  command-execute(eval-print-last-sexp)

Expected behavior

The parser should treat full-width characters as normal text instead of syntactic elements.

P.S. Both Failing.bib and Passing.bib pass validation by biber (via biber --tool -V Failing.bib / biber --tool -V Passing.bib).

joostkremers commented 8 months ago

This is probably the result of parsebib using forward-sexp to find the end of a BibTeX entry, but I'll need to look into it before I can say for sure.

joostkremers commented 8 months ago

Oh, wait a sec. This is not a normal opening parenthesis, it's a CJK character! I hadn't noticed that right away.

You'll notice that if you have an unclosed ASCII parenthesis, it actually works.

This may actually be a bug in Emacs (in bibtex.el, to be more precise). parsebib uses the syntax table bibtex-braced-string-syntax-table during parsing, which turns the ASCII parentheses () into normal punctuation instead of characters that need to be paired. That allows it to ignore any unmatched parentheses in field values. However, the (CJK) fullwidth parentheses （） don't have their syntax class set to punctuation, so parsebib tries to match them, which it cannot.

So arguably, bibtex-braced-string-syntax-table should deal with non-ASCII parentheses as well, because bibtex-mode can't handle them either. If I open Failing.bib in Emacs and then do C-c C-c, I get the error user-error: Syntactically incorrect BibTeX entry starts here.
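The mechanism described above can be checked directly. What follows is only a minimal sketch: my/scan-braced-value is a hypothetical helper written for illustration, not part of parsebib or bibtex.el, and it assumes that calling forward-sexp under bibtex-braced-string-syntax-table is close enough to what parsebib does internally.

```elisp
(require 'bibtex)

;; Hypothetical helper, for illustration only (not part of parsebib).
(defun my/scan-braced-value (text)
  "Scan the braced group at the start of TEXT.
Return the position after the group, or the symbol `unbalanced'
if `forward-sexp' signals a `scan-error'."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; Use the same syntax table that bibtex.el defines for braced strings.
    (with-syntax-table bibtex-braced-string-syntax-table
      (condition-case nil
          (progn (forward-sexp 1) (point))
        (scan-error 'unbalanced)))))

;; ASCII parenthesis: punctuation in this table, so the scan should succeed.
(my/scan-braced-value "{(}")

;; Fullwidth parenthesis: on an Emacs without the fix this is expected
;; to return `unbalanced'; on a fixed Emacs it should return a position.
(my/scan-braced-value "{（}")
```

If the second call returns unbalanced on your Emacs, the syntax table there still treats the fullwidth parenthesis as a paired delimiter.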

joostkremers commented 8 months ago

I've created an Emacs bug here: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=68477

In the meantime, you should be able to add FULLWIDTH LEFT PARENTHESIS and FULLWIDTH RIGHT PARENTHESIS to bibtex-braced-string-syntax-table yourself:

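(with-eval-after-load 'bibtex
  ;; U+FF08 FULLWIDTH LEFT PARENTHESIS: treat as punctuation
  (modify-syntax-entry ?（ "." bibtex-braced-string-syntax-table)
  ;; U+FF09 FULLWIDTH RIGHT PARENTHESIS: treat as punctuation
  (modify-syntax-entry ?） "." bibtex-braced-string-syntax-table))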

I suspect there will be more non-ASCII parentheses that would need to be added to bibtex-braced-string-syntax-table, so if any of those are problematic for you as well, you can add them in the same way.
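As a sketch of how several such characters could be added at once: the list of code points below is an assumption chosen for illustration (only the fullwidth pair is confirmed as problematic in this thread), so adjust it to whatever actually occurs in your .bib files.

```elisp
(with-eval-after-load 'bibtex
  ;; Illustrative list; only U+FF08/U+FF09 are known to cause this issue.
  (dolist (char '(#xFF08 #xFF09    ; FULLWIDTH LEFT/RIGHT PARENTHESIS
                  #xFE59 #xFE5A    ; SMALL LEFT/RIGHT PARENTHESIS
                  #x2985 #x2986))  ; LEFT/RIGHT WHITE PARENTHESIS
    ;; "." is the punctuation syntax class, so the character no longer
    ;; has to be matched by a closing counterpart.
    (modify-syntax-entry char "." bibtex-braced-string-syntax-table)))
```

Code points are used here instead of literal characters so the snippet survives copy-and-paste through tools that normalize fullwidth characters to ASCII.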

This is only a workaround, of course, but it should help you deal with your issue until a new Emacs version is released with the fix.