Open minrk opened 2 months ago
Thank you for tracking down this regression!
In that release we started generating bibtex from CSL-JSON using citation-js, rather than just copying in the raw source bibtex. This solution was more generic and allowed us to support citations (e.g. from DOIs) that did not have raw bibtex available. However, it has led to some issues, since CSL-JSON (at least as implemented in citation-js) is lossy and incomplete, compared to relatively permissive and feature-rich bibtex, e.g. see: https://github.com/jupyter-book/mystmd/issues/1284
I'm not quite sure of the right approach to address this. We could return to persisting raw bibtex, if available, and only generate bibtex if raw is not available. The drawbacks of this are: (1) Raw bibtex is only available on a private field hidden away in the citation-js API; accessing it feels a little shaky. (2) It's never nice to maintain two ways of doing the same thing. (3) Sometimes we need to modify bibtex ids, e.g. if there are duplicates; with raw bibtex, this becomes fragile string manipulation rather than simply updating structured data.
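To make the trade-off concrete, here is a rough sketch of the "raw if available, generated otherwise" option. Every name in it (CitationRecord, bibtexFor, dedupeIds, the generate callback) is hypothetical and invented for illustration; none of it is mystmd or citation-js API.

```typescript
// Hypothetical shapes and helpers, for illustration only.
interface CitationRecord {
  id: string;
  csl: Record<string, unknown>; // structured CSL-JSON data
  rawBibtex?: string; // original entry text, only when the source was a .bib file
}

// Prefer the persisted raw bibtex; fall back to a generated entry.
function bibtexFor(
  record: CitationRecord,
  generate: (csl: Record<string, unknown>) => string,
): string {
  return record.rawBibtex ?? generate(record.csl);
}

// Renaming duplicate ids is a plain field update on structured data.
// Simplistic: it does not guard against a suffixed id colliding with an existing one.
function dedupeIds(records: CitationRecord[]): CitationRecord[] {
  const seen = new Map<string, number>();
  return records.map((record) => {
    const count = seen.get(record.id) ?? 0;
    seen.set(record.id, count + 1);
    if (count === 0) return record;
    const id = `${record.id}-${count}`;
    // Note: rawBibtex (if any) still contains the old key at this point.
    return { ...record, id, csl: { ...record.csl, id } };
  });
}
```

The comment in dedupeIds is where drawback (3) shows up: once an id changes, any persisted raw bibtex would also need its key rewritten in the string.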
The other option is to improve the bibtex rendering coming out of citation-js. To address the specific issue around escaped characters, we could maybe just escape fields before we call format here https://github.com/jupyter-book/mystmd/blob/main/packages/citation-js-utils/src/index.ts#L327 ...? Or we may need our own CSL -> bibtex rendering outside of citation-js... This could take advantage of other bibtex js libraries; there are a ton, but it's hard to know which are good...
Thanks for the pointer. This is easy to reproduce as an upstream bug in citation-js, so we can hope it gets handled there: https://github.com/citation-js/citation-js/issues/232
They do have some formatting code for bibtex export, so it seems handling this is in-scope for citation-js already; it just hasn't come up yet.
If a workaround is appropriate, I suppose mystmd could apply some of its own escaping to the CSL before passing it to the bibtex exporter, assuming it won't double-escape (at least with a pinned version). I don't know how robust that can be, though.
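A minimal sketch of what that escaping might look like, assuming the CSL item is a plain object with string fields. The helper names, the set of characters (&, %, #) and the default list of fields are all assumptions, not anything mystmd or citation-js provides.

```typescript
// Hypothetical helpers, for illustration only.

// Escape LaTeX-special characters that are not already preceded by a backslash.
// Limitation: an escaped backslash followed by "&" (i.e. "\\&") is treated as already escaped.
function escapeBibtexSpecials(value: string): string {
  return value.replace(/(?<!\\)([&%#])/g, '\\$1');
}

type CslItem = Record<string, unknown>;

// Apply the escaping to selected string fields, leaving everything else untouched.
function escapeCslFields(
  item: CslItem,
  fields: string[] = ['title', 'container-title', 'publisher'],
): CslItem {
  const out: CslItem = { ...item };
  for (const field of fields) {
    const value = out[field];
    if (typeof value === 'string') out[field] = escapeBibtexSpecials(value);
  }
  return out;
}
```

The lookbehind makes the helper safe to apply twice, but it cannot prevent the exporter itself from escaping the same characters again once it starts doing so, which is why pinning the version matters.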
https://github.com/citation-js/citation-js/issues/232 is fixed upstream, so the next update should close this particular issue.
Thanks @minrk for following this upstream. :)
Description
Given the .bib entry:
building tex/pdf with myst build --tex or pdf generates the bibtex entry in exports/tex/main.bib:
resulting in errors like: "Misplaced alignment tab character &." in the latex output.
Running a search through npx mystmd@$version suggests that this is a regression in mystmd@1.1.53: mystmd@1.1.52 produces the right output, while mystmd@1.1.53 strips the escape characters.
Proposed solution
Preserve characters like \& in bibliography fields.
Additional notes
This happens with mystmd@1.1.53 and mystmd@1.3.3, but not mystmd@1.1.52.