jupyter-book / mystmd

Command line tools for working with MyST Markdown.
https://mystmd.org/guide
MIT License

tex export strips `\&` from bibliography fields #1460

Open minrk opened 2 months ago

minrk commented 2 months ago

Description

Given the .bib entry:

@article{test,
    author={Last Name, First},
    journal={Computing in Science {\&} Engineering},
    title={Thing \& Other thing},
    year={3048},
    volume={1},
    number={1},
    pages={1-2},
    keywords={},
}

building tex/pdf with `myst build --tex` or `myst build --pdf` generates this bibtex entry in exports/tex/main.bib:

@article{test,
    author = {Last Name, First},
    journal = {Computing in Science & Engineering},
    number = {1},
    year = {3048},
    pages = {1--2},
    title = {Thing & {Other} thing},
    volume = {1},
}

resulting in errors like: "Misplaced alignment tab character &." in the latex output.
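The error appears because a bare `&` is LaTeX's alignment tab character, so it is only valid in text when written as `\&`. As a rough way to check a generated `main.bib` for this, here is a minimal sketch; `findUnescapedAmpersands` is a hypothetical helper written for this issue, not part of mystmd or citation-js, and it treats an `&` as escaped only when an odd number of backslashes precedes it.

```typescript
// Hypothetical helper (not a mystmd or citation-js API): list the lines of a
// BibTeX string that contain an `&` not escaped by a preceding backslash.
type Hit = { line: number; text: string };

function hasUnescapedAmpersand(text: string): boolean {
  let backslashes = 0; // consecutive backslashes right before the current character
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === '\\') {
      backslashes += 1;
      continue;
    }
    // An even count (including zero) means the `&` itself is not escaped.
    if (ch === '&' && backslashes % 2 === 0) return true;
    backslashes = 0;
  }
  return false;
}

function findUnescapedAmpersands(bib: string): Hit[] {
  const hits: Hit[] = [];
  bib.split('\n').forEach((text, index) => {
    if (hasUnescapedAmpersand(text)) hits.push({ line: index + 1, text: text.trim() });
  });
  return hits;
}
```

Run against the two entries above, the input `.bib` should produce no hits, while the generated `main.bib` should flag the `journal` and `title` lines.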

Running a search through npx mystmd@$version suggests that this is a regression in mystmd@1.1.53:

rm -rf _build exports && npx mystmd@1.1.52 build --tex && cat exports/tex/main.bib

produces the right output, while

rm -rf _build exports && npx mystmd@1.1.53 build --tex && cat exports/tex/main.bib

strips the escape characters.

Proposed solution

preserve escape sequences like `\&` in bibliography fields

Additional notes

this happens with mystmd@1.1.53 and mystmd@1.3.3, but not mystmd@1.1.52.

rowanc1 commented 2 months ago

Thank you for tracking down this regression!

fwkoch commented 2 months ago

In that release we started generating bibtex from CSL-JSON using citation-js, rather than just copying in the raw source bibtex. This solution was more generic and allowed us to support citations (e.g. from DOIs) that did not have raw bibtex available. However, it has led to some issues, since CSL-JSON (at least as implemented in citation-js) is lossy and incomplete, compared to relatively permissive and feature-rich bibtex, e.g. see: https://github.com/jupyter-book/mystmd/issues/1284

I'm not quite sure what the right approach is here. We could return to persisting raw bibtex, if available, and only generate bibtex when raw is not available. The drawbacks of this are: (1) Raw bibtex is only available on a private field hidden away in the citation-js api; accessing it feels a little shaky. (2) It's never nice to maintain two ways of doing the same thing. (3) Sometimes we need to modify bibtex ids, e.g. if there are duplicates; with raw bibtex, this becomes fragile string manipulation rather than simply updating structured data.
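To make drawback (3) concrete, here is a small sketch contrasting the two ways of renaming an entry id. The `StructuredEntry` type and both function names are hypothetical and only for illustration; they are not taken from mystmd or citation-js.

```typescript
// Hypothetical shape for illustration; real CSL-JSON items have many more fields.
type StructuredEntry = { id: string; title: string };

// With structured data, renaming is a plain field update.
function renameStructured(entry: StructuredEntry, newId: string): StructuredEntry {
  return { ...entry, id: newId };
}

// With raw BibTeX, renaming means rewriting the key between "@type{" and the
// first comma. This only handles the common brace-delimited form.
function renameRawBibtex(raw: string, newId: string): string {
  return raw.replace(
    /^(\s*@\w+\s*\{)\s*[^,\s]+\s*,/,
    // A replacer function avoids `$` in newId being read as a replacement pattern.
    (_match: string, head: string) => `${head}${newId},`,
  );
}
```

The string version silently leaves the input unchanged for anything the pattern does not anticipate, for example an entry delimited with parentheses, which is the kind of fragility described above.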

The other option is to improve the bibtex rendering coming out of citation-js. To address the specific issue around escaped characters, we could maybe just escape fields before we call format here https://github.com/jupyter-book/mystmd/blob/main/packages/citation-js-utils/src/index.ts#L327 ...? Or we may need our own CSL -> bibtex rendering outside of citation-js... This could take advantage of other bibtex js libraries; there are a ton, but it's hard to know which are good...

minrk commented 2 months ago

Thanks for the pointer. This is easy to reproduce as an upstream bug in citation-js, so we can hope it gets handled there: https://github.com/citation-js/citation-js/issues/232

They do have some formatting code for bibtex export, so it seems handling this is already in scope for citation-js; it just hasn't come up yet.

If a workaround is appropriate, I suppose mystmd could apply some of its own escaping to the CSL before passing it to the bibtex exporter, assuming it won't double-escape (at least with a pinned version). I don't know how robust that can be, though.
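A minimal sketch of what such a workaround could look like, under several assumptions: the set of characters (`&`, `%`, `#`) and the list of fields are guesses made for illustration, the `CslItem` type is a simplified stand-in for a CSL-JSON item, and none of these names come from mystmd or citation-js. The escaping skips characters that already carry a backslash, so applying it twice is harmless, but it cannot stop an exporter that escapes on its own from escaping the result again.

```typescript
// Simplified stand-in for a CSL-JSON item (assumption, for illustration only).
type CslItem = Record<string, unknown>;

// Assumed set of characters to escape and fields to touch; both are guesses.
const SPECIALS = '&%#';
const TEXT_FIELDS = ['title', 'container-title', 'publisher'];

function escapeLatexSpecials(text: string): string {
  let out = '';
  let backslashes = 0; // consecutive backslashes right before the current character
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (ch === '\\') {
      backslashes += 1;
      out += ch;
      continue;
    }
    // Add a backslash only when the character is not already escaped.
    if (SPECIALS.indexOf(ch) !== -1 && backslashes % 2 === 0) out += '\\';
    out += ch;
    backslashes = 0;
  }
  return out;
}

// Returns a shallow copy with the selected string fields escaped.
function escapeCslFields(item: CslItem, fields: string[] = TEXT_FIELDS): CslItem {
  const copy: CslItem = { ...item };
  for (const field of fields) {
    const value = copy[field];
    if (typeof value === 'string') copy[field] = escapeLatexSpecials(value);
  }
  return copy;
}
```

Restricting the escaping to a few text fields keeps identifiers such as DOIs and URLs untouched; whether that field list is right for real bibliographies would need checking.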

minrk commented 2 months ago

https://github.com/citation-js/citation-js/issues/232 is fixed upstream, so the next update should close this particular issue.

rowanc1 commented 2 months ago

Thanks @minrk for following this upstream. :)