sile-typesetter / sile

The SILE Typesetter — Simon’s Improved Layout Engine
https://sile-typesetter.org
MIT License

Correct handling of ampersands in bibliography entries #2050

Closed Omikhleia closed 1 week ago

Omikhleia commented 3 weeks ago

For strings such as On some stuff & other things, we currently have to format our bibtex files as follows for use with SILE:

@book{key,
  title = {On some stuff &amp; other things},
...

If we don't XML-escape the &, we get an error...

This is due to SILE supporting XML in bibliography entries, which is non-standard... albeit interesting, e.g. if one wants to mark up parts of entries in SIL XML. (The true use case, however, is not the user inserting markup; it's the internal logic for rendering titles in italic, etc.)

It's not completely obvious, as:

Having to manually edit bibliographies to replace & by &amp; is cumbersome, we'd need to avoid it, or at least have some way to bypass it. (I'd also be interested in supporting Djot/Markdown in bibTeX files, but that's another hornet's nest :p )

Omikhleia commented 3 weeks ago

Slightly relates to #1860 (as minimal TeX-like stuff people might expect in a bibTeX file, the \& and ~ are maybe both common enough to be properly handled).

Omikhleia commented 3 weeks ago

Slightly relates to #1860 (as minimal TeX-like stuff people might expect in a bibTeX file, the \& and ~ are maybe both common enough to be properly handled).

Along the same line of thinking, - vs. -- in page ranges might need to be checked for consistency (also as argument to \cite)

alerque commented 3 weeks ago

Having to manually edit bibliographies to replace & by &amp; is cumbersome, we'd need to avoid it, or at least have some way to bypass it.

This is definitely not something we should expect to be in the input, we need to apply XML character escaping ourselves.
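As a rough illustration of what "applying XML character escaping ourselves" could look like, here is a minimal sketch in Python. It is not SILE's code (SILE is written in Lua), and the `escape_field` helper and the sample entry are hypothetical; it only assumes field values arrive as plain strings and are escaped once, at load time.

```python
from xml.sax.saxutils import escape


def escape_field(value: str) -> str:
    """Escape &, < and > so a plain field value is safe to embed in XML."""
    return escape(value)


# Hypothetical parsed entry, as the user would naturally write it.
entry = {"title": "On some stuff & other things"}

# Escape every field once, when the bibliography is loaded.
escaped = {key: escape_field(value) for key, value in entry.items()}
```

With this in place the user keeps a bare & in the .bib file, and only the internal representation carries the entity.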

I'm not familiar with other issues with inputs, but if TeX-isms like \& and ~ are standard we need to decode those too, or if they are common but not standard maybe we need an optional setting for whether or not to handle them when loading bibliographies. Perhaps an argument to the loader or a setting for whether the input is expected to be plain, XML, SIL, TeX, Markdown, or whatever is in order, defaulting to plain or whatever is standard or most common.

Omikhleia commented 3 weeks ago

Perhaps an argument to the loader or a setting for whether the input is expected to be plain, ... or whatever

Food for thought:

The crux of the matter is that the bibtex format was designed with TeX in mind, hence it cannot be made completely portable. (The original need to escape & as \&, I'd guess, came from & being a special character in TeX, the alignment tab used in arrays...).

I think that the safest approach (to start with) is to consider by default that the input does not contain any markup. (We are not going to be able to support TeX/LaTeX, or @preamble blocks with TeX-like instructions, anyway).

IMHO, the best course of action is to assume the bib file is self-defined, written in a minimal "portable" subset, i.e. not containing any TeX, XML, SIL or whatever constructs, with the exception of the really common ones (--, \& and ~).
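To make the "minimal portable subset" idea concrete, here is a small sketch in Python of decoding just those three constructs into plain Unicode. It is only an illustration, not SILE's implementation; the `decode_texisms` name and the chosen replacements (non-breaking space for ~, en dash for --) are assumptions, and anything else (including ---) is deliberately left untouched.

```python
import re

# Assumed mapping for the three "really common" constructs only.
_TEXISMS = [
    (re.compile(r"\\&"), "&"),                  # \&  -> literal ampersand
    (re.compile(r"(?<!\\)~"), "\u00a0"),        # ~   -> non-breaking space
    (re.compile(r"(?<!-)--(?!-)"), "\u2013"),   # --  -> en dash (not ---)
]


def decode_texisms(value: str) -> str:
    """Replace the minimal TeX-like constructs with plain Unicode text."""
    for pattern, replacement in _TEXISMS:
        value = pattern.sub(replacement, value)
    return value
```

Any XML escaping would then run on the decoded text, so that \& ends up as a properly escaped ampersand rather than a stray backslash.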

FWIW, the other most "common" input issues (I might comment on them separately at some point) are likely:

Omikhleia commented 3 weeks ago

We are not going to be able to support TeX/LaTeX

BTW, for the record, Typst does support some minimal interpretation of TeX-like input.

The problem I see there is that we'll never know what's really minimal...