mcmtroffaes / sphinxcontrib-bibtex

Sphinx extension for bibtex style references.
Other
180 stars 45 forks source link

When `cite` is used inside an `only` directive, the entry in the `bibliography` is present regardless of the `only` condition #349

Open raek opened 2 months ago

raek commented 2 months ago

Description

At work we use the only directive to produce multiple document variants from a single document source. We discovered that if a source was cited inside an only directive, then the bibliography entry was always present, regardless whether the citation reference was included or not.

Minimal example:

Some paragraph.

.. only:: april_fools

    This paragraph only appears in some configurations :cite:p:`1987:nelson`.

Some other paragraph

.. bibliography:: refs.bib

The expected behavior is for "1987:nelson" to not appear in the bibliography when the "april_fools" tag is not set, but now it is always included.

Analysis

The sphinxcontrib.bibtex.domain.BibtexDomain.env_updated method is responsible for pruning bibliography entries that are not referenced in the document. It runs after any registered transform has been applied but before any post-transform is applied.

The conditional rendering of the only directive is applied by the sphinx.transforms.post_transforms.OnlyNodeTransform post-transform. As this is applied after BibtexDomain.env_updated has already run, the bilbiography already contains an entry which will "survive".

Workaround

I developed a workaround that solves our immediate problem, but it would be interesting to discuss how this problem could be fixed properly (if it is considered a real bug). Do you have any ideas? This workaround is what I've come up with so far.

import docutils.nodes
from sphinx.transforms.post_transforms import (
    SphinxPostTransform, OnlyNodeTransform)
from sphinxcontrib.bibtex.transforms import BibliographyTransform

class PruneCitationPostTransform(SphinxPostTransform):
    default_priority = max(BibliographyTransform.default_priority,
                           OnlyNodeTransform.default_priority) + 1

    def run(self, **kwargs) -> None:
        env = self.document.settings.env
        domain = env.get_domain("cite")
        found_ids = set()
        for node in self.document.traverse(docutils.nodes.Element):
            found_ids.update(node["ids"])
        for bibliography in domain.bibliographies.values():
            for citation_node in bibliography.citation_nodes.values():
                if not citation_node.children:
                    # This citation was already pruned in
                    # BibtexDomain.env_updated
                    continue
                if not (set(citation_node["backrefs"]) & found_ids):
                    # All references to this citations has been
                    # removed by an Only directive. Prune it!
                    citation_node.replace_self(docutils.nodes.comment())

def setup(app):
    app.add_post_transform(PruneCitationPostTransform)
mcmtroffaes commented 1 month ago

Apologies for the late response. I very much appreciate you sharing your code here. I see the problem and the transform would work, but... there's two catches:

  1. Unless this somehow has been fixed recently, I've observed that Sphinx does not produce backrefs across documents. So I suspect your workaround will only work if the bibliography directive resides in the same document as where all the citations are made.
  2. The workaround assumes the bibliography has the :cited: option.

A more comprehensive solution would handle this somehow via the citation_refs data (i.e. to track whether a citation ref is in an inactive only block). Then env_updated can do the right thing, and possibly also already remove the hidden reference right there and then - I suspect sphinx will try to resolve them before triggering .. only.

raek commented 1 month ago

Thanks for the feedback!

Regarding point two: yes, we use :cited: for all the documents in question here. I think I could update the code to only perform this transformation for bibliographies with the :cited: option and leave the others unchanged. This feels like a necessary change.

I've seen some comments in the code in this repo (I think?) regarding multiple documents. We have tended to use single documents so I haven't thought about this case much. A proper fix should work for multiple documents too. I will think about it some more and do some experiments.