sphinx-doc / sphinx

The Sphinx documentation generator
https://www.sphinx-doc.org/
Other
6.53k stars 2.12k forks source link

Duplicate emission of HTML meta tags generated by docutils #11699

Closed jayaddison closed 2 months ago

jayaddison commented 1 year ago

Describe the bug

It looks like some HTML meta tags are being duplicated in the HTML builder's output, at least when using the Sphinx basic theme.

For example, in the jinja v3.1 documentation here, I see this when using 'view source' on the page:

<!DOCTYPE html>

<html lang="en">
  <head>
    <meta charset="utf-8" />
    <meta name="viewport" content="width=device-width, initial-scale=1.0" />
  <meta name="viewport" content="width=device-width, initial-scale=1">
  ...

I think that the reason is that Sphinx combines meta tags from docutils and its' own. Here's where the docutils meta tags are filtered, to remove a few other items that would otherwise cause duplicates:

https://github.com/sphinx-doc/sphinx/blob/cf7d2759af0852d67288e58d823d51fe860749ca/sphinx/writers/html.py#L44

The [2:] list index filtering looked a bit odd -- and was written in Y2008 (quite a while ago) so I think it's something to do with the problem.

docutils added the meta viewport tag more recently, in Y2021 - so I think that the Sphinx code doesn't factor this into the meta tag filtering.

One approach could be to filter based on the properties of each tag rather than their position in the visited elements. I think that the three items that currently make sense to filter are:

How to Reproduce

This can be replicated using Sphinx's own documentation set as an input and running the HTML builder.

For example, by running:

$ sphinx-build -W -b html doc doc_output

Environment Information

Platform:              linux; (Linux-6.5.0-1-rt-amd64-x86_64-with-glibc2.37)
Python version:        3.11.5 (main, Aug 29 2023, 15:31:31) [GCC 13.2.0])
Python implementation: CPython
Sphinx version:        7.3.0+/b9c85984d
Docutils version:      0.20.1
Jinja2 version:        3.1.2
Pygments version:      2.16.1

Sphinx extensions

N/A

Additional context

N/A

jayaddison commented 1 year ago

(I'll try to attempt a fix for this within the next few days)

AA-Turner commented 1 year ago

Perhaps we could adulterate the generator tag with Sphinx? (Or replace entirely)

I.e.

<meta name="generator" content="Docutils 0.20.0: https://docutils.sourceforge.io/; Sphinx 7.2.6: https://www.sphinx-doc.org/" />

or

<meta name="generator" content="Sphinx 7.2.6" />

or

<meta name="generator" content="Docutils 0.20.0: https://docutils.sourceforge.io/" />
<meta name="generator" content="Sphinx 7.2.6" />

A

jayaddison commented 1 year ago

I think the short-term approach should be to de-noise the existing behaviour - as in, remove the duplicate meta tags.

Longer-term, I sense there's a longstanding debate between openness (provide complete version/generation information) and, depending how it's phrased, benefits by avoiding tying content to a specific platform, and/or a security-break latency increase by requiring that some attackers have to discover version information through other means.

Are Sphinx HTML documentation deployments usually/always static content?

jayaddison commented 1 year ago

Are Sphinx HTML documentation deployments usually/always static content?

Something I need to learn to do better, especially since this is a documentation project: attempt to read the documentation to answer questions like this.

Yes, the HTML builder's output is static HTML and this is the intended deployment model. This is detailed in one of the Sphinx documentation appendices.

That decouples the content deployment from any risk to the server hosting the documentation. If I were being extremely paranoid I'd say that if there were some vulnerable code running on the client (in a JavaScript context) then perhaps the ability to read some information about the generator versions could be useful to an attacker, but it seems like a fairly obscure and unlikely source of information.

So I guess it could make sense to include the generator versions for both docutils and Sphinx on the page. Perhaps that could help to discover adoption rates for versions of Sphinx, something that could be useful for maintenance and planning purposes. Even so, I'd probably start by removing the duplicated meta tag and then opening a separate issue about expanding the generator metadata on the page, to ensure anyone curious in future can trace back the change to that and any relevant discussion.

jayaddison commented 1 year ago

As if this issue wasn't niche/small enough already: it turns out that the alabaster theme -- used during some of the HTML unit tests -- also adds its own viewport meta tag, meaning that there are currently three on some pages (one from the inherited basic theme, one from docutils, and then one from alabaster).

I've prepared some code locally to filter the docutils tags; consolidating the tags between the themes could take a bit more investigation.

I haven't found any W3C/HTML specs to indicate that viewport is strictly a singleton tag; last-wins is probably the algorithm that most browsers follow when they see multiple, I'd guess..

jayaddison commented 1 year ago

Re: the generator meta tag question: I'm leaning towards suggesting that yes, including a combined Sphinx + docutils tag on each page would make sense.

In general: identifying software affected by specific bugs does have some risks and downsides, but I think that the ability to find and fix affected items -- and also the reverse, to identify an affected version easily when a possible bug is spotted - should outweigh that.

jayaddison commented 2 months ago

In theory this would be nice to fix; however, I'm not aware of it causing any problems at the moment, and so it seems worth closing. If it does turn out to be relevant again in future, please feel free to re-open (or to file a separate issue).