python / cpython

The Python programming language
https://www.python.org

IDLE: Revise html to tkinker converter for help.html #81479

Closed terryjreedy closed 8 months ago

terryjreedy commented 5 years ago
BPO 37298
Nosy @terryjreedy, @roseman, @JulienPalard, @csabella

Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.


GitHub fields:
```python
assignee = 'https://github.com/terryjreedy'
closed_at = None
created_at =
labels = ['3.8', 'expert-IDLE', 'type-bug', '3.7', '3.9']
title = 'IDLE: Revise html to tkinker converter for help.html'
updated_at =
user = 'https://github.com/terryjreedy'
```

bugs.python.org fields:
```python
activity =
actor = 'terry.reedy'
assignee = 'terry.reedy'
closed = False
closed_date = None
closer = None
components = ['IDLE']
creation =
creator = 'terry.reedy'
dependencies = []
files = []
hgrepos = []
issue_num = 37298
keywords = []
message_count = 4.0
messages = ['345722', '346205', '346206', '346241']
nosy_count = 4.0
nosy_names = ['terry.reedy', 'markroseman', 'mdk', 'cheryl.sabella']
pr_nums = []
priority = 'normal'
resolution = None
stage = 'needs patch'
status = 'open'
superseder = None
type = 'behavior'
url = 'https://bugs.python.org/issue37298'
versions = ['Python 3.7', 'Python 3.8', 'Python 3.9']
```


terryjreedy commented 5 years ago

Sphinx 2.? generates different html than 1.8, so that the display of Help ==> IDLE Help has extra blank lines. Among possibly other things, the contents of \<li>...\</li> are wrapped in \<p>...\</p>, and blank lines appear between the bullet and the text.

 \<ul class="simple">
-\<li>coded in 100% pure Python, using the \<a class="reference internal" href="tkinter.html#module-tkinter" title="tkinter: Interface to Tcl/Tk for graphical user interfaces">\<code class="xref py py-mod docutils literal notranslate">\<span class="pre">tkinter\</span>\</code>\</a> GUI toolkit\</li>
-\<li>cross-platform: works mostly the same on Windows, Unix, and macOS\</li>
 ...
+\<li>\<p>coded in 100% pure Python, using the \<a class="reference internal" href="tkinter.html#module-tkinter" title="tkinter: Interface to Tcl/Tk for graphical user interfaces">\<code class="xref py py-mod docutils literal notranslate">\<span class="pre">tkinter\</span>\</code>\</a> GUI toolkit\</p>\</li>
+\<li>\<p>cross-platform: works mostly the same on Windows, Unix, and macOS\</p>\</li>
 ...
 \</ul>

A similar issue afflicts the menu, with blank lines between the menu item and the explanation.

The html original 3x/Doc/build/html/library/idle.html#index-0 looks normal in Firefox. The html parser class in help.py needs to ignore \<p> within \<li>. It should specify which version of Sphinx it is compatible with.
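
To make "ignore \<p> within \<li>" concrete, here is a minimal sketch built on the standard `html.parser.HTMLParser`. It is not the parser in help.py: the class `ListTextParser`, the helper `render`, and the plain-text output format are all hypothetical, chosen only to show that tracking list-item depth lets the old and new Sphinx output render identically.

```python
from html.parser import HTMLParser


class ListTextParser(HTMLParser):
    """Hypothetical sketch: render list items, ignoring <p> inside <li>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.li_depth = 0   # > 0 while inside one or more <li> elements
        self.chunks = []    # output pieces, in document order

    def handle_starttag(self, tag, attrs):
        if tag == 'li':
            self.li_depth += 1
            self.chunks.append('* ')
        elif tag == 'p' and not self.li_depth:
            # Only a <p> outside any <li> starts a new block.
            self.chunks.append('\n')

    def handle_endtag(self, tag):
        if tag == 'li':
            self.li_depth = max(0, self.li_depth - 1)
            self.chunks.append('\n')
        elif tag == 'p' and not self.li_depth:
            self.chunks.append('\n')

    def handle_data(self, data):
        self.chunks.append(data)


def render(html):
    """Return the text produced by ListTextParser for an html string."""
    parser = ListTextParser()
    parser.feed(html)
    parser.close()
    return ''.join(parser.chunks)


# Sphinx 1.8 style (no <p>) and Sphinx 2 style (<p> inside <li>).
old = '<ul class="simple"><li>one</li><li>two</li></ul>'
new = '<ul class="simple"><li><p>one</p></li><li><p>two</p></li></ul>'
assert render(old) == render(new)
```

A depth counter rather than a boolean flag is used so that nested lists keep suppressing \<p> until the outermost \<li> closes.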

Do any of you have any idea what the html change might be about? Is there something wrong with idle.rst?

csabella commented 5 years ago

tl;dr I think it's a difference in the CSS for the HTML5 writer.

----------------------------------------

In the HTMLTranslator class for docutils writer [1], I found the following docstring, specifically the line "The html5_polyglot writer solves this using CSS2.".

"""
The html4css1 writer has been optimized to produce visually compact
lists (less vertical whitespace).  HTML's mixed content models
allow list items to contain "<li><p>body elements</p></li>" or
"<li>just text</li>" or even "<li>text<p>and body
elements</p>combined</li>", each with different effects.  It would
be best to stick with strict body elements in list items, but they
affect vertical spacing in older browsers (although they really
shouldn't).
The html5_polyglot writer solves this using CSS2.

Here is an outline of the optimization:

- Check for and omit <p> tags in "simple" lists: list items
  contain either a single paragraph, a nested simple list, or a
  paragraph followed by a nested simple list.  This means that
  this list can be compact:

      - Item 1.
      - Item 2.

  But this list cannot be compact:

      - Item 1.

        This second paragraph forces space between list items.

      - Item 2.

- In non-list contexts, omit <p> tags on a paragraph if that
  paragraph is the only child of its parent (footnotes & citations
  are allowed a label first).

- Regardless of the above, in definitions, table cells, field bodies,
  option descriptions, and list items, mark the first child with
  'class="first"' and the last child with 'class="last"'.  The stylesheet
  sets the margins (top & bottom respectively) to 0 for these elements.

The ``no_compact_lists`` setting (``--no-compact-lists`` command-line
option) disables list whitespace optimization.
"""

In the HTMLTranslator class for the base [2], I found this comment:

# Do not omit <p> tags
# --------------------
#
# The HTML4CSS1 writer does this to "produce
# visually compact lists (less vertical whitespace)". This writer
# relies on CSS rules for "visual compactness".
#
# * In XHTML 1.1, e.g. a <blockquote> element may not contain
#   character data, so you cannot drop the <p> tags.
# * Keeping simple paragraphs in the field_body enables a CSS
#   rule to start the field-body on a new line if the label is too long
# * it makes the code simpler.

Since both comments are a few years old, I think it's in the CSS.

[1] https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/writers/html4css1/__init__.py
[2] https://sourceforge.net/p/docutils/code/HEAD/tree/trunk/docutils/docutils/writers/_html_base.py

csabella commented 5 years ago

Following up on my last post: the cause is not the CSS. Sphinx 2.0 switched its default writer from HTML4 to HTML5. The docutils comments explain the difference between the two.

https://github.com/sphinx-doc/sphinx/commit/a3cdd465ecf018fa5213b6b2c1c4e495973a2896

terryjreedy commented 5 years ago

Thank you for the research, including the crucial commit! What I understand from the quotes:

  1. Sphinx 2 writes HTML5 by default. The html5 writer always writes paragraphs because they are required by the xhtml used by html5.

  2. Firefox, for instance, displays the result the same as before, either because it has the logic to avoid extra blank lines when reading html5 or because this is taken care of by revised css (this is unclear from the quotes).

To deal with html5, our converter would have to ignore the \<p>s that the html4 writer omitted, by adding logic for the cases used in idle.rst. Not fun.

Reading the commit (3rd line) revealed a new sphinx configuration option: html4_writer, defaulting to False. When I switched from building html with my 3.6 install with sphinx 1.8.1 to 3.7 with 2.something, and added "-D html4_writer=1" to a direct call of sphinx-build, I indeed got html without added \<p>s. The only difference was the irrelevant omission of '\n' between list item header and text in the html file. Example:

-\<dt>New File\</dt>
-\<dd>Create a new file editing window.\</dd>
+\<dt>New File\</dt>\<dd>Create a new file editing window.\</dd>

Setting SPHINXOPTS should work when using 'Doc/make.bat html'. I will prepare a PR documenting our parser requirement and include the neutral html changes.
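
For comparison, the "-D html4_writer=1" override corresponds to an ordinary assignment in the Sphinx configuration file. The fragment below is only a sketch: it assumes a Sphinx 2.x release in which the html4_writer option from the commit is still supported, and placing it in Doc/conf.py is a hypothetical alternative, not something decided in this thread.

```python
# Hypothetical conf.py fragment, assuming Sphinx 2.x.
# Request the HTML4 writer so that simple list items are written
# without the added <p>...</p> wrappers.
html4_writer = True
```

Passing the option on the command line or through SPHINXOPTS leaves the shared configuration untouched, which may be preferable if only the IDLE help file needs HTML4 output.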

terryjreedy commented 8 months ago

The blank lines between list bullets and text, and between menu items and explanations, were fixed in a PR not linked to this issue.

The single-spaced list at the top of the file begins with <ul class="simple">. The double-spaced main list in Key bindings begins with just <ul>, whereas the nested list again includes class="simple". The same difference appears at the end of Shell window. Our formatter double-spaces the non-simple list. I think the consistently single-spaced Firefox list format is better.

The PR stops double-spacing non-simple lists when displayed by Help => IDLE Doc and will close this issue. (The Edge browser also double-spaces such lists but will not be affected.) A separate PR will revise the lists to make them clearer and make them all simple.
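
As an aid for that follow-up, the non-simple lists can be located mechanically. This is a rough sketch using the standard `html.parser`; the names `UlClassFinder` and `non_simple_lists` and the sample markup are hypothetical and are not part of help.py or the generated idle.html.

```python
from html.parser import HTMLParser


class UlClassFinder(HTMLParser):
    """Hypothetical helper: record whether each <ul> has class "simple"."""

    def __init__(self):
        super().__init__()
        self.lists = []   # (line number, is_simple) for each <ul>, in order

    def handle_starttag(self, tag, attrs):
        if tag == 'ul':
            # The class attribute may be absent or hold several names.
            classes = (dict(attrs).get('class') or '').split()
            line, _col = self.getpos()
            self.lists.append((line, 'simple' in classes))


def non_simple_lists(html):
    """Return the line numbers of <ul> tags lacking class "simple"."""
    finder = UlClassFinder()
    finder.feed(html)
    finder.close()
    return [line for line, simple in finder.lists if not simple]


# Made-up sample: a simple list, then a plain <ul> with a nested simple list.
sample = (
    '<ul class="simple">\n'
    '<li>a</li>\n'
    '</ul>\n'
    '<ul>\n'
    '<li><p>b</p>\n'
    '<ul class="simple">\n'
    '<li>c</li>\n'
    '</ul>\n'
    '</li>\n'
    '</ul>\n'
)
print(non_simple_lists(sample))
```

Running the same function over the text of the built idle.html would list the \<ul> tags that the formatter currently double-spaces.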