ietf-tools / xml2rfc

Generate RFCs and IETF drafts from document source in XML according to the IETF xml2rfc v2 and v3 vocabularies
https://ietf-tools.github.io/xml2rfc/
BSD 3-Clause "New" or "Revised" License

Artifacts from double line-breaking #1120

Open cabo opened 3 months ago

cabo commented 3 months ago

Describe the issue

In some cases, the renderer line-breaks some text to an imaginary width; the text renderer then line-breaks the result again to the actual width, while the HTML renderer generates HTML with unnecessarily pre-broken lines (broken to the incorrect imaginary width). In the text rendering, this introduces spaces at the earlier break points, which now fall in the middle of a line.
This can be seen in www.open- std.org (note the spurious space) in the [C] reference in 0txt.
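
To illustrate the suspected mechanism, here is a minimal sketch in Python. It is not xml2rfc's actual code: `break_token`, `greedy_wrap`, and the widths 16 and 58 are hypothetical names and values chosen only for the demonstration. A long URL is first broken after `-` or `/` to a narrow imaginary width, and the resulting lines are then re-filled to the actual width as if each earlier break were ordinary whitespace.

```python
import re


def break_token(token, width):
    """Break a long token into lines of at most `width` characters,
    breaking only after '-' or '/'. A single piece longer than
    `width` is left on a line of its own."""
    pieces = re.findall(r"[^-/]*[-/]|[^-/]+$", token)
    lines, current = [], ""
    for piece in pieces:
        if current and len(current) + len(piece) > width:
            lines.append(current)
            current = piece
        else:
            current += piece
    if current:
        lines.append(current)
    return lines


def greedy_wrap(text, width):
    """Plain greedy word wrap: split on whitespace, then pack the
    words into lines of at most `width` characters."""
    lines, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if not current or len(candidate) <= width:
            current = candidate
        else:
            lines.append(current)
            current = word
    if current:
        lines.append(current)
    return lines


url = ("http://www.open-std.org/jtc1/sc22/wg14/www/abq/"
       "c17_updated_proposed_fdis.pdf")

# First pass: break to a (hypothetical) imaginary width of 16.
pre_broken = break_token(url, 16)

# Second pass: the earlier breaks are treated as whitespace and the
# text is re-filled to the (hypothetical) actual width of 58.
double_broken = greedy_wrap(" ".join(pre_broken), 58)

# For comparison: a single pass directly at the actual width.
single_pass = break_token(url, 58)

for line in double_broken:
    print(line)
```

In this sketch, the double-broken output contains `open- std.org` in the middle of a line, whereas the single pass at the actual width breaks the URL only at the line ends and introduces no spaces.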

The HTML has an unnecessarily early line-break 0html. (A likely related problem can be seen with the datatracker-specific rendering, e.g. at 1.)

The PDF rendering obtained from the author tools looks right, though; this could be a coincidence.


kesara commented 3 months ago

@cabo I think the issue with spaces in breakpoints of text output has been fixed since v3.20.0. EDIT: The following is from xml2rfc 3.20.1:

   [C]        International Organization for Standardization,
              "Information technology — Programming languages — C",
              Fourth Edition, ISO/IEC 9899:2018, June 2018,
              <https://www.iso.org/standard/74528.html>.  Technically
              equivalent specification text is available at
              https://web.archive.org/web/20181230041359if_/
              http://www.open-std.org/jtc1/sc22/wg14/www/abq/
              c17_updated_proposed_fdis.pdf
              (https://web.archive.org/web/20181230041359if_/
              http://www.open-std.org/jtc1/sc22/wg14/www/abq/
              c17_updated_proposed_fdis.pdf)
kesara commented 3 months ago

Can you explain the unnecessary early line breaks in HTML? Because I'm not seeing any. Maybe the whole URL could have moved to the next line.

Screenshot 2024-04-01 at 12 01 09
cabo commented 3 months ago

> @cabo I think the issue with spaces in breakpoints of text output has been fixed since v3.20.0.
>
>    [C]        International Organization for Standardization,
>               "Information technology — Programming languages — C",
>               Fourth Edition, ISO/IEC 9899:2018, June 2018,
>               <https://www.iso.org/standard/74528.html>.  Technically
>               equivalent specification text is available at
>               https://web.archive.org/web/20181230041359if_/
>               http://www.open-std.org/jtc1/sc22/wg14/www/abq/
>               c17_updated_proposed_fdis.pdf
>               (https://web.archive.org/web/20181230041359if_/
>               http://www.open-std.org/jtc1/sc22/wg14/www/abq/
>               c17_updated_proposed_fdis.pdf)

But that's what I'm seeing now:

Screenshot 2024-04-01 at 06 04 09
cabo commented 3 months ago

> Can you explain the unnecessary early line breaks in HTML? Because I'm not seeing any. Maybe the whole URL could have moved to the next line.
>
> Screenshot 2024-04-01 at 12 01 09

And this is a screenshot from the HTML:

Screenshot 2024-04-01 at 06 06 50

You can clearly see the early line break after open-

cabo commented 3 months ago

The -04 renderings linked above apparently were made with 3.20.1, while the -03 ones (which also exhibit these weirdnesses) were made with 3.20.0. Oh, and the U+2028 that I introduced in -04 to work around the missing <br> in the <annotation> content model appears to be ignored in .TXT but heeded in .HTML.

(Looking at the HTML with Arc Version 1.36.0 (48035), which uses Chromium Engine Version 123.0.6312.87.)

cabo commented 3 months ago

This is how https://www.ietf.org/archive/id/draft-ietf-cbor-cddl-more-control-04.html#C looks in Safari Version 17.4.1 (19618.1.15.11.14):

image

Looks similar. Maybe this is indeed an artifact of the browser preferring to break lines after the hyphen; let's focus on the .TXT weirdness then.

cabo commented 3 months ago

(This is what I get in Safari with a narrow window. Weird.)

image