ietf-tools / xml2rfc

Generate RFCs and IETF drafts from document source in XML according to the IETF xml2rfc v2 and v3 vocabularies
https://ietf-tools.github.io/xml2rfc/
BSD 3-Clause "New" or "Revised" License

Extra space in "identifiers" block HTML #875

Open martinthomson opened 2 years ago

martinthomson commented 2 years ago

Describe the issue

The HTML rendering of the identifiers block (<dl class="identifiers">) includes a number of plain textual items, plus a few items that use nested elements. Some of the generated <dd> elements include additional whitespace before an initial, inline child element, which is hard (or maybe impossible) to remove with styling. This leads to misalignment in rendering.

Items that include this extra whitespace are:

Can this extra space be removed?

martinthomson commented 2 years ago

I did some digging on this and it seems like this is going to be HARD. The lxml library manages HTML serialization and when you enable the pretty_print option (as xml2rfc does, and should do), something in the creation of the updates/obsoletes element causes lxml to serialize the content of the <dd> element on the next line:

<dd class="updates">
<a href="https://www.rfc-editor.org/rfc/rfc2119" class="eref">2119</a> (if approved)</dd>

I couldn't work out how to suppress this. It seems to be caused by the element having text content. A single <a> element in updates/obsoletes renders properly once you remove the line that sets a.tail = ' ', but as soon as there are two links, or the document is a draft (where the tail is set to " (if approved)"), the <dd> has text content and lxml serializes it on a new line as shown.
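
To make the "text content" condition concrete, here is a minimal sketch using the standard library's xml.etree.ElementTree, whose text/tail model lxml follows. It does not reproduce lxml's pretty_print behaviour. The helpers build_updates_dd and has_text_content, and the ", " separator between links, are hypothetical and are not taken from xml2rfc.

```python
import xml.etree.ElementTree as ET

RFC_URL = "https://www.rfc-editor.org/rfc/rfc{}"


def build_updates_dd(numbers, is_draft):
    """Hypothetical helper: build <dd class="updates"> with one link per RFC."""
    dd = ET.Element("dd", {"class": "updates"})
    links = []
    for number in numbers:
        a = ET.SubElement(dd, "a", {"href": RFC_URL.format(number), "class": "eref"})
        a.text = str(number)
        links.append(a)
    # Assumption: links are separated by ", " stored in the tail of each link.
    for a in links[:-1]:
        a.tail = ", "
    # For a draft, the last link carries the " (if approved)" tail.
    if is_draft and links:
        links[-1].tail = " (if approved)"
    return dd


def has_text_content(element):
    """True if the element directly holds text: its own .text or a child's .tail."""
    if element.text:
        return True
    return any(child.tail for child in element)


single = build_updates_dd([2119], is_draft=False)
draft = build_updates_dd([2119], is_draft=True)

print(has_text_content(single))  # no tail anywhere, so element-only content
print(has_text_content(draft))   # the " (if approved)" tail is text inside <dd>
print(ET.tostring(draft, encoding="unicode"))
```

The point is that a tail belongs to the parent's content: any non-empty tail on a child turns the <dd> into mixed content, which is the case where the new-line serialization shows up.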

I did manage to suppress the leading space on the "published" element by removing the tail on the <time> element. This turns out to be added if the original <date> element from which it was created also included trailing text, which is usually just a newline. That's counter-intuitive, but a consequence of how the conversion works, so that can be tweaked:

                # Publication date
                date = x.find('date')
                date.tail = None
                pubdate = self.render_date(None, date)
                entry(dl, 'Published', pubdate)
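
Why the tail ends up on <time> at all can be sketched with the standard library's xml.etree.ElementTree (same text/tail model as lxml). The render_time function below is a hypothetical stand-in for render_date that simply assumes the source element's tail is carried over. It is not the xml2rfc implementation, and it does not reproduce lxml's pretty printing.

```python
import xml.etree.ElementTree as ET

# Source XML as typically written: a newline follows the <date/> element,
# and the parser stores that newline as date.tail.
front = ET.fromstring('<front>\n<date year="2023" month="May"/>\n</front>')
date = front.find("date")


def render_time(date_element):
    """Hypothetical stand-in for render_date."""
    time = ET.Element("time", {"datetime": date_element.get("year")})
    time.text = "{} {}".format(date_element.get("month"), date_element.get("year"))
    # Assumption: the conversion copies the source element's tail unchanged.
    time.tail = date_element.tail
    return time


def render_dd(date_element):
    """Wrap the rendered <time> in a <dd> and serialize it."""
    dd = ET.Element("dd", {"class": "published"})
    dd.append(render_time(date_element))
    return ET.tostring(dd, encoding="unicode")


before = render_dd(date)   # stray newline from the source ends up inside <dd>
date.tail = None           # the tweak: drop the tail before rendering
after = render_dd(date)    # <dd> now holds only the <time> element

print(repr(before))
print(repr(after))
```

Clearing the tail on the source element before conversion keeps the fix in one place, rather than stripping whitespace from the rendered output afterwards.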

cabo commented 1 year ago

I now see id="identifiers", which clashes with document IDs:

https://github.com/ietf-wg-jsonpath/draft-ietf-jsonpath-base/issues/291

martinthomson commented 1 year ago

That id="identifiers" thing seems pretty serious and might be worth a different issue.

(On this issue, I've a workaround for this in styling. It is an abomination, but it does work well enough, assuming that you have CSS grid and flexbox and a few other things that shouldn't be necessary but end up being essential.)