Open martinthomson opened 2 years ago
Personal opinion:
I think it is clear enough in RFC 7991 that the updates attribute can only contain numbers, commas and draft names, not whitespace or anything else.
My preferred way forward would be to have xml2rfc strip everything that is not an RFC number, comma or I-D name. This would leave the formatting of that in the rendered RFCs to xml2rfc, which is more appropriate given that a) this is header/metadata, not content, and so the formatting is for the series editors to manage, not the authors; and b) the tool can do a better job of formatting across all the different renderings.
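To make the proposal concrete, here is a rough sketch of what "strip everything that is not an RFC number, comma or I-D name" could look like. This is not xml2rfc code: the function name clean_updates and the exact pattern for draft names are assumptions made for illustration only.

```python
import re

# Assumption: an entry is either an RFC number (digits only) or an
# Internet-Draft name such as "draft-ietf-tls-oldversions-deprecate".
_ENTRY = re.compile(r"[0-9]+|draft-[a-z0-9]+(?:-[a-z0-9]+)*")

def clean_updates(value):
    """Hypothetical helper: return the RFC numbers and I-D names in value."""
    entries = []
    for part in value.split(","):
        # Drop every character that cannot appear in a number or draft
        # name. This removes ASCII whitespace and U+200B alike.
        kept = re.sub(r"[^0-9A-Za-z-]", "", part)
        # Keep only well-formed entries; empty or malformed ones are skipped.
        if _ENTRY.fullmatch(kept):
            entries.append(kept)
    return entries

print(clean_updates("3261,\u200b 4162, \u200b5246"))
```

The renderer would then be free to join the returned list with whatever separator suits each output format.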
It is unfortunate that we now have an RFC with "\u200b" in that attribute as it is likely to break or at least confuse any code that parses it. If we are going to republish the XML then "unexpected stuff in the XML that is contrary to our documentation and likely to break parsers" is probably high up the list of reasons to do so.
Describe the issue
RFC 8996 contains a LOT of RFC numbers in its updates attribute. Along with some of those, it includes a Unicode zero-width space character (U+200B).
While it is not clear that whitespace is allowed in this attribute, xml2rfc has been tolerant of whitespace thus far. The wrinkle here is that Python's strip() does not, by default, recognize "\u200b" as whitespace. So what has happened is that the character has made its way into the links in HTML that xml2rfc generates. The resulting links are bad. (The HTML is also bad, but that is of less immediate consequence.)
Options here appear to be:
My opinion is that the second course is better, but that requires broader discussion, probably in RSWG. Either way, I wanted to open this to track the issue.
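For anyone who wants to reproduce the strip() behaviour described above, the following short snippet shows it. The helper strip_with_zwsp is a hypothetical name used only for this illustration and is not part of xml2rfc.

```python
import re

value = "\u200b5246"

# U+200B is a format character (category Cf), so Python does not
# treat it as whitespace and strip() leaves it in place.
print("\u200b".isspace())        # False
print(value.strip() == value)    # True

def strip_with_zwsp(text):
    """Hypothetical helper: trim whitespace and U+200B from both ends."""
    return re.sub(r"^[\s\u200b]+|[\s\u200b]+$", "", text)

print(strip_with_zwsp(" \u200b5246\u200b "))  # 5246
```

This only trims the ends of a single entry; it is a demonstration of the problem, not a suggestion for which of the options to pick.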