ietf-tools / xml2rfc

Generate RFCs and IETF drafts from document source in XML according to the IETF xml2rfc v2 and v3 vocabularies
https://ietf-tools.github.io/xml2rfc/
BSD 3-Clause "New" or "Revised" License

Allow unicode in all elements #960

Closed kesara closed 1 year ago

kesara commented 1 year ago

Please add, both here and below: dd, dt, li, blockquote, and any other block-level elements I missed. Then add, both here and below, the "inline" elements: cref (?), em, eref (?), iref (?), relref, strong, sub, sup, tt, and xref. (I didn't think much about the cross reference stuff, but I think they can contain text.)

I have immediate use for many of these. I can wrap in <t> for some of the block-level elements, but not the inline-level ones.

_Originally posted by @martinthomson in https://github.com/ietf-tools/xml2rfc/pull/895#discussion_r1053856348_

cabo commented 1 year ago

(Not just block elements.)

Note that the need to contort the syntax (insert <t>) to work around what is essentially an xml2rfc bug is unacceptable.

(But it is good that we are at the halfway house with this :-)

rjsparks commented 1 year ago

Let's describe the scope better. Do we need an explicit enumeration of elements to review, or are there groups that describe the set well enough?

cabo commented 1 year ago

> [...] are there groups that describe the set well enough?

Yes: all elements.

There is no element in the RFCXMLv3 grammar that has the requirement to use <u>.

jrlevine commented 1 year ago

@rjsparks Funny you should mention that. One of the unfinished tasks in 7991bis (or whatever we call it) is to clean up the set of elements so that elements with similar semantics allow the same kinds of contents.

alicerusso commented 1 year ago

Here's an example of a document in the queue where it would be useful not to have the current restriction: see Section 4.1.1 of draft-ietf-tcpm-rfc8312bis.

Suppose we put the list into <dl>. Currently, xml2rfc allows non-ASCII characters inside <t> (without <u>). The same is not true for <dt>, and <dt> cannot contain <t>.
So, if you put the desired character (β) in <dt>, xml2rfc outputs &#946; without warning you that it has produced bad output. (For background, <contact> was used as a workaround in the original XML.)
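
To make the case above concrete, here is a minimal sketch (not part of xml2rfc) that scans an XML fragment for non-ASCII text sitting directly in an element other than <t> or <u>. The `ALLOWED` set, the `non_ascii_outside_allowed` helper, and the sample fragment are all made up for illustration. The check is deliberately simplified: it looks only at each element's own leading text and ignores tail text and ancestors.

```python
import xml.etree.ElementTree as ET

# Assumption for illustration only: elements treated as accepting
# non-ASCII text directly. This is not xml2rfc's actual rule set.
ALLOWED = {"t", "u"}

# Made-up fragment mirroring the <dl>/<dt> case described above.
SAMPLE = """<dl>
  <dt>β</dt>
  <dd><t>Some description of β</t></dd>
</dl>"""


def non_ascii_outside_allowed(xml_text, allowed=ALLOWED):
    """Return (tag, text) pairs for elements whose own leading text
    contains non-ASCII characters and whose tag is not in `allowed`."""
    root = ET.fromstring(xml_text)
    hits = []
    for elem in root.iter():
        text = elem.text or ""
        if elem.tag not in allowed and not text.isascii():
            hits.append((elem.tag, text.strip()))
    return hits


print(non_ascii_outside_allowed(SAMPLE))  # [('dt', 'β')]
```

Only the <dt> is reported: the same character inside <t> passes, which is the inconsistency this issue asks to remove.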

Considering ways forward:

cabo commented 1 year ago

There is no reason to confine the fix to just <t> elements. This initial step was just what we did to achieve some forward progress. Instead, the artificial restrictions invented by xml2rfc need to be removed altogether.

rjsparks commented 1 year ago

Who's taking the pen to push the grammar regularization through rswg?

jrlevine commented 1 year ago

@rjsparks I should, but it's not going to happen until at least late June.

cabo commented 1 year ago

> Who's taking the pen to push the grammar regularization through rswg?

I'm not sure why this question is under this issue, as removing the misguided character set restrictions does not require any changes in the XML grammar.