ietf-tools / xml2rfc

Generate RFCs and IETF drafts from document source in XML according to the IETF xml2rfc v2 and v3 vocabularies
https://ietf-tools.github.io/xml2rfc/
BSD 3-Clause "New" or "Revised" License

XML2RFC handling of hyphens adds a space #929

Open AdrianFarrel opened 1 year ago

AdrianFarrel commented 1 year ago

Describe the issue

If my source XML has a hyphen and I choose to break the line at that point, the final text adds a single white space as it would between words.

E.g.

This is a test. Break a hyphenated-
word across a line boundary.

Generates...

This is a test. Break a hyphenated- word across a line boundary.

I noticed this with draft-ietf-teas-rfc3272bis-21 in the datatracker repository.

I believe that the old xml2rfc had a fix for this. Obviously, you do want a space added in normal cases, but not when the last character on a line is a hyphen.

cabo commented 1 year ago

Doing this to non-SHY hyphens is a totally new concept to me.

See RFC 9006, right before Section 5, for a place where this would have helped (the .TXT happens to have the line break in the right place so this wasn't noticed, the .PDF doesn't). Similarly, page 6 (PDF) of RFC 9300, page 3 of RFC 9276, page 20 of RFC 9244 -- I stopped looking after these four.

(Note that minus-hyphens are used in dash emulations such as -- and ---, and even single hyphens as dashes in e.g., RFC 9260, so the actual rule would probably need to be more complicated.)

Artwork of course wouldn't do this.

To me this seems more like a linter item -- something that a tool should have caught before the RFCs were published.

cabo commented 1 year ago

RFC 9231, RFC 9315, RFC 9291, RFC 9286, RFC 9235, RFC 9204, RFC 9203, RFC 9194, RFC 9171, ... and probably three times as many down to RFC 8650.

I didn't know that this is such a widespread error.

jrlevine commented 1 year ago

In RFC9315 and RFC9194 the space is correct. I don't see how the tool is supposed to guess what the author meant. Seems like a thing for linting, not doing mechanically.

I admit I don't have a good suggestion for how to fix the ones that are wrong, other than to hand-fix them if and when we redo the XML to fix <postal> and so forth.

AdrianFarrel commented 1 year ago

There is a rare case where a hyphenated option is in use. For example, “His face was color- red or blue.” This would be confusing if the line break came after the hyphen. But that formation is considered bad style because of confusion when there are more than two elements in the list (are they all supposed to be hyphenated?) so correct style is to write, “His face was color-red or color-blue”. Of course, German likes to do this, and so it finds its way into American, but Fowler would frown.

If they wanted a space (usually an emdash) they would have “word - word”.

So (IMHO) space-hyphen-EOL always maps to space-hyphen-space, but word-hyphen-EOL maps to word-hyphen-continuationWord.
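
For illustration only, the mapping proposed here could be sketched in Python roughly as follows. This is not xml2rfc's actual code: the function name `join_source_lines` and the regular expression are assumptions made for the example, and the sketch only looks at plain text lines, not XML.

```python
import re

# A word character followed by exactly one trailing hyphen, e.g. "hyphenated-".
# " -" and "--" at end of line do not match, so dash emulations keep their space.
# (Hypothetical pattern for this sketch; not taken from xml2rfc.)
_WORD_HYPHEN_EOL = re.compile(r"\w-$")


def join_source_lines(lines):
    """Join wrapped source lines using the rule proposed in this comment.

    word-hyphen-EOL  -> joined to the next line with no space
    anything else    -> joined with a single space
    """
    out = ""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if not out:
            out = line
        elif _WORD_HYPHEN_EOL.search(out):
            out += line  # previous line ended in word-hyphen: no space
        else:
            out += " " + line  # normal case, including " -" or "--" at EOL
    return out


print(join_source_lines(["Break a hyphenated-", "word across a line boundary."]))
print(join_source_lines(["word -", "word"]))
```

Under this rule the first call joins `hyphenated-` and `word` without a space, and the second keeps `word - word`. A suspended hyphen that happens to fall at a line end, such as `color-` followed by `red or blue`, would also be joined without a space, so the rule is only as good as the style assumption behind it.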

You might consult the RFC Editor to check house style.

A

jrlevine commented 1 year ago

In 9315 the text is:

However, the zero- or one-touch approach

with the line break before "or" so the space is correct. As I said, this is not something software can reliably guess.
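
A linter along the lines discussed in this thread would report such line breaks rather than rewrite them. The following is a minimal sketch under stated assumptions: `lint_hyphen_breaks`, the conjunction list, and the hint wording are all hypothetical and not part of xml2rfc or any existing tool.

```python
import re

# A word immediately followed by a single hyphen at the end of the line.
# "--" and " -" are not matched. (Hypothetical pattern for this sketch.)
_WORD_HYPHEN_EOL = re.compile(r"(\w+)-\s*$")

# Assumption: a following conjunction suggests a suspended hyphen,
# as in "zero- or one-touch", where the space is intended.
_CONJUNCTIONS = {"or", "and"}


def lint_hyphen_breaks(text):
    """Return (line_number, fragment, hint) for each line ending in word-hyphen.

    Nothing is rewritten; the author decides whether the space is wanted.
    """
    lines = text.splitlines()
    findings = []
    for i, line in enumerate(lines[:-1]):
        match = _WORD_HYPHEN_EOL.search(line)
        if not match:
            continue
        next_tokens = lines[i + 1].split()
        next_word = next_tokens[0] if next_tokens else ""
        if next_word.lower().strip(",.;:") in _CONJUNCTIONS:
            hint = "likely suspended hyphen; space is probably intended"
        else:
            hint = "check: space after hyphen may be unintended"
        findings.append((i + 1, f"{match.group(1)}- {next_word}", hint))
    return findings


for finding in lint_hyphen_breaks("However, the zero-\nor one-touch approach"):
    print(finding)
```

The heuristic only changes the wording of the hint. Both kinds of break are still flagged for a human to review, since the software cannot reliably tell which one the author meant.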