Closed hassanakbar4 closed 3 years ago
@{"email"=>"cabo@tzi.org", "name"=>nil, "username"=>nil} commented
There we are ("arbitrary processing"):
lines = [line.rstrip() for line in text.expandtabs(4).split('\n')]
Line 525 in trunk/cli/xml2rfc/writers/raw_txt.py
(N.B.: Changing the »4« to any other arbitrary number does not make the situation any better.)
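To illustrate why no single tab width is safe here, a minimal sketch follows. The sample text and the helper name `expand` are made up for illustration; they are not taken from RFC 7386 or from xml2rfc. One row uses a TAB to reach the second column, the other uses spaces, so the columns only line up if the reader and writer agree on the tab stops.

```python
# Hypothetical sample: row 1 uses a TAB, row 2 uses spaces to reach column 8.
SAMPLE = "a\tb\nab      c"


def expand(text, tabsize):
    """Mirror the quoted raw_txt.py line, with the tab size made a parameter."""
    return [line.rstrip() for line in text.expandtabs(tabsize).split("\n")]


for tabsize in (2, 4, 8):
    first, second = expand(SAMPLE, tabsize)
    # Column where the second field starts in each row.
    print(tabsize, first.index("b"), second.index("c"))
```

With this sample the two fields only share a column when the tab size is 8; an author who assumed 4 or 2 would see a different result than the one they typed.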
@{"email"=>"tony@att.com", "name"=>nil, "username"=>nil} commented
Version 1 expanded tabs to 8-character boundaries:
{{ append new [string repeat " " [expr 8 - ($x % 8)]] }}
about line 14397 of xml2rfc.tcl.
If any change should be made, it should be to match the v1 behavior.
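For readers who do not speak Tcl, the v1 expression can be sketched in Python as follows. The function name `expand_tabs_v1` is hypothetical, and this is a reading of the one quoted line rather than a port of the surrounding xml2rfc.tcl code.

```python
def expand_tabs_v1(line, tabsize=8):
    """Expand TABs in a single line to the next multiple of `tabsize`.

    Sketch of the quoted Tcl expression `8 - ($x % 8)`, where x is the
    current output column. Assumes `line` contains no newline characters.
    """
    out = []
    x = 0  # current output column
    for ch in line:
        if ch == "\t":
            pad = tabsize - (x % tabsize)  # spaces needed to reach next stop
            out.append(" " * pad)
            x += pad
        else:
            out.append(ch)
            x += 1
    return "".join(out)
```

For a single line this should give the same result as Python's built-in `line.expandtabs(8)`, which is the one-word change to the v2 code quoted earlier.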
@{"email"=>"julian.reschke@gmx.de", "name"=>nil, "username"=>nil} commented
This is also a vocabulary issue.
We either need to document how TABs are expanded, or disallow them.
@{"email"=>"cabo@tzi.org", "name"=>nil, "username"=>nil} commented
Just documenting one of the three potentially sensible behaviors (tabstops at 8 like in UNIX, tabstops at 4 like in Windows, tabstops at 2 as a natural tab width for RFCs) is an accident continuously waiting to happen. This was just nicely demonstrated by RFC7386. The problem is compounded by the fact that many of our tools ignore whitespace differences, e.g., rfcdiff: https://tools.ietf.org/rfcdiff?url2=rfc7386 looks as if everything is alright!
Failing hard immediately is the only way to reliably avoid this kind of problem. So it's a MUST NOT on all elements with xml:space="preserve". (HTs can be tolerated in input to be formatted, or they can be completely outlawed.)
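A minimal sketch of what "failing hard" could look like for the text of a whitespace-preserving element. The exception class and function name are hypothetical and not part of xml2rfc; the point is only that the error names the offending line and column instead of silently picking a tab width.

```python
class TabInPreformattedText(ValueError):
    """Hypothetical error type for a TAB found in layout-sensitive text."""


def reject_tabs(text):
    """Return `text` unchanged, or raise if any line contains a TAB."""
    for lineno, line in enumerate(text.split("\n"), start=1):
        col = line.find("\t")
        if col != -1:
            raise TabInPreformattedText(
                f"TAB character at line {lineno}, column {col + 1}"
            )
    return text
```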
There also is a backwards compatibility issue: xml2rfc probably needs a special command line flag to accept old XML files that have tabs in them (apparently continuing to interpret them with tabstops at 4 for v2-compat or 8 for v1-compat).
@{"email"=>"julian.reschke@gmx.de", "name"=>nil, "username"=>nil} commented
(a) I agree that a hard failure would be the right thing.
(b) xml2rfcv1 at least warns:
xml2rfc: warning: pre-formatted output line 4 (on page 2) contained tab characters which were expanded around input line 32
@{"email"=>"tony@att.com", "name"=>nil, "username"=>nil} commented
This change in behavior in v2 is a bug.
I think the proper fix is to make v2 work like v1:
1) expand at boundaries of 8
2) print a WARNING message that expansion occurred
The only place that TABs are problematic is in an artwork/code block. They must not be disallowed elsewhere: too many XML editors output them automatically.
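The v1-style behavior described above (expand at 8, and say so) could be sketched like this. The function name and the warning wording are assumptions for illustration; the real v1 message is different, and the use of Python's `warnings` module is a choice made for this sketch, not a claim about how xml2rfc reports problems.

```python
import warnings


def expand_with_warning(text, tabsize=8):
    """Expand TABs to `tabsize`-column stops and warn if any were present."""
    lines = text.split("\n")
    # 1-based input line numbers that contain at least one TAB.
    tabbed = [n for n, line in enumerate(lines, start=1) if "\t" in line]
    if tabbed:
        warnings.warn(
            f"tab characters expanded to {tabsize}-column stops "
            f"on input line(s) {tabbed}"
        )
    return [line.expandtabs(tabsize).rstrip() for line in lines]
```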
An alternative change is:
1) print an ERROR when TABs occur within artwork/code blocks.
2) accept TABs elsewhere
I would find this an acceptable change.
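A rough sketch of that alternative, using the standard library's ElementTree rather than whatever parser xml2rfc uses internally. The set of element names treated as layout-sensitive is an assumption (only `artwork` here), and the function name is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumption: elements whose text is layout-sensitive. Adjust as needed.
PREFORMATTED = {"artwork"}


def find_tabs_in_preformatted(xml_source):
    """Return the tags of layout-sensitive elements whose text contains a TAB.

    TABs anywhere else in the document are ignored, so XML editors that
    indent with TABs do not trigger an error.
    """
    root = ET.fromstring(xml_source)
    offenders = []
    for elem in root.iter():
        if elem.tag in PREFORMATTED and "\t" in "".join(elem.itertext()):
            offenders.append(elem.tag)
    return offenders
```

A caller would report an ERROR when the returned list is non-empty and proceed normally otherwise.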
@{"email"=>"julian.reschke@gmx.de", "name"=>nil, "username"=>nil} commented
My preference would be to simply document that HTABs might or might not be expanded and thus ought to be avoided (that's what I'll put into the v2 spec).
HTABs elsewhere are harmless, as they fall into the general rules of whitespace behavior (for instance, CRs and additional SPs are treated like a single SP).
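The general whitespace rule mentioned here can be approximated in one line of Python. This is only an illustration of the idea, not xml2rfc's actual normalization: `str.split()` with no arguments also strips leading and trailing whitespace and treats some non-ASCII whitespace as separators, which a real implementation might handle differently.

```python
def collapse_whitespace(text):
    """Treat any run of TAB, CR, LF, or SP as a single SP (approximation)."""
    return " ".join(text.split())
```

Under this rule a TAB in running text is indistinguishable from a space, which is why it only matters where whitespace is preserved.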
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} changed status from `new` to `closed`
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} changed resolution from `` to `fixed`
@{"email"=>"henrik@levkowetz.com", "name"=>nil, "username"=>nil} commented
Fixed in [1768]:
Issue a warning for input containing tab characters, and expand to 8, not 4 characters. Fixes issue #276.
component_Version 2 cli
resolution_fixed
type_defect
| by cabo@tzi.org
XML2RFC appears to accept TAB (ASCII HT) characters in layout-sensitive input such as and then appears to apply arbitrary processing to them.
TAB characters MUST be rejected in this context as they do not have a defined meaning.
Context: [Editorial Errata Reported] RFC7386 (4132)
Issue migrated from trac:276 at 2021-10-20 18:17:37 +0500