hassanakbar4 / tractive-test


XML2RFC accepts TAB (ASCII HT) characters in layout-sensitive input #276

Closed hassanakbar4 closed 3 years ago

hassanakbar4 commented 10 years ago

component_Version 2 cli resolution_fixed type_defect | by cabo@tzi.org


XML2RFC appears to accept TAB (ASCII HT) characters in layout-sensitive input such as artwork and then appears to apply arbitrary processing to them.

TAB characters MUST be rejected in this context as they do not have a defined meaning.

Context: [Editorial Errata Reported] RFC7386 (4132)


Issue migrated from trac:276 at 2021-10-20 18:17:37 +0500

hassanakbar4 commented 10 years ago

cabo@tzi.org commented


There we are ("arbitrary processing"):

        lines = [line.rstrip() for line in text.expandtabs(4).split('\n')]

Line 525 in trunk/cli/xml2rfc/writers/raw_txt.py

(N.B.: Changing the »4« to any other arbitrary number does not make the situation any better.)
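
To illustrate why no particular width helps, here is a minimal sketch that runs the same expression with three tab widths. The sample `artwork` string and the `render` helper are made up for illustration and are not part of xml2rfc; only the list comprehension mirrors the quoted line.

```python
# Hypothetical artwork: one line indented with a tab, one with four spaces.
artwork = '{\n\t"a": 1,\n    "b": 2\n}'


def render(text, tabsize):
    """Same expression as the quoted line, with the tab width as a parameter."""
    return [line.rstrip() for line in text.expandtabs(tabsize).split('\n')]


# The author of the artwork saw the two members aligned (or not) depending on
# the editor's tab width; each choice here reproduces only one such editor.
for tabsize in (2, 4, 8):
    print(f"tabsize={tabsize}")
    for line in render(artwork, tabsize):
        print("  |" + line)
```

The two indented lines line up only when the width happens to match whatever the author's editor used, which the processor cannot know.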

hassanakbar4 commented 10 years ago

tony@att.com commented


version 1 expanded tabs to 8 character boundaries:

        append new [string repeat " " [expr 8 - ($x % 8)]]

about line 14397 of xml2rfc.tcl.

If any change should be made, it should be to match the v1 behavior.
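
For readers who do not speak Tcl, a rough Python rendering of that padding rule might look like this. It assumes that `$x` in the Tcl fragment is the current output column; the function name `expand_tabs_v1_style` is invented here and does not exist in either version of xml2rfc.

```python
def expand_tabs_v1_style(line, tabstop=8):
    """Expand tabs to the next multiple of `tabstop`, one line at a time."""
    out = []
    col = 0  # current output column (assumed to correspond to $x in the Tcl code)
    for ch in line:
        if ch == '\t':
            # Same arithmetic as the Tcl fragment: 8 - (x % 8) spaces.
            pad = tabstop - (col % tabstop)
            out.append(' ' * pad)
            col += pad
        else:
            out.append(ch)
            col += 1
    return ''.join(out)


# For a single line this should agree with Python's built-in str.expandtabs(8).
sample = 'ab\tc\t\tdefghijk\tl'
print(expand_tabs_v1_style(sample) == sample.expandtabs(8))
```

In other words, matching v1 in v2 would amount to calling `expandtabs(8)` instead of `expandtabs(4)`.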

hassanakbar4 commented 10 years ago

julian.reschke@gmx.de commented


This is also a vocabulary issue.

We either need to document how TABs are expanded, or disallow them.

hassanakbar4 commented 10 years ago

cabo@tzi.org commented


Just documenting one of the three potentially sensible behaviors (tabstops at 8 like in UNIX, tabstops at 4 like in Windows, tabstops at 2 as a natural tab width for RFCs) is an accident continuously waiting to happen. This was just nicely demonstrated by RFC7386. The problem is compounded by the fact that many of our tools ignore whitespace differences, e.g., rfcdiff: https://tools.ietf.org/rfcdiff?url2=rfc7386 looks as if everything is alright!

Failing hard immediately is the only way to reliably avoid this kind of problem. So it's a MUST NOT on all elements with xml:space="preserve". (HTs can be tolerated in input to be formatted, or they can be completely outlawed.)

There also is a backwards compatibility issue: xml2rfc probably needs a special command line flag to accept old XML files that have tabs in them (apparently continuing to interpret them with tabstops at 4 for v2-compat or 8 for v1-compat).
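
A minimal sketch of what this could look like, using only the standard library. Everything named here (`check_tabs`, `TabInPreservedTextError`, the `compat_tabstop` parameter standing in for a command line flag) is hypothetical and not an existing xml2rfc interface. As a simplification, it only inspects the direct text of elements that carry the `xml:space` attribute themselves.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:space under the predefined XML namespace.
XML_SPACE = '{http://www.w3.org/XML/1998/namespace}space'


class TabInPreservedTextError(ValueError):
    """Hypothetical error type for this sketch; not part of xml2rfc."""


def check_tabs(root, compat_tabstop=None):
    """Reject tabs in xml:space="preserve" elements unless a compat width is given."""
    for elem in root.iter():
        if elem.get(XML_SPACE) != 'preserve':
            continue  # tabs in text that gets reformatted anyway are tolerated
        text = elem.text or ''
        if '\t' not in text:
            continue
        if compat_tabstop is None:
            raise TabInPreservedTextError(
                f'tab character in <{elem.tag}> marked xml:space="preserve"')
        # Compatibility mode: 4 would mimic v2, 8 would mimic v1.
        elem.text = text.expandtabs(compat_tabstop)


doc = ET.fromstring(
    '<rfc><artwork xml:space="preserve">a\tb</artwork><t>x\ty</t></rfc>')
try:
    check_tabs(doc)
except TabInPreservedTextError as err:
    print("rejected:", err)
```

Calling `check_tabs(doc, compat_tabstop=8)` instead would expand the tab in the artwork and leave the `<t>` element alone.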

hassanakbar4 commented 10 years ago

julian.reschke@gmx.de commented


(a) I agree that a hard failure would be the right thing.

(b) xml2rfcv1 at least warns:

xml2rfc: warning: pre-formatted output line 4 (on page 2) contained tab characters which were expanded around input line 32

hassanakbar4 commented 10 years ago

tony@att.com commented


This change in behavior in v2 is a bug.

I think the proper fix is to make v2 work like v1:

a) expand at boundaries of 8
b) print a WARNING message that expansion occurred

The only place that TABs are problematic is in an artwork/code block. They must not be disallowed elsewhere: too many XML editors output them automatically.

An alternative change is:

a) print an ERROR when TABs occur within artwork/code blocks.
b) accept TABs elsewhere

I would find this an acceptable change.

hassanakbar4 commented 10 years ago

julian.reschke@gmx.de commented


My preference would be to simply document that HTABs might or might not be expanded and thus ought to be avoided (that's what I'll put into the v2 spec).

HTABs elsewhere are harmless, as they fall into the general rules of whitespace behavior (for instance, CRs and additional SPs are treated like a single SP).
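
As a small illustration of that general rule (not xml2rfc's actual code), whitespace in flowing text can be collapsed like this; the function name is made up for the example.

```python
def normalize_flowing_text(text):
    """Collapse every run of whitespace (SP, HT, CR, LF) into a single space.

    str.split() with no argument splits on any whitespace run and also drops
    leading and trailing whitespace, which is fine for this illustration.
    """
    return ' '.join(text.split())


# A tab between words ends up indistinguishable from a single space.
print(normalize_flowing_text('TABs\tare   harmless\r\n  here'))
```

Because the tab never reaches the output as a column position, its width does not matter outside preformatted content.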

hassanakbar4 commented 9 years ago

henrik@levkowetz.com changed status from new to closed

hassanakbar4 commented 9 years ago

henrik@levkowetz.com changed resolution from `` to `fixed`

hassanakbar4 commented 9 years ago

henrik@levkowetz.com commented


Fixed in [1768]:

Issue a warning for input containing tab characters, and expand to 8, not 4 characters. Fixes issue #276.
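
The behavior described in that commit message could be sketched roughly as follows. This is not the code from changeset [1768]; the function name and the warning text are assumptions, and the standard `warnings` module stands in for whatever logging xml2rfc uses.

```python
import warnings


def expand_with_warning(text, tabstop=8):
    """Warn if the text contains tabs, then expand them to 8-column stops."""
    if '\t' in text:
        warnings.warn(
            "input contains tab characters; expanding to %d-column tab stops"
            % tabstop)
    # Same shape as the original line in raw_txt.py, with 8 instead of 4.
    return [line.rstrip() for line in text.expandtabs(tabstop).split('\n')]


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')
    lines = expand_with_warning('{\n\t"a": 1\n}')
    print(lines, len(caught))
```

This matches the v1 behavior discussed above (expand to 8, warn) rather than the hard failure that was also proposed.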