n-t-roff / heirloom-doctools

The Heirloom Documentation Tools: troff, nroff, and related utilities
http://n-t-roff.github.io/heirloom/doctools.html
Other
126 stars 23 forks source link

A user-defined list of sentence-ending characters, transparent characters, etc., can contain unwanted characters. #70

Closed reffort closed 6 years ago

reffort commented 6 years ago

Description: When a user-defined list of characters is defined and then redefined with a new list that contains fewer characters than the first one, extra characters from the previous list are still present.

Cause: When the list is defined, the end of the list is not marked, causing extra characters from the old list to be reused.

Affects: .sentchar, .transchar,. breakchar, .nhychar, .connectchar

To replicate:

.do xflag 3
.
.de test
.   lds reqname \\$1
.   shift
.
.       \" print the default list
.   tm \\*[reqname] Default : \\n[.\\*[reqname]]
.
.       \" ... a long user-defined list ...
.   \\*[reqname] \\$*
.   tm 1: \\n[.\\*[reqname]]
.
.       \" ... and a short list
.   \\*[reqname] AB
.   tm 2: \\n[.\\*[reqname]]
.   tm
..
.
.test sentchar 123456789012345678901234567890123456789012345
.test transchar 9 8 7 6 5 4 "3   2     1      0"
.test breakchar a b c d e f g h i j
.test nhychar jihgfedcba
.test connectchar etaoinshrdlu

Execute the script with: troff <testscript> > /dev/null

which results in :

sentchar Default : .?!:
1: 1234567890123456789012345678901234567890
2: AB34567890123456789012345678901234567890

transchar Default : �"')]*
1: 9876543210
2: AB76543210

breakchar Default : �-
1: abcdefghij
2: ABcdefghij

nhychar Default : �-
1: jihgfedcba
2: ABhgfedcba

connectchar Default : "\(ru\(ul\(rn
1: etaoinshrdlu
2: ABaoinshrdlu

(The double quotation mark is automatically inserted into the default list of connectchars in n4.c when the read-only number register's default list is output, but not when a user-defined list is output.)

The proposed fix is to modify function propchar() in n5.c so it marks the end of a user-defined list with a zero:

n5.c:2275:
static void
propchar(size_t *tp)
{
    size_t  c, *tpp;
    tchar   i;

    if (skip(0)) {
        *tp = IMP;
        return;
    }
    tpp = tp;
    do {
        while (!ismot(c = cbits(i = getch())) &&
                c != ' ' && c != '\n')
-           if (tpp < &tp[NSENT])
+           if (tpp < &tp[NSENT-1])
                *tpp++ = c;
    } while (!skip(0));
+   *tpp = 0 ;
}

After changing those two lines, the test script's output should be:

sentchar Default : .?!:
1: 123456789012345678901234567890123456789
2: AB
. . .

etc.

The maximum number of characters for these requests will now be 39 instead of 40. In the read-only register code in n4.c, some local variables are defined to be of length NSENT+1; the "+1" will not be necessary after the patch, but it won't really hurt anything, either. (Lines 383, 391, 408, 425, 444).

While on the subject of propchar():

  1. At some point, the type of propchar's argument and the local variables c and *tpp, as well related global variables and local variables for other functions, have been changed from int to size_t. I'd like to revisit that decision. It looks to me that if the type needed to be changed, it should have become a tchar, not a size_t. (Note the mangled usage in the propchar() code above. Note also that tchar is signed and size_t is unsigned, although it works here for the same reason an int does; a size_t will probably also fail for the same reason an int will.)

  2. Undocumented behavior: If the user has defined a custom character list, the original (default) list can be restored by passing one of the motion characters \d, \r, \u, \|, or \^ (e.g., .sentchar \r), but not \v or \h. This is not at all intuitive, so it would probably be worth while to document it in the troff user's manual.

n-t-roff commented 6 years ago

Thank you for fixing this! I'll update the documentation according your proposal soon.

reffort commented 6 years ago

Something I didn't recognize is that the use of all NSENT (40) characters can be retained by not writing a zero to the very last slot. In every existing use of these lists, the number of characters scanned is limited by NSENT, so there should be no risk of overrunning. The purpose of the zero would be to flag the end of a list that is shorter than NSENT.

This version would revert the NSENT-1 line in propchar() and add a test, resulting in:

    do {
        while (!ismot(c = cbits(i = getch())) &&
                c != ' ' && c != '\n')
            if (tpp < &tp[NSENT])
                *tpp++ = c;
    } while (!skip(0));
+   if (tpp < &tp[NSENT])
        *tpp = 0 ;
}
reffort commented 6 years ago

Thanks!