n-t-roff / heirloom-doctools

The Heirloom Documentation Tools: troff, nroff, and related utilities
http://n-t-roff.github.io/heirloom/doctools.html

Boxed within floating keep sends troff in endless loop (no output) #79

Open ljrk0 opened 5 years ago

ljrk0 commented 5 years ago

Minimum example (with -ms -Tpost):

.KF
.B1
FOO
.B2
.KE

Interestingly, reversing the order does not trigger the problem (though it probably produces different output):

.B1
.KF
FOO
.KE
.B2

This does not occur with the default troff of Illumos 2018.10 or Oracle Solaris 10 1/13, but it does occur on the latest master (tested on Arch).

reffort commented 5 years ago

I can confirm what you're seeing. It appears to be due to double backslashes \\ being converted into \e when in a diversion, which prevents the use of registers in a diversion (it also prevents the use of output line traps).

The traditional behavior was changed in commit 3f47f6... "Preserve \\ through diversions (like groff, unlike traditional roff)" dated 25 Jun 2015. Despite the description, the effect of this change is to not preserve \\ in a diversion--it converts \\ to \e--also, it does not quite work like groff. If this commit is reverted, your example works as expected.

ljrk0 commented 5 years ago

@reffort Thanks for digging into this, I had no idea where to start. This definitely seems to be the place to fix it. I have not looked into the source much, but something like this might not be too far off (typed on my phone, so beware):

case ESC:   /* double backslash */
    if (dilev)
        i = ESC;
    else if (prdblesc)
        i = PRESC;
    else
        i = eschar;
    goto gx;

I'm not at all sure whether we should simply fall through instead, though, as I don't know what gx does.

reffort commented 5 years ago

What I use is the original troff code:

case ESC:   /* double backslash */
    i = eschar;
    goto gx;

but with a switch so I can use the other behavior if need be:

case ESC:   /* double backslash */
    if ((prdblesc || dilev) && !escesc)
        i = PRESC;
    else
        i = eschar;
    goto gx;

The variable escesc is controlled by a request .ee (escape means escape).
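To make the switchable variant easier to reason about, here is a minimal standalone sketch of the same condition as a pure function. The names prdblesc, dilev, escesc, PRESC and eschar come from the snippet above; the enum, the function name, and the treatment of dilev as a nesting depth that is nonzero inside a diversion are assumptions made for illustration, not the actual troff source.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the two outcomes of the `case ESC:` branch.
 * The real PRESC and eschar values are defined in the troff sources. */
enum esc_result {
    RESULT_ESCHAR, /* i = eschar: the original troff behavior     */
    RESULT_PRESC   /* i = PRESC: a printable escape, as with \e   */
};

/* Models the switchable variant: when escesc is set (via the .ee
 * request), \\ always yields eschar, regardless of prdblesc or dilev.
 * dilev is assumed to be nonzero while inside a diversion. */
enum esc_result resolve_double_backslash(bool prdblesc, int dilev, bool escesc)
{
    if ((prdblesc || dilev > 0) && !escesc)
        return RESULT_PRESC;
    return RESULT_ESCHAR;
}
```

Under these assumptions, a call with dilev set to 1 and escesc false returns RESULT_PRESC, which corresponds to the behavior that triggers the problem, while setting escesc to true returns RESULT_ESCHAR at any level.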

The document will then need to make a distinction between an escape sequence and a printable escape character \e, but that's the way troff and nroff have always been. You can get away with using \\ to get a printable backslash in the top level only.

I think the behavior was probably changed to accommodate man pages that use the non-portable groff-specific convention, but I really don't know for sure.

ljrk0 commented 5 years ago

Hm, but isn't the current behavior wrong anyway? It is neither the 'old' behavior nor the groff behavior, as you wrote. On groff the code works as well as on 'old' troff.

n-t-roff commented 5 years ago

Reverting 3f47f6fc004822ac3edb1c34e7b1ae9b462bf8dd, @LeonardKoenig's suggestion, and the original troff code all cause wrong output for many manpages. How about something like

case ESC:
    if (prdblesc || (dilev && escesc))
        i = PRESC;
    else
        i = eschar;
    goto gx;

with escesc == 0 by default? That would be equal to heirloom's traditional behavior and I could set .ee in the manpage macros.
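The two polarities of escesc discussed so far are easy to confuse, so here is a small sketch that puts both conditions side by side as pure predicates. The function names and the boolean in_diversion parameter (standing in for a nonzero dilev) are hypothetical; only the conditions themselves are taken from the snippets in this thread.

```c
#include <stdbool.h>

/* Both predicates return true when \\ would become PRESC (printable)
 * and false when it would become eschar. */

/* Switchable variant: escesc turns the PRESC conversion OFF everywhere. */
bool presc_switch_off(bool prdblesc, bool in_diversion, bool escesc)
{
    return (prdblesc || in_diversion) && !escesc;
}

/* Reversed-meaning proposal: escesc turns the PRESC conversion ON,
 * but only for diversions; prdblesc is not affected by the flag. */
bool presc_switch_on(bool prdblesc, bool in_diversion, bool escesc)
{
    return prdblesc || (in_diversion && escesc);
}
```

With escesc left at 0 inside a diversion, the first predicate yields PRESC and the second yields eschar. The two also differ when prdblesc is set: the first lets escesc override it, the second does not.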

n-t-roff commented 5 years ago

Using the following would fix the manpage issue while keeping compatibility:

case ESC:
    if (prdblesc || (dilev && gemu))
        i = PRESC;
    else
        i = eschar;
    goto gx;

ljrk0 commented 5 years ago

Hm, but as far as I understand, heirloom's traditional behavior is not groff's behavior, which works for both man pages and this example code. Instead of supporting heirloom+old behavior, why not implement groff+old behavior -- or am I missing something?

reffort commented 5 years ago

When I looked into this a while back, I got as far as realizing that groff handles escapes a different way than troff and I would have to dig into the groff code to find out how it worked, so I just took the easy way out with .ee. If the reverse meaning is adopted, perhaps it could be given a different name, because it would then mean "escape isn't necessarily escape" (maybe .eg or something).

The double backslashes do work in groff with macros in nested diversions, and groff even correctly handles the on-the-fly macro code for output line traps defined several diversion levels deep (groff does not have output line traps, of course, but the macros work).

Although groff's behavior is non-standard, I think it would be great if the double backslash could be made to work in diversions at multiple levels and with macros, because it would solve a consistency problem with C and sed about what \\ does. The flip side is that it makes document coding ambiguous to the point that writers tend to add extra backslashes everywhere one is used; this can be seen in many groff documents and man pages, and some of them are really sloppy. With the traditional behavior, the meaning is unambiguous except when the writer exploits the quirk in the top level. Gunnar fortunately deleted the sentence condoning this usage that has been in the User's Manual since at least the late 1970s.

As for gemu, if that is what takes effect with the .cp request, the implication would be that it should work in groff mode, but it doesn't appear to solve the problem in Leonard's example, which is something that would probably occur in groff documents.

What is the effect of prdblesc? The description indicates it enables the use of \\ to mean \e in fields, but it doesn't elaborate on that (the line was changed earlier than dilev). If it was done for the same reason as dilev, it seems to me it would be more consistent to have either the traditional troff behavior or the modified behavior in effect for both prdblesc and dilev. That's the reason I switched both of them with escesc.

n-t-roff commented 5 years ago

Because of the endless loop issue it seems to be better to have the traditional behavior as the default by replacing .ee with something like .eg (combine prdblesc and dilev with something that is 0 by default).
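One possible reading of this suggestion, sketched as a standalone predicate: both prdblesc and dilev are gated by a single flag that defaults to 0, so the traditional behavior applies unless a document opts in. The struct, the eg field, and the function name are hypothetical; .eg is only a proposed request name, and nothing here is taken from the actual source.

```c
#include <stdbool.h>

/* Hypothetical bundle of the state consulted by the `case ESC:` branch. */
struct esc_flags {
    bool prdblesc; /* field-related flag named in the thread           */
    int  dilev;    /* assumed: diversion depth, 0 at the top level     */
    bool eg;       /* assumed: set by the proposed .eg request, else 0 */
};

/* Returns true when \\ would become PRESC, false when it would become
 * eschar. With eg == 0 the result is always false, i.e. traditional. */
bool double_backslash_is_printable(const struct esc_flags *f)
{
    return (f->prdblesc || f->dilev > 0) && f->eg;
}
```

In this reading the manpage macros would set the flag, and documents like the boxed-keep example would run with the default and never see PRESC inside a diversion.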

reffort commented 5 years ago

With .eg, would prdblesc still be necessary?

n-t-roff commented 5 years ago

Without analyzing it--it is necessary for manpages, unfortunately. I agree that the actual issue is sloppily written manpages (and manpage generators like pod2man), but I can't change them and they work with groff.

reffort commented 5 years ago

I understand where you're coming from about the manpage problems.

I realized right after I clicked the "Send" button that "prdblesc" (for fields) needs to work both at the top level and in a diversion, so it would be necessary in that case.
