n-t-roff / heirloom-doctools

The Heirloom Documentation Tools: troff, nroff, and related utilities
http://n-t-roff.github.io/heirloom/doctools.html

Suggestion: adding external scripting support to troff #82

Open mircea3 opened 5 years ago

mircea3 commented 5 years ago

It might be useful to solve certain problems using external interpreted languages. My personal preference is Python, but every user is free to choose their own. I put together and tested one possible solution and attached the source code. Any request names, escape sequences, variable names, etc. should probably be modified by an expert to best fit the Heirloom troff project style.

The solution is similar to the .pso request (execute a subprocess and insert its output as text), but with the following advantages:

1) The subprocess is always alive and can be called efficiently thousands of times in the same document.
2) The new escape sequence can be used in macros and in strings.
3) The subprocess can maintain state, if needed, to solve certain tasks.
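
The first and third advantages can be illustrated from the calling side. The following is only a rough sketch in Python, not the code from the attached patch: the `Evaluator` class and the inline `SERVER` program are hypothetical stand-ins, and the one-line-per-request protocol is an assumption. It starts one child process, sends it two expressions over a pipe, and shows that the child remembers how often it was called.

```python
import subprocess
import sys

# Hypothetical stand-in for the evaluation script (not the attached trsrv.py):
# it evaluates one Python expression per input line and answers on one line.
SERVER = """
import sys
env = {"calls": 0}
while True:
    line = sys.stdin.readline()
    if not line:          # EOF: the parent closed the pipe
        break
    env["calls"] += 1
    print(eval(line.strip(), env), flush=True)
"""


class Evaluator:
    """Keeps one child process alive and sends it expressions over a pipe."""

    def __init__(self):
        self.proc = subprocess.Popen(
            [sys.executable, "-c", SERVER],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            text=True,
        )

    def ask(self, expr):
        # One request line out, one answer line back (assumed protocol).
        self.proc.stdin.write(expr + "\n")
        self.proc.stdin.flush()
        return self.proc.stdout.readline().rstrip("\n")

    def close(self):
        # Closing stdin makes the child see EOF and exit.
        self.proc.stdin.close()
        self.proc.wait()
        self.proc.stdout.close()


ev = Evaluator()
answers = [ev.ask("3+5"), ev.ask("calls")]
ev.close()
print(answers)
```

The process is created once in `__init__`; every later `ask` costs only a pipe round trip, which is the difference from launching a new process per call.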

Usage

Start the scripting subprocess, only once. You may choose any command you prefer. Below is an example using Python to execute a short program: a loop which reads commands/expressions from standard input, executes/evaluates them and writes the result to standard output.

.scripting python trsrv.py
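
For readers without the attachment, a loop of the kind described above might look roughly like this. It is a minimal sketch and not the attached trsrv.py: the function names `evaluate` and `serve`, the one-request-per-line protocol, and the choice to answer statements with an empty line are all assumptions.

```python
import io
import sys


def evaluate(source, env):
    """Return the value of an expression as text; statements return ''."""
    try:
        code = compile(source, "<troff>", "eval")
    except SyntaxError:
        # Not an expression (assignment, import, ...): execute it instead.
        exec(compile(source, "<troff>", "exec"), env)
        return ""
    result = eval(code, env)
    return "" if result is None else str(result)


def serve(infile=sys.stdin, outfile=sys.stdout):
    """Answer each input line with exactly one output line."""
    env = {}  # shared namespace, so state survives between requests
    while True:
        line = infile.readline()
        if not line:  # EOF
            break
        outfile.write(evaluate(line.rstrip("\n"), env) + "\n")
        outfile.flush()


# Small in-memory demonstration instead of real pipes.
demo_out = io.StringIO()
serve(io.StringIO("y = 1000*3 + 10\ny+1\ny+2\n"), demo_out)
print(demo_out.getvalue())
```

Run as the subprocess, the script would simply call `serve()` with its defaults. Errors in the evaluated text are not caught here; a real script would need to decide what to send back in that case.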

\K'str': evaluates str in the scripting subprocess. It can be used inside a macro or a string:

.nr x 0 1
.de m
.nr x \K'0 if \\nx==3 else \\nx'
\\n+x
..
.ds s \R'x \K'0 if \\nx==3 else \\nx''\\n+x

.m
.m
.m
.m
.m
.m
--
\*s \*s \*s \*s \*s \*s

Result: 1 2 3 1 2 3 -- 1 2 3 1 2 3

Other examples

The features of the chosen scripting language are exposed to the document author. Some examples, using Python:

Example 1: store state inside the script subprocess to avoid re-typing the same expression:

.ds str \K'y = 1000*\\$1 + 10'\K'y+1', \K'y+2', \K'y+3'
\*[str 3]

Result: 3011, 3012, 3013

Example 2: Unicode text processing: drop the first and last 4 characters, then change to upper case:

.ds s The äußerst quick brown fox jumps over the lazy dog
\K'u"\*s"[4:-4].upper()'

Result: ÄUßERST QUICK BROWN FOX JUMPS OVER THE LAZY
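
The slicing can be checked in plain Python. One caveat, stated as an assumption: the u"..." prefix and the unchanged ß in the result above look like Python 2 behaviour. Under Python 3, `str.upper()` maps ß to SS, so the same expression gives a slightly different string. The sketch below shows both the Python 3 result and one possible way to keep ß as it is.

```python
s = "The äußerst quick brown fox jumps over the lazy dog"

# Drop "The " (4 characters) and " dog" (4 characters).
trimmed = s[4:-4]

# Python 3 applies the full Unicode case mapping: ß becomes SS.
upper_py3 = trimmed.upper()

# Keep any character whose upper-case form is longer than one character.
upper_keep = "".join(c.upper() if len(c.upper()) == 1 else c for c in trimmed)

print(upper_py3)
print(upper_keep)
```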

Example 3: three random integers in range [1, 10]:

\K'import random'
\K'", ".join(str(random.randint(1, 10)) for _ in range(3))'

Result (varies): 9, 4, 10

Like anything else, this can be abused. The intended purpose of the scripting facility is not to replace the troff language, but only to address a few tight spots that are overly complicated to express in troff.

htroff_add_scripting.patch.txt

trsrv.py.txt

aksr commented 5 years ago

reffort, n-t-roff: What do you think about this addition?

Alhadis commented 5 years ago

Theoretically, this should already be possible using .sy and diversions (assuming troff is being run with the "unsafe" switch set).

mircea3 commented 5 years ago

Just a few up-to-date comments regarding real-world use of this feature:

1) .sy launches a new subprocess every time. This is expensive if it needs to be called thousands of times for a large document. Also, all state between such calls would be lost. (Please correct me if I'm wrong.)

2) In practice, it turned out to be much more useful for \K to be interpolated right away, even in copy mode (similar to \n, \*, etc.), and to be escaped (\\K) when that is not desired. For this reason, there is a tiny change with respect to the initial patch. Otherwise it's the same.

3) It was a bit tricky, at first, to deal with all the escape sequences, but as time passed, acceptable solutions were found.

(Also, the external python script has been improved in the meantime.)

Alhadis commented 5 years ago

Fair enough (I find myself missing Perl's regular expressions often enough that I'd find this useful too).

Wouldn't .execute and \E have been more appropriate names for this feature, though?

mircea3 commented 5 years ago

Yes, those names sound good to me. It's nice that they are matched together. Would be nice for .execute to somehow convey that the subprocess keeps running, but I can't think of a better name right now.

Alhadis commented 5 years ago

Would be nice for .execute to somehow convey that the subprocess keeps running

"Scripting" doesn't exactly convey that either...

mircea3 commented 5 years ago

.spawn or .subprocess might describe it well.

However (just checked), it seems \E is being used for "current escape character". It's not too easy to find an unused letter for the escape sequence.

How about \> to represent a typical interactive prompt? For example:

\>'3+5' would evaluate to the string "8"

Alhadis commented 5 years ago

.spawn works. 👍

it seems \E is being used for "current escape character".

No, that would be \e. 😉 (case-sensitive)

Alhadis commented 5 years ago

Also, how does one terminate the spawned process if they wish to start another? By running .spawn without arguments? That'd be consistent with the usual Troff convention of restoring state by reinvoking the same macro with no parameters; e.g, .in, .cc.

mircea3 commented 5 years ago

It seems both versions of \e are used (?):

\e Printable version of the current escape character
\E Escape character, not interpreted in copy mode

Yes, terminating by invoking with no arguments sounds good.

Alhadis commented 5 years ago

I'm guessing one must be an extension then. I was reading off CSTR54. 🤔 Yeah, just stick with \> then. 😂

I recommend calling it .spawn, it's much more descriptive and less likely to encounter conflict in-the-wild. 👍

mircea3 commented 5 years ago

Following the discussion above, attached is a new patch with the following differences:

An updated, simplified example, using the same python script as before:

Spawn python: .spawn python trsrv.py

Store a number and a string: \>"x=5; s='foo'"

Alternatively, you can write it on a separate line; a leading dot is used to prevent an extra blank line: .\>"x=5; s='foo'"

Evaluate an expression: \>"x+10" \>"s+'bar'"

Results in: 15 foobar

Terminate subprocess (and then you're free to start another): .spawn

Although it works, please note that I don't fully understand all of the code; certain parts are copied from similar examples. Please feel free to modify and fix anything which looks wrong, or is just poor style. Thanks!

htroff_add_spawn.patch.txt

reffort commented 5 years ago

@aksr, I've been sitting this out because I'm unclear as to the actual problem or benefit, why the existing macro structure won't do the job, and why there is a requirement to use an inline escape instead of an ordinary macro call.


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Sunday, February 3, 2019 5:26 AM, aksr notifications@github.com wrote:

reffort, n-t-roff: What do you think about this addition?


mircea3 commented 5 years ago

Sorry, I should have linked to the original mailing list thread which started all this: http://lists.gnu.org/archive/html/groff/2019-01/msg00000.html

A pure-troff solution was posted by Tadziu, with known limitations. Unfortunately, I did hit those limitations...

reffort commented 5 years ago

Unfortunately, those messages don't help any. The initial problem was solved, then it changed into a second problem that is not described at all.

The inference is that the limitation, whichever one it is, can be found in the second topic (the one about reformatting a date string), but that discussion looks like a joke thread to me. That problem can be solved in two lines, so I think that most of those messages are probably a contest to see who can derive the most patently ridiculous solution (I hope that's the case, anyway). It would not be surprising to find a few "limitations" there.

At any rate, I have no opinion because I don't have enough information to form one.

mircea3 commented 5 years ago

I will try to clarify by summarizing the thread. The specific problem is: how to call a macro from a string, in order to evaluate more complex expressions.

And the suggested solution was this:

.de m
\c
.ie \\nx%2 .nr x \\nx*3+1
.el        .nr x \\nx/2
..
.ds s \\*m\\nx    \" modify x by complex expression, then use it
.
.nr x 27
\*s \*s \*s \*s

Result: 82 41 124 62
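
The arithmetic behind that result can be checked with a few lines of Python. This is only a restatement of the macro's logic, with a hypothetical helper name `step`; it is not part of either the troff solution or the patch.

```python
def step(x):
    """Same rule as macro m: odd -> x*3+1, even -> x/2 (integer division)."""
    return x * 3 + 1 if x % 2 else x // 2


x = 27
values = []
for _ in range(4):  # one step per \*s in the example
    x = step(x)
    values.append(x)

print(*values)
```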

Putting a \c in there works in a simple example, but breaks other existing code. Quoting the author of the solution:

"Be aware that this is only a trick. It works in the running text, but can go wrong unexpectedly in other situations (such as in titles)."

The solution I ended up using does not specifically address this problem. It gave good results on this problem and also simplified a few other practical tasks. It was shared only with the intent that others might find it useful, now or maybe in the future.

I initially came to the mailing list looking (and hoping) for a pure-troff solution, for example, something similar to an output-line trap (\P), but which executes the macro immediately. Such a feature, if implemented, would indeed solve this and similar problems. There may be other solutions as well...

(Please let me know if I'm missing something obvious.)