olsak / OpTeX

OpTeX - LuaTeX format with extended Plain TeX macros
http://petr.olsak.net/optex/

A bug in `\replstring` #173

Closed Udi-Fogiel closed 4 months ago

Udi-Fogiel commented 4 months ago

When \replstring\foo{<find>}{<replace>} is applied and the body of \foo contains a sequence of the form <find>{<braced content>}<find>, \replstring strips the braces. For example:

\def\foo{ab{c}b}
\replstring\foo{b}{d}
\show\foo
\bye

> \foo=macro:
->adcd.

It can also happen in other situations: whenever the braced content ends up as the entire argument of a macro, the braces are lost. Another example:

\def\foo{{a}b}
\replstring\foo{b}{d}
\show\foo
\bye

> \foo=macro:
->ad.
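
The underlying TeX rule can be reproduced in isolation. The following is a minimal sketch, assuming \replstring is built on delimited macro parameters; the helper \grab and the macro \result are hypothetical names used only for this illustration and are not part of OpTeX:

```tex
% Hypothetical helper: reads everything up to the delimiter "b" as #1
% and stores it in \result.
\def\grab#1b{\def\result{#1}}

\grab{a}b      % the whole argument is the single group {a}: TeX drops the braces
\show\result   % the stored body is "a", not "{a}"

\grab x{a}b    % the argument is "x{a}", not a single group: the braces stay
\show\result   % the stored body is "x{a}"
\bye
```

TeX removes the outermost braces only when the argument consists of exactly one brace group, which matches the two failing cases above: in both, a group sits alone between two delimiters (or between the start of the body and a delimiter).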

This is a problem in situations where the braces are important; for example, in OpTeX trick 38 the equations must be inside groups.

If we slightly modify the example in the description of the trick, it breaks. For example, if we add a space after a bullet:

\hbox{%
\circletext {1.7cm} {212}  {{$\bullet$} UNIVERSITAS  CAROLINA PRAGENSIS {$\bullet$} }
                           {\spaceskip=.5em \kpcirc TA{-.1}\kpcirc NA{-.05}}
\circletext {-1.7cm} {237} {Facultas MFF}
                           {\kpcirc Fa{-.1}}
}

or add another bullet in the middle:

\hbox{%
\circletext {1.7cm} {212}  {{$\bullet$} UNIVERSITAS {$\bullet$} CAROLINA PRAGENSIS {$\bullet$}}
                           {\spaceskip=.5em \kpcirc TA{-.1}\kpcirc NA{-.05}}
\circletext {-1.7cm} {237} {Facultas MFF}
                           {\kpcirc Fa{-.1}}
}

I suggest a new approach to implementing \replstring:

\directlua{%
local scan_toks = token.scan_toks
local put_next = token.put_next
local create = token.create
local lbrace = create(string.byte('{'))
local rbrace = create(string.byte('}'))
define_lua_command("replstring", function()
    local macro_name = create(token.scan_csname())
    put_next(create('_expandafter'),
    lbrace,macro_name,rbrace)
    local macro_body = scan_toks()
    local nested = token.scan_keyword('nested')
    local find = scan_toks()
    local replace = scan_toks()
    local find_length = \csstring\#find
    local replace_length = \csstring\#replace
    local range = \csstring\#macro_body - find_length + 1
    local i = 1
    local nested_level = 0
    while i <= range do
        if not nested then
            if macro_body[i].tok == lbrace.tok then
                nested_level = nested_level + 1
            elseif macro_body[i].tok == rbrace.tok then
                nested_level = nested_level - 1
            end
        end
        if nested_level > 0 then i = i+1 else
            for j = 0, find_length - 1 do
                if not nested then
                    if macro_body[i + j].tok == lbrace.tok then
                        nested_level = nested_level + 1
                        i = i+j+1
                        break
                    end
                end
                if (macro_body[i + j].tok \csstring\~= find[j+1].tok) then
                    i = i+j+1
                    break
                else
                    if j == find_length - 1 then
                        for t = 0, find_length - 1 do
                            table.remove(macro_body,i)
                        end
                        for t = 0, replace_length - 1 do
                            table.insert(macro_body,i+t,replace[t+1])
                        end
                        range = \csstring\#macro_body - find_length + 1
                        i = i+replace_length
                        break
                    end
                end
            end
        end
    end
    put_next(rbrace)
    put_next(macro_body)
    put_next(create('_immediateassignment'),
    create('_def'),macro_name,lbrace)
end)
}

This implementation is expandable and adds a keyword, nested, which, if present, allows substitution of tokens nested in groups.

A couple of examples:

\def\foo{aaa{aaa}a}
\replstring\foo nested{aaa}{c}
\show\foo

> \foo=macro:
->c{c}a.

\def\foo{aaa{aaa}a}
\edef\bar{\replstring\foo{aaa}{c}}
\show\foo

> \foo=macro:
->c{aaa}a.

Another feature of the nested version is that it allows the <find> argument to contain braces, which is not possible with the current implementation:

\def\foo{aaa{aaa}a}
\replstring\foo nested{{aaa}}{c}
\show\foo

> \foo=macro:
->aaaca.

Udi-Fogiel commented 4 months ago

Note that \_replstring currently needs two expansions to do its job; putting the last lines (the redefinition of the macro) in tex.runtoks would reduce this to one expansion.

Udi-Fogiel commented 4 months ago

If we keep \_replstring unexpandable, it is possible to use \global or other prefixes (including \immediateassignment, so a user can still make it expandable).

Another approach is to make it expandable and allow for global definition with a keyword, or \globaldefs.

olsak commented 4 months ago

The \replstring macro is heavily used by the hi-syntax macros and was optimised for speed for this purpose. Can we compare the speed of your Lua implementation with the original \replstring? The reason is that the code to be syntax-highlighted can be huge, spanning many pages, and the speed of \replstring is then quite noticeable.

I can think of a slightly different solution: create a macro \xreplstring (eXpandable or eXtended replstring) which implements your idea and is provided as an auto-loaded OpTeX trick (with you as the author). I can mention the limits of the classical \replstring in the OpTeX doc (including the brace-loss problem) and point to the \xreplstring OpTeX trick from there.

Udi-Fogiel commented 4 months ago

> The \replstring macro is heavily used by the hi-syntax macros and was optimised for speed for this purpose. Can we compare the speed of your Lua implementation with the original \replstring? The reason is that the code to be syntax-highlighted can be huge, spanning many pages, and the speed of \replstring is then quite noticeable.

That is a good point, which I had not checked until now. There is a significant overhead with the Lua implementation when I build OpTeX with make all from scratch (with no build folder). I get an average of about 95 seconds with the Lua implementation and 52.5 seconds with the current one (the documentation also changes between the tests, but I don't think this is the reason for such overhead).
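
For a more isolated comparison than a full documentation build, a rough micro-benchmark could look like the sketch below. This is not how the numbers above were obtained: the iteration count, the test body, and the Lua global bench_start are arbitrary assumptions made for illustration, and a body this small is not representative of real hi-syntax input.

```tex
% Rough micro-benchmark sketch; the iteration count and test body are
% arbitrary assumptions, not the workload measured in this thread.
\newcount\iter
\directlua{bench_start = os.clock()}  % bench_start: hypothetical Lua global
\iter=0
\loop
  \def\foo{ab{c}b ab{c}b ab{c}b}%
  \replstring\foo{b}{d}%
  \advance\iter by 1
\ifnum\iter<100000 \repeat
% Report the CPU time consumed by the loop on the terminal and in the log.
\directlua{texio.write_nl("replstring: " .. (os.clock() - bench_start) .. " s CPU")}
\bye
```

Running the same file once with the macro-based \replstring and once with the Lua-based command substituted in the loop would give two timings that differ only in the implementation under test.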

Out of curiosity, how many times do you estimate hi-syntax calls \replstring in OpTeX's doc?

> I can think of a slightly different solution: create a macro \xreplstring (eXpandable or eXtended replstring) which implements your idea and is provided as an auto-loaded OpTeX trick (with you as the author). I can mention the limits of the classical \replstring in the OpTeX doc (including the brace-loss problem) and point to the \xreplstring OpTeX trick from there.

Given the results mentioned above, I have to agree. Here is a slight modification of the code above:

\directlua{%
local scan_toks = token.scan_toks
local put_next = token.put_next
local create = token.create
local lbrace = create(string.byte('{'))
local rbrace = create(string.byte('}'))
define_lua_command('xreplstring', function()
    local macro_name = create(token.scan_csname())
    put_next(create'_expandafter',
    lbrace,macro_name,rbrace)
    local macro_body = scan_toks(false,false)
    local nested = token.scan_keyword('nested')
    local find = scan_toks(false,false)
    local replace = scan_toks(false,false)
    local find_length = \csstring\#find
    local replace_length = \csstring\#replace
    local range = \csstring\#macro_body - find_length + 1
    local i = 1
    local nested_level = 0
    while i <= range do
        if not nested then
            if macro_body[i].tok == lbrace.tok then
                nested_level = nested_level + 1
            elseif macro_body[i].tok == rbrace.tok then
                nested_level = nested_level - 1
            end
        end
        if nested_level > 0 then i = i+1 else
            for j = 0, find_length - 1 do
                if not nested then
                    if macro_body[i + j].tok == lbrace.tok then
                        nested_level = nested_level + 1
                        i = i+j+1
                        break
                    end
                end
                if (macro_body[i + j].tok \csstring\~= find[j+1].tok) then
                    i = i+j+1
                    break
                else
                    if j == find_length - 1 then
                        for t = 0, find_length - 1 do
                            table.remove(macro_body,i)
                        end
                        for t = 0, replace_length - 1 do
                            table.insert(macro_body,i+t,replace[t+1])
                        end
                        range = \csstring\#macro_body - find_length + 1
                        i = i+replace_length
                        break
                    end
                end
            end
        end
    end
    put_next(rbrace)
    put_next(macro_body)
    put_next(create'_def',macro_name,lbrace)
    if macro_name.protected then 
        put_next(create'_protected') 
    end
end)
}
\protected\def\foo{a}
\show\foo
\xreplstring\foo{a}{b}
\show\foo
\edef\bar{\immediateassignment\xreplstring\foo{b}{a}}
\show\bar
\show\foo
{\xreplstring\foo{a}{b}}
\show\foo
{\global\xreplstring\foo{a}{b}}
\show\foo

\bye

This version keeps \protected macros protected. It is not expandable, but it works with prefixes: if it were expandable, \global would not work, whereas an unexpandable version can still be made expandable with \immediateassignment. I think it is better this way; anyone who really needs it to take one expansion instead of two can add tex.runtoks.

Udi-Fogiel commented 4 months ago

Do you want me to prepare a new PR?

olsak commented 4 months ago

First, I'll prepare the basis of the OpTeX tricks and doc correction and I'll push it to the master branch (maybe today). Then, you can add more comments to the OpTeX trick and prepare it as a PR.