Closed stevecheckoway closed 6 years ago
Your analysis is right, but that doesn't necessarily mean that a change is likely.
As you may know, I'm looking after etoolbox
as the original author (Philipp Lehman) 'dropped off the radar' some years ago. There were a few changes that were essential, so someone had to pick up the package. What I'm therefore mindful of is that changes need to be in line with Philipp's original ideas. Here, he chose a particular approach, and I'm wary of change.
At the same time, I'm on the LaTeX3 Project, and we have a 'bigger' set of programming tools (expl3
). I have to be careful that etoolbox
doesn't end up too much as simply 'a second version of the same stuff'. If you look at the (for example) comma list mapping implementation in expl3
is uses exactly the approach you suggest, with a marker, though there are some subtle differences as brace stripping issues are not the same.
I'll need to think about this: the question is not technical but rather one of policy.
Makes sense.
I glanced quickly at etextools and pgffor to see if I could figure out what their loops' behaviors are but both are inscrutable (for entirely different reasons).
I'd stay away from etextools
: it's not maintained and there are some long-standing issues.
That's good to know and also too bad since it seems to add some missing functionality around comma-separated lists. It's probably time to bite the bullet and learn expl3.
@stevecheckoway I'm happy to consider new functionality in etoolbox
if it's within the scope of the package. Ideally, the guy behind etextools
would adjust his code such that I can move 'core' ideas to etoolbox
, but that has not happened to date.
I think on balance I will look at the performance here: I doubt Philipp intended to pick a slow method.
\etb@defparser@arg
has the same quadratic behavior and I think \etb@forlistloop@i
does as well, but I'm running out the door and I merely glanced at it just now.
My implementation idea doesn't work exactly as specified. The problem is that looking for the sentinel causes braces to be stripped. \docsvlist{a,{b,c}}
and \docsvlist{a, {b,c}}
now have different behavior. The first expands to (essentially) \do{a}\do{b}\do{c}
with the new definition and the second to \do{a}\do{b,c}
.
The list macros have other brace issues, namely \docsvlist{{a}b,c}
expands to \do{ab}\do{c}
which isn't ideal. But that behavior doesn't justify the stripping braces around {b,c}
which drastically changes the meaning.
I think I figured out an approach that works. I implemented it for the internal lists first and then adapted it to the comma-separated lists. More effort is involved in the comma-separated list case.
The internal list macros strip one layer of braces already, but I think with some care, the sentinel approach should work here as long as braces are added around the part that's being put back in the input. Something like this.
\long\def\etb@forlistloop#1#2{\etb@forlistloop@i{#2}#1|\etb@lst@sentinel|&}%
\long\def\etb@forlistloop@i#1#2|#3|{%
\ifblank{#2}%
{}%
{#1{#2}}%
\ifx\etb@lst@sentinel#3\@empty
\expandafter\@firstoftwo
\else
\expandafter\@secondoftwo
\fi
{\listbreak}%
{\etb@forlistloop@i{#1}{#3}|}%
}%
In the second to last line, the braces around #3
will be stripped by \etb@forlistloop@i
but they're needed to prevent two layers of braces from being stripped. I.e., given
\def\foo{}
\listadd\foo{{{foo}}}
\listadd\foo{{{bar}}}
\dolistloop\foo
expands to \do{{foo}}\do{{bar}}
Using either \listadd\foo{foo}
or \listadd\foo{{foo}}
instead expands to \do{foo}
. I believe my implementation preserves that behavior.
Fixing the comma-separated list versions take more work. But not a whole lot more. The key idea is to change \etb@lst@docsvlist
and \etb@lst@forcsvlist
such that the first argument contains the leading-space-stripped current item in the list. Here's my code.
\def\if@etb@sentinel#1{%
\expandafter\ifx\expandafter\etb@lst@sentinel\@firstofone#1\@empty
\expandafter\@firstoftwo
\else
\expandafter\@secondoftwo
\fi
}
\def\etb@defparser@do#1#2{%
\begingroup
\edef\@tempa{%
\endgroup
\long\def\noexpand#1####1{%
% Start with an empty (not just blank) first argument.
\expandafter\noexpand
\csname etb@lst@\expandafter\@gobble\string#1\endcsname
{}\space####1\noexpand#2\noexpand\etb@lst@sentinel\noexpand#2&%
}%
\long\csdef{etb@lst@\expandafter\@gobble\string#1}####1####2\noexpand#2{%
% If the argument is empty, there's nothing to do. Otherwise `\do{#1}`
\noexpand\ifstrempty{####1}%
{}%
{\noexpand\do{####1}}%
% If it's blank, it isn't the sentinel so pass an empty first argument to
% this macro. Otherwise, check for the sentinel and break if
% found, otherwise, expand the macro again, but this time strip leading
% spaces from ####2 using the `\@firstofone`.
\noexpand\ifblank{####2}%
{%
\expandafter\noexpand
\csname etb@lst@\expandafter\@gobble\string#1\endcsname{}\space
}%
{%
\noexpand\if@etb@sentinel{####2}%
{\noexpand\listbreak}%
{%
\noexpand\expandafter
\expandafter\noexpand
\csname etb@lst@\expandafter\@gobble\string#1\endcsname
\noexpand\expandafter{\noexpand\@firstofone####2}\space
}%
}%
}%
}%
\@tempa
}
\def\etb@defparser@arg#1#2{%
\begingroup
\edef\@tempa{%
\endgroup
\long\def\noexpand#1####1####2{%
% Start with an empty (not just blank) first argument.
\expandafter\noexpand
\csname etb@lst@\expandafter\@gobble\string#1\endcsname
{}{####1}\space####2\noexpand#2\noexpand\etb@lst@sentinel\noexpand#2&%
}%
\long\csdef{etb@lst@\expandafter\@gobble\string#1}####1####2####3\noexpand#2{%
% If the first argument is empty, there's nothing to do for this item.
% Otherwise `#2{#1}`
\noexpand\ifstrempty{####1}%
{}%
{####2{####1}}%
% If it's blank, it isn't the sentinel so pass an empty first argument to
% this macro. Otherwise, check for the sentinel and break if
% found, otherwise, expand the macro again, but this time strip leading
% spaces from ####3 using `\@firstofone`.
\noexpand\ifblank{####3}%
{%
\expandafter\noexpand
\csname etb@lst@\expandafter\@gobble\string#1\endcsname{}{####2}\space
}%
{%
\noexpand\if@etb@sentinel{####3}%
{\noexpand\listbreak}%
{%
\noexpand\expandafter
\expandafter\noexpand
\csname etb@lst@\expandafter\@gobble\string#1\endcsname
\noexpand\expandafter{\noexpand\@firstofone\space####3}{####2}\space
}%
}%
}%
}%
\@tempa
}
Note that these don't need \etb@listitem
and \etb@listitem@i
any longer because leading spaces (and braces) have already been stripped \@firstofone
The \if@etb@sentinel
can also be used in \etb@forlistloop@i
(that's why the \@empty
is there, otherwise it's not necessary):
\long\def\etb@forlistloop@i#1#2|#3|{%
\ifblank{#2}%
{}%
{#1{#2}}%
\if@etb@sentinel{#3}
{\listbreak}%
{\etb@forlistloop@i{#1}{#3}|}%
}
Regardless of whether you want to use this new code for either the comma-separated lists or the internal list, I think you need to revert https://github.com/josephwright/etoolbox/commit/365a80ab84981130b84fb660cc929d2a5e249c7b because stripping braces is definitely the wrong behavior.
@stevecheckoway Well brace/space behaviour is already not quite what I'd aim for in a new implementation: Try something like {a},b, {c},{d} , {e}
with the older code ...
I'm of course in the position with etoolbox
that I don't have any tests set up: I guess I'll need to set some up here.
Looks like \do{a}\do{b}\do{c}\do{d }\do{e}
(and a quick test confirms). The existing behavior may not be ideal, but we definitely shouldn't change it and https://github.com/josephwright/etoolbox/commit/365a80ab84981130b84fb660cc929d2a5e249c7b changed it.
I'll set up some tests using l3build
in the morning, along with reverting the commit if required and addressing the issue(s).
Tests are a great idea! I recommend a,{b,c}
be one of them.
Running
\def\do#1{(#1)}
\def\expected{(a)(b,c)}
\edef\actual{\docsvlist{a,{b,c}}}
\ifx\expected\actual
\typeout{Match^^J}
\else
\typeout{Expected: \meaning\expected^^JActual: \meaning\actual^^J}
\fi
with the new code prints
Expected: macro:->(a)(b,c)
Actual: macro:->(a)(b)(c)
and prints Match
with the old.
@stevecheckoway Hopefully things are now working for the list parser case. I'll work on the other performance issues, and I'll give it a day or so before a release unless I hear back that it looks good to you.
These all look good to me and pass my minimal test cases. Using \@nil
instead of \space
is clever. Thanks for taking a look at these!
I was looking at the implementation of
\docsvlist
and\forcsvlist
and trying to understand the purpose of the\space
macros (possibly to prevent stripping off too many layers of braces?) and I noticed that the parsers have running time quadratic in the length of the list.The issue is the check if there are any more elements in the list starting on lines 1508 and 1524. The
\ifblank
is going to\detokenize
the rest of the list, and then assuming it isn't blank, it has to put the tokens back.It's slightly difficult to talk about these macros in general, so I'll focus on
\docsvlist
. If I understand this correctly,\docsvlist{a,b,c}
expands to\etb@lst@docsvlist a,b,c,&
(where the aforementioned\space
is what's causing the extra space after\etb@lst@docsvlist
).Rather than appending
,&
,\docsvlist
should be able to append something like,\etb@lst@sentinel,&
. Then,\etb@lst@docsvlist
can be defined likeThis implementation appears to be expandable just like the original, but simpler. The only change is looking for the sentinel rather than checking if the rest of the list is blank.
f.txt is a simple version demonstrating the code and demonstrating the expandable nature of the list.
t.txt computes how long it takes to iterate through a 500-element comma-separated list using both the etoolbox code and mine. It takes the mean of 1000 runs which is probably over kill, but on my (several year old) machine, the mean time for the original code is 2558/2^16 seconds and 70/2^16 seconds for my code.
If you change the number of elements in the list, the quadratic run-time becomes apparent.
I realize that by TeX's very nature, lists are going to take quadratic time for a bunch of operations that should be linear time (or amortized constant time) in a sane programming language. But I don't think iterating over a list is one of them.