Open balki opened 2 weeks ago
There is indeed laziness and some early-out logic when `head` is in the verb list. However, there is some batching (default 500 records at a time), which was necessary for performance in the port from C to Go.
If we're getting readahead of over 500 records then that's a bug though ...
(In C it was record-at-a-time lazy ... in Go it's 500-records-at-a-time lazy ....)
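To make the batching behaviour described above concrete, here is a toy model in Python. It is not Miller's actual code: `read_batches` and `run_put_then_head` are hypothetical names, and the only assumption taken from the thread is that each verb handles a whole batch before the next verb sees it. It counts how many times an expensive `put` step runs when `head` sits after it.

```python
from itertools import islice


def read_batches(records, batch_size):
    """Yield lists of up to batch_size records, one batch at a time."""
    it = iter(records)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def run_put_then_head(records, n, batch_size):
    """Model `put ... then head -n n`; return (output, number of put calls)."""
    calls = 0
    out = []
    for batch in read_batches(records, batch_size):
        # The put stage handles the whole batch before head sees any of it.
        put_out = []
        for rec in batch:
            calls += 1  # stands in for an expensive system() call
            put_out.append({**rec, "v": "hello"})
        # The head stage keeps only what it still needs from this batch.
        out.extend(put_out[: n - len(out)])
        if len(out) >= n:
            break  # early-out: stop requesting further batches
    return out, calls


records = [{"index": i} for i in range(1, 11)]
_, calls_batched = run_put_then_head(records, n=5, batch_size=500)
_, calls_single = run_put_then_head(records, n=5, batch_size=1)
print(calls_batched, calls_single)  # 10 5
```

In this model, ten input records fit in a single batch of 500, so the expensive step runs ten times even though `head` keeps only five. With a batch size of 1 it runs five times, which corresponds to the record-at-a-time behaviour.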
OTOH this looks odd to me:
❯ {rm /tmp/1; echo index; seq 10} | mlr --c2p head -n 5 then put '$v = system("echo hello; echo err >> /tmp/1")' ; nl /tmp/1
🤔 👀
> (In C it was record-at-a-time lazy ... in Go it's 500-records-at-a-time lazy ....)
Thanks for clarifying. Makes sense. I was running the commands below on my logs and found it took a long time (11 seconds) when `head` was used after `put`, but the other way around was instant. I think I should just move `filter` and `head` as early as possible.
❯ mlr --l2p --tz America/Toronto put '$ts = sec2localtime($ts); $cn = system(format("geoiplookup {} | grep Country", $request.remote_ip))' then filter '$status == 200' then flatten then cut -of ts,cn,request.remote_ip,request.uri then head caddy.log | wc -l
11
~/tmp/millerexp took 11s
❯ mlr --l2p --tz America/Toronto filter '$status == 200' then head then put '$ts = sec2localtime($ts); $cn = system(format("geoiplookup {} | grep Country", $request.remote_ip))' then filter '$status == 200' then flatten then cut -of ts,cn,request.remote_ip,request.uri caddy.log | wc -l
11
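The difference between the two orderings above can be sketched with a small Python model. This is only an illustration under stated assumptions: `count_lookups` is a hypothetical helper, the records are synthetic, and the put-first branch models the worst case where the expensive step runs on every input record (with batching, the real number would depend on where the batch boundaries fall).

```python
from itertools import islice


def count_lookups(records, head_first, n=10):
    """Return (records kept, expensive lookups run) for a given verb order."""
    calls = 0

    def lookup(rec):
        # Stands in for the per-record system("geoiplookup ...") call.
        nonlocal calls
        calls += 1
        return {**rec, "cn": "placeholder"}

    def is_ok(rec):
        return rec["status"] == 200

    if head_first:
        # filter -> head -> put: only the surviving n records are looked up.
        stream = (lookup(r) for r in islice(filter(is_ok, records), n))
    else:
        # put -> filter -> head: every record is looked up first (worst case).
        stream = islice(filter(is_ok, [lookup(r) for r in records]), n)
    kept = list(stream)
    return len(kept), calls


# Synthetic input: 1000 records, half of them with status 200.
records = [{"status": 200 if i % 2 == 0 else 404} for i in range(1000)]
print(count_lookups(records, head_first=True))   # (10, 10)
print(count_lookups(records, head_first=False))  # (10, 1000)
```

Both orderings keep the same ten records, but the filter-and-head-first order pays for ten lookups instead of one per input record.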
> it took a long time (11 seconds) when `head` was used after `put` but the other way was instant
@balki this needs fixing for sure.
In the example below, only the first 5 records are needed. But `system` in `put` has run for all the records, as we can see in the tmp file. When `head` is moved ahead of `put`, it works fine.

It appears that each verb is run on all records before moving on to the next. Can Miller be made lazy? I understand it will not be possible when stats/grouping is used, but for the simple case I thought it would work lazily.