Open gromgit opened 2 years ago
Hi @gromgit --
TL;DR: assign the mapexcept bit to a temp variable and emit that:
$ mlr --csv --from f.dat put 'temp = mapexcept($*, "a"); emit > "/tmp/data-".$a, temp'
a,date,open,high,low,close,volume,wap,bid,ask,status,currency,market
1,2022-05-12,54.200,54.500,53.400,53.550,3404922,-1.000,53.550,53.600,,HKD,XHKG
2,2022-05-12,75.900,76.100,75.600,75.750,2063001,-1.000,75.750,75.800,,HKD,XHKG
3,2022-05-12,8.400,8.430,8.340,8.350,14994100,-1.000,8.350,8.360,,HKD,XHKG
4,2022-05-12,21.500,21.500,20.700,20.700,1614083,-1.000,20.650,20.700,,HKD,XHKG
######################################################## /tmp/data-
1=a,2=date,3=open,4=high,5=low,6=close,7=volume,8=wap,9=bid,10=ask,11=status,12=currency,13=market
1=1,2=2022-05-12,3=54.200,4=54.500,5=53.400,6=53.550,7=3404922,8=-1.000,9=53.550,10=53.600,12=HKD,13=XHKG
1=2,2=2022-05-12,3=75.900,4=76.100,5=75.600,6=75.750,7=2063001,8=-1.000,9=75.750,10=75.800,12=HKD,13=XHKG
1=3,2=2022-05-12,3=8.400,4=8.430,5=8.340,6=8.350,7=14994100,8=-1.000,9=8.350,10=8.360,12=HKD,13=XHKG
1=4,2=2022-05-12,3=21.500,4=21.500,5=20.700,6=20.700,7=1614083,8=-1.000,9=20.650,10=20.700,12=HKD,13=XHKG
######################################################## /tmp/data-1
date,open,high,low,close,volume,wap,bid,ask,status,currency,market
2022-05-12,54.200,54.500,53.400,53.550,3404922,-1.000,53.550,53.600,,HKD,XHKG
######################################################## /tmp/data-2
date,open,high,low,close,volume,wap,bid,ask,status,currency,market
2022-05-12,75.900,76.100,75.600,75.750,2063001,-1.000,75.750,75.800,,HKD,XHKG
######################################################## /tmp/data-3
date,open,high,low,close,volume,wap,bid,ask,status,currency,market
2022-05-12,8.400,8.430,8.340,8.350,14994100,-1.000,8.350,8.360,,HKD,XHKG
######################################################## /tmp/data-4
date,open,high,low,close,volume,wap,bid,ask,status,currency,market
2022-05-12,21.500,21.500,20.700,20.700,1614083,-1.000,20.650,20.700,,HKD,XHKG
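For readers less familiar with the Miller DSL, here is a minimal Python sketch of what the command above does: for each record, build a copy without the a field (the role mapexcept plays) and route that copy to an output chosen by the value of a. The function names and the abbreviated sample data are assumptions for illustration only. This is not how Miller is implemented.

```python
import csv
import io


def split_without_key(csv_text, key):
    """Group CSV rows by the value in column `key`, dropping that column from each row."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        value = row[key]
        # Same idea as temp = mapexcept($*, "a"): the record minus the key field.
        temp = {k: v for k, v in row.items() if k != key}
        groups.setdefault(value, []).append(temp)
    return groups


def to_csv(rows):
    """Render a non-empty list of dict rows as CSV text, header first."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Abbreviated sample (a subset of the columns shown in the thread).
sample = (
    "a,date,close,currency\n"
    "1,2022-05-12,53.550,HKD\n"
    "2,2022-05-12,75.750,HKD\n"
)

for value, rows in split_without_key(sample, "a").items():
    # Stands in for the redirect target "/tmp/data-".$a
    print(f"--- data-{value}")
    print(to_csv(rows), end="")
```

Each group plays the part of one /tmp/data-N file: it has the same header as the input minus the a column.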
Longer reason is here:
https://miller.readthedocs.io/en/latest/reference-dsl-output-statements/#emit1-and-emitemitpemitf
This is ultimately because when I was first creating Miller -- and emit was there from the start, before local variables, or for-loops, or any of these relatively more powerful syntaxes -- I packed a lot of syntax (too much) into emit. And I did it as a keyword, not as a function.
What most parsers (Miller's included) do is have a "lookahead of one symbol" -- LR1 being the jargon. So after the emit keyword, the 'what comes next' and the 'one ahead of that' need to be unambiguous.
Since emit is a keyword, with no parentheses, and I added the ability to emit multiple oosvars, possibly indexed, etc., there are too many possibilities for the parser to handle with regard to parentheses, commas, etc.
In Miller 5 there were "LR1 reduce-reduce conflicts". I understood less then, and I somehow got the Lemon parser to handle them with a rule like "accept conflicts by using the first-found rule", which was a huge hack.
In Miller 6, being a little less clueless about parsing, I allowed no shift-reduce or reduce-reduce conflicts in the grammar.
See also https://miller.readthedocs.io/en/latest/new-in-miller-6/#emit-statements
The result is what we have at
https://miller.readthedocs.io/en/latest/reference-dsl-output-statements/#emit1-and-emitemitpemitf
-- namely that you can either use emit1 to put the grammatical complexity in the emittable and the keys, or use a temp variable (a syntactically simpler emittable) with emit.
Beyond this temp-variable workaround, the question is what to do now to get syntactic support for all the richness of emittables, keys, and redirector in one expression, in a way that's LR1-parseable.
Given that the emit syntax was, in hindsight, very poorly thought out, really the best I can do moving forward is make (yikes!) yet another pair of emit variants -- emitv2 and emitpv2 -- which would have the syntactic structure that emit and emitp should (in hindsight) have had all along. Namely:
emit([@var1, @var2], ["key1", "key2"]);
emit([@var1, @var2], ["key1", "key2"]) > "/tmp/data-".$a;
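To make the shape of that proposal concrete, here is a hedged Python sketch of a function-style emit that takes the emittables and the keys as two ordinary arguments. The output shape is modeled on what Miller's existing keyed emit of several oosvars produces, as I understand it: one record per key combination, with the variable names as the value fields. The function below, its dict-of-named-maps argument (Python has no @var names to read off), and the sample data are all assumptions for illustration. It is not the proposed emitv2 and not Miller's implementation.

```python
def emit(emittables, keys):
    """Hypothetical function-style emit.

    `emittables` maps a name to a nested dict that is len(keys) levels deep.
    Yields one flat record per key combination, splitting all maps in lockstep.
    """
    def walk(level, prefix, nodes):
        if not nodes:
            return
        if level == len(keys):
            # All key levels consumed: what remains are the leaf values.
            record = dict(prefix)
            record.update(nodes)
            yield record
            return
        # Key order is taken from the first emittable.
        first = next(iter(nodes.values()))
        for k in first:
            children = {name: node[k] for name, node in nodes.items() if k in node}
            yield from walk(level + 1, prefix + [(keys[level], k)], children)

    yield from walk(0, [], emittables)


# Made-up sample data: two maps indexed by the same two key levels.
count = {"pan": {"pan": 2, "wye": 1}, "eks": {"pan": 3}}
total = {"pan": {"pan": 0.5, "wye": 0.25}, "eks": {"pan": 1.5}}

for record in emit({"count": count, "sum": total}, ["a", "b"]):
    print(record)
```

Because the emittables and keys are delimited by the call's own parentheses and brackets, anything after the closing parenthesis (such as a redirect) is unambiguous, which is the point of the function-like form.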
@gromgit also I'll update the docs to use the temp-var workaround
Thanks much for the explanation and workaround, @johnkerl!
I was trying out one of the emit example commands from the documentation, and ran into an unexpected parse failure:
As far as I know, the DSL expression is syntactically correct, so I'm assuming it's a parser bug rather than a documentation issue. I also get a parse error if I substitute mapselect in the above expression.