svent / sift

A fast and powerful alternative to grep
https://sift-tool.org
GNU General Public License v3.0

Issue with named capture groups? Is it the syntax, or am I missing something? #54

Closed jjarava closed 8 years ago

jjarava commented 8 years ago

Hi!

I'm trying to get sift to parse quite a "rich" log file.

This "incantation" works well:

sift "]SDPTRU02 \[[0-9].*] :\[" *log | sift -e '^.(.*?)]. WARN](.*?) \[([0-9].*)].*"recommendation":{(.*?)},"customer_session_id":"(.*?)"' -e '^.(.*?)]. WARN](.*?) \[([0-9].*)].*("status":"error","message":".*")' --replace '$1|$2|$3|$5|$4' 

It produces the output I'm expecting (yes, in the second sift, the second -e has only 4 capture groups, but that's the way the logs are).

Now, this is at (or beyond?) the limit of readability, and every time the log format is tweaked (and it is), it's a pain to change.

So, following the reference at http://www.regular-expressions.info/named.html, I've arrived at the following:

sift "]SDPTRU02 \[[0-9].*] :\[" *log | sift -e '^.(^P<ts>.*?)]. WARN](^P<op>.*?) \[(^P<time>[0-9].*)].*"recommendation":{(^P<result>.*?)},"customer_session_id":"(^P<csid>.*?)"' -e '^.(^P<ts>.*?)]. WARN](^P<op>.*?) \[(^P<time>[0-9].*)].*(^P<result>"status":"error","message":".*"?)' --replace '${ts}|${op}|${time}|${csid}|${result}' 

Which doesn't match anything. I've tried using $name and ${name} syntax for the --replace section, as the docs in https://sift-tool.org/docs actually are a bit confusing... The docs say:

Use ${1}, ${2}, $name, ... for captured submatches

but in fact the examples use the --replace '$1' syntax.

Anyhow I'm stuck - I don't know if what I'm trying to do is not possible (I doubt it), or where I'm making the mistake...

Thanks!

svent commented 8 years ago

Hi,

thanks for sharing this complex example. ${1} and $1 are both correct; the former is just more specific and prevents problems, e.g. when the variable is directly followed by a number (would $11 be submatch 1 followed by '1' or submatch 11?).
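
The $11 ambiguity can be reproduced outside sift. The following is a minimal sketch that assumes sift's --replace templates follow the rules of Go's standard regexp package (sift is written in Go, but this thread does not confirm which expansion rules it applies). The helper replaceWith and the sample input are hypothetical and only serve the illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// replaceWith is a hypothetical helper: it applies a replacement template
// to every match of pattern in input, using Go's regexp expansion rules.
func replaceWith(pattern, template, input string) string {
	return regexp.MustCompile(pattern).ReplaceAllString(input, template)
}

func main() {
	// One capture group: the digits after "id=".
	pattern := `id=(\d+)`
	input := "id=42"

	// "$11" is read as submatch 11, which does not exist here,
	// so the whole match is replaced by an empty string.
	fmt.Printf("%q\n", replaceWith(pattern, "$11", input)) // prints ""

	// "${1}1" is unambiguous: submatch 1 followed by a literal "1".
	fmt.Printf("%q\n", replaceWith(pattern, "${1}1", input)) // prints "421"
}
```

Under these assumptions, the braces are only needed when the reference is directly followed by a digit, letter, or underscore. A template such as '$1|$2|$3' is safe because '|' cannot be part of a group name.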

Using named submatches should work: please try (?P<name>pattern) within the regular expressions. I will update the docs to make that clearer. The --replace parameter looks fine.
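
The (?P<name>pattern) form and the ${name} replacement can be tried in isolation. The following is a minimal sketch that again assumes Go's standard regexp package behaves like sift's matcher. The log line, the group names and the reorder function are hypothetical and not taken from the logs in this issue. Under Go's rules, (^P<ts>.*?) is an ordinary group that demands the literal text "P<ts>" at a line start. That would explain why the rewritten command matches nothing.

```go
package main

import (
	"fmt"
	"regexp"
)

// reorder is a hypothetical example function: it rewrites each match of a
// pattern with named groups using a ${name} replacement template.
func reorder(input string) string {
	// (?P<name>...) defines a named capture group. Writing (^P<name>...)
	// instead would require the literal text "P<name>" at a line start.
	re := regexp.MustCompile(`\[(?P<ts>[^\]]+)\] (?P<level>\w+) (?P<msg>.*)`)
	return re.ReplaceAllString(input, "${level}|${ts}|${msg}")
}

func main() {
	// Hypothetical log line, invented for this illustration.
	fmt.Println(reorder("[2016-01-01 12:00:00] WARN disk almost full"))
	// prints: WARN|2016-01-01 12:00:00|disk almost full
}
```

Named references keep the template readable when groups are added or reordered, which is the maintenance problem described in the question.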

I hope this solves the problem. If it does not, it would be great if you could share one (anonymized) sample log line for testing.