salspaugh / splparser

Simple parser for Splunk Processing Language (SPL) written in Python.

Dedup command #38

Closed richzeng closed 11 years ago

richzeng commented 11 years ago

Added tests for dedup for approval. Still working on the parser.

Link: http://docs.splunk.com/Documentation/Splunk/5.0.2/SearchReference/Dedup

salspaugh commented 11 years ago

Tests look good. Let me check the query corpus to see if there are any tricky ones to add.

salspaugh commented 11 years ago

You also need to be able to handle fields separated by commas, e.g.,

dedup HostName, signature, action sortby HostName

dedup operator,user_effected,dest_host

Some others you might add:

dedup order_id keepempty=T

dedup DMZ:192.168.11.44/2607

Annoyingly, a number of dedup queries in the corpus use eval (for example, dedup BookName ValDate eval GTS_EODPass=if(Lag=0,1,0)). Since that doesn't appear in the documentation and would really change your rules (you'd have to change the lexer), let's not worry about it for now.
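To make the comma-separated cases concrete, here is a rough, standalone sketch of how the dedup arguments could be split. It is not splparser code: `split_dedup_args` and `OPTION_RE` are hypothetical names, and it treats commas as plain separators rather than going through the lexer and grammar. The option names (keepevents, keepempty, consecutive) and the sortby clause come from the dedup documentation linked above.

```python
import re

# Hypothetical helper, NOT part of splparser. It only illustrates the
# expected split of a dedup argument string into its pieces.
OPTION_RE = re.compile(r"^(keepevents|keepempty|consecutive)=(\S+)$", re.IGNORECASE)


def split_dedup_args(args):
    """Return (count, fields, options, sortby) for the text after 'dedup'."""
    # Treat commas as separators so "a, b" and "a,b" give the same fields.
    tokens = args.replace(",", " ").split()
    count, fields, options, sortby = None, [], {}, []
    target = fields  # fields go here until a 'sortby' keyword is seen
    for i, tok in enumerate(tokens):
        option = OPTION_RE.match(tok)
        if tok.lower() == "sortby":
            target = sortby
        elif option:
            options[option.group(1).lower()] = option.group(2)
        elif i == 0 and tok.isdigit():
            count = int(tok)  # optional leading N in "dedup N field ..."
        else:
            target.append(tok)
    return count, fields, options, sortby


if __name__ == "__main__":
    for query in (
        "HostName, signature, action sortby HostName",
        "operator,user_effected,dest_host",
        "order_id keepempty=T",
        "DMZ:192.168.11.44/2607",
    ):
        print(split_dedup_args(query))
```

The eval variant is deliberately not handled here, in line with leaving it out of scope for now.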

richzeng commented 11 years ago

All the tests pass now. @salspaugh can you double check for me?

salspaugh commented 11 years ago

Why did you use evalregexes instead of searchregexes? Was it because of the minus sign and similar characters? It would be nice to identify types such as IP addresses, because I think types could be a useful feature when we're clustering and classifying queries. The more commands that do good type recognition, the better.
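As an illustration of the kind of type recognition meant here, a minimal sketch follows. The pattern and the labels are assumptions made up for this example; they are not the contents of searchregexes or evalregexes, and `classify_token` is a hypothetical name.

```python
import re

# Assumed pattern for illustration only; the real regexes in splparser may differ.
# Matches a dotted-quad IPv4 address with each octet in 0-255, and refuses to
# match when the address is glued to further digits or dots.
OCTET = r"(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)"
IPV4_RE = re.compile(r"(?<![\d.])(?:" + OCTET + r"\.){3}" + OCTET + r"(?![\d.])")
WORD_RE = re.compile(r"[A-Za-z_][\w.]*")


def classify_token(token):
    """Return a coarse, hypothetical type label for a dedup field token."""
    if IPV4_RE.search(token):
        return "ip"  # token contains an IPv4 address somewhere
    if WORD_RE.fullmatch(token):
        return "word"  # plain field name
    return "other"


if __name__ == "__main__":
    for token in ("DMZ:192.168.11.44/2607", "HostName", "999.1.1.1"):
        print(token, classify_token(token))
```

Labels like these could then be attached to parsed tokens and used as features when clustering queries.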