zendesk / ultragrep

the grep that greps the hardest.
Apache License 2.0

simplify file selection #8

Closed grosser closed 11 years ago

grosser commented 11 years ago

Not sure why this was so complicated but still works fine after throwing all that stuff out :)

Only "feature" missing is the host grouping, but does not look like we need that oO

@osheroff @vanchi-zendesk

osheroff commented 11 years ago

Not quite. You've missed a subtle point of ultragrep: it tries very, very hard to print matching log lines in strict chronological order. Since it's blasting through a ton of log files at (presumably) varying speeds, it can't actually print anything until every log has been grepped up to the same timestamp.

The host-grouping logic is there so we can blast through each day's worth of logs roughly simultaneously without buffering a ton of results.

To be precise, with this patch we would be doing:

ug_guts app1/2012-03-04
ug_guts app2/2012-03-04
ug_guts app1/2012-03-05
ug_guts app2/2012-03-05

in parallel, but any matching items from the 03-05 processes would simply sit in a buffer, waiting until the 03-04 files had finished. (A side note: it might not work at all, for weird reasons.)

This also speaks to the need for a decent integration test.
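
The ordering constraint described above can be illustrated with a k-way merge. This is only a minimal Python sketch, not ultragrep's actual implementation: the `(timestamp, line)` record shape, the `merge_matches` name, and the sample log lines are all assumptions made for the example. The merge holds one pending match per stream and can only emit the oldest once every stream has produced its next match, which is why one slow or lagging log holds up the output of all the others.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# Hypothetical record shape: (ISO-8601 timestamp, matching log line).
Match = Tuple[str, str]


def merge_matches(streams: List[Iterable[Match]]) -> Iterator[Match]:
    """Yield matches from all streams in strict timestamp order.

    Each input stream must already be sorted by timestamp, as a single
    log file normally is. heapq.merge keeps one pending item per stream
    and always emits the smallest, so nothing is emitted out of order.
    """
    return heapq.merge(*streams, key=lambda match: match[0])


# Made-up sample data standing in for the output of two grep processes.
app1 = [
    ("2012-03-04T00:00:01", "app1 GET /a"),
    ("2012-03-04T00:00:05", "app1 GET /c"),
]
app2 = [
    ("2012-03-04T00:00:03", "app2 GET /b"),
]

merged = list(merge_matches([app1, app2]))
for timestamp, line in merged:
    print(timestamp, line)
```

A check like the one on `merged` is also roughly what a small integration test for the ordering guarantee could assert.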

grosser commented 11 years ago

Good catch. I brought back the grouping and documented/tested it. Looks good now?

(In reply, by email, to osheroff's comment of Sun, Jun 2, 2013, 10:39 PM.)

osheroff commented 11 years ago

Nope. Tested it on logs1.pod2, and it doesn't do the right thing: it's searching all of app8's logs together and then moving on to app2's, etc.

grosser commented 11 years ago

Yep, fixed it to group by date instead :)

(In reply, by email, to osheroff's comment of Mon, Jun 3, 2013, 10:23 AM.)
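
Grouping by date rather than by host could look roughly like the following. Again this is a hedged Python sketch and not the code from the pull request: the `host/date` path layout is taken from the `app1/2012-03-04` examples earlier in the thread, and `group_by_date` is a hypothetical helper name. Each returned batch holds one day's files across all hosts, so a day can be searched in parallel and finished before the next day starts.

```python
from collections import defaultdict
from typing import Dict, List


def group_by_date(paths: List[str]) -> List[List[str]]:
    """Group 'host/date' paths into per-day batches, oldest day first.

    Assumes the date is the last path component and is written as
    YYYY-MM-DD, so sorting the strings also sorts them chronologically.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        date = path.rsplit("/", 1)[1]
        groups[date].append(path)
    return [groups[date] for date in sorted(groups)]


# Input deliberately ordered host by host, the ordering reported as wrong above.
paths = [
    "app1/2012-03-04",
    "app1/2012-03-05",
    "app2/2012-03-04",
    "app2/2012-03-05",
]

batches = group_by_date(paths)
for batch in batches:
    print(batch)
```

With the sample input, the first batch contains both hosts' 03-04 files and the second contains both hosts' 03-05 files.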