kassner / log-parser

PHP Web Server Log Parser Library
Apache License 2.0

garbage columns #40

Closed (catbadger closed this issue 5 years ago)

catbadger commented 5 years ago

A log file contains some garbage columns that I don't need to work with. How do I skip the bad columns?

catbadger commented 5 years ago

To give more detail: the logs I want to parse look like this, and I'm having a heck of a time figuring out how to write the format string...

Mar  8 19:36:37 ip-172-31-16-77 haproxy[2168]: 666.666.666.666:666 [08/Mar/2019:19:36:37.629] http-in~ wordpress/webv2 205/0/1/1/207 200 6826 - - ---- 8/8/1/1/0 0/0 "GET /wp-includes/js/underscore.min.js?ver=1.8.3 HTTP/1.1"

kassner commented 5 years ago

Hi @catbadger

You can create a custom format like Joshua mentions in https://github.com/kassner/log-parser/issues/39

You'll have to define a pattern for each unwanted column and give it a named capture group so the format as a whole still matches, and then simply not use those fields later on.

So, if you do something like this:

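$parser = new \Kassner\LogParser\LogParser();
// %GBG1 swallows the syslog timestamp ("Mar  8 19:36:37"); \s+ allows the padded day.
$parser->addPattern('%GBG1', '(?P<gb1>[a-zA-Z]+\s+\d+ \d+\:\d+\:\d+)');
// %GBG2 swallows the process tag ("haproxy[2168]:").
$parser->addPattern('%GBG2', '(?P<gb2>[a-zA-Z]+\[\d+\]\:)');
// Mix the custom placeholders with the built-in ones (%h, %a, %p).
$parser->setFormat('%GBG1 %h %GBG2 %a:%p');

var_dump($parser->parse('Mar  8 19:36:37 ip-172-31-16-77 haproxy[2168]: 1.2.3.4:5678'));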

You will get an object like this:

object(stdClass)#3 (5) {
  ["gb1"]=>
  string(15) "Mar  8 19:36:37"
  ["host"]=>
  string(15) "ip-172-31-16-77"
  ["gb2"]=>
  string(14) "haproxy[2168]:"
  ["remoteIp"]=>
  string(7) "1.2.3.4"
  ["port"]=>
  string(4) "5678"
}

Then you can just ignore gb1 and gb2.
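
If you would rather not carry the placeholder fields around at all, a small post-processing step can strip them. This is only a sketch: `dropGarbage()` is a hypothetical helper, not part of kassner/log-parser, and it assumes every throwaway capture group was named with a shared prefix (`gb` here, as in the example above). The `$entry` object is a hand-built stand-in for the parsed result shown above, so the snippet runs without the library.

```php
<?php
// Hypothetical helper (not part of kassner/log-parser): returns a copy of
// the parsed entry without the fields whose names start with $prefix.
function dropGarbage(object $entry, string $prefix = 'gb'): \stdClass
{
    $clean = new \stdClass();
    foreach (get_object_vars($entry) as $name => $value) {
        // Keep only fields that do NOT begin with the garbage prefix.
        if (strncmp($name, $prefix, strlen($prefix)) !== 0) {
            $clean->$name = $value;
        }
    }
    return $clean;
}

// Stand-in for the object returned by $parser->parse(...) in the example above.
$entry = (object) [
    'gb1'      => 'Mar  8 19:36:37',
    'host'     => 'ip-172-31-16-77',
    'gb2'      => 'haproxy[2168]:',
    'remoteIp' => '1.2.3.4',
    'port'     => '5678',
];

$clean = dropGarbage($entry);
var_dump($clean);
```

Note that a prefix match would also drop any real field whose name happens to start with `gb`, so pick a prefix that cannot collide with the names your format actually produces.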