kassner / log-parser

PHP Web Server Log Parser Library
Apache License 2.0

possibility to skip records #31

Closed ToeiRei closed 7 years ago

ToeiRei commented 7 years ago

I'm working on some log analysis and came across some uncool entries on my live server that read as follows:

111.222.333.444 - - [25/Jun/2017:11:51:35 +0200] "\x16\x03\x01" 400 0 "-" "-"

Is there a possibility to skip those records and keep on parsing the rest?

kassner commented 7 years ago

@ToeiRei

The parse function throws an exception if the line you're trying to parse doesn't match the expected format, so in that case just wrap the call in a try/catch inside your loop, like:

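$parser = new \Kassner\LogParser\LogParser();
$lines = file('/var/log/apache2/access.log', FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
foreach ($lines as $line) {
    try {
        $entry = $parser->parse($line);
    } catch (\Kassner\LogParser\FormatException $e) {
        // the line doesn't match the expected format: skip it and move on
        continue;
    }
    // $entry holds the parsed record here; process it
}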

If parse doesn't throw an exception, though, the best you can do is skip the line based on conditions of your own choosing. With a reasonably recent browser (say, from the last 10 years), requests rarely end in HTTP status 400 unless the application on the server returns that status on purpose. In that case I suggest comparing the status and request against the values you reported and skipping the lines that match. This \x16\x03\x01 request looks like a fairly common bot issue, so I suppose you can safely ignore those lines if you want.
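As a rough sketch of that kind of condition, the check below works on the raw line before it reaches the parser. The helper name shouldSkipLine is hypothetical and not part of kassner/log-parser, and it assumes the log writes non-printable request bytes as \xNN escapes with the status code directly after the quoted request, as in the sample line above.

```php
<?php
// Hypothetical helper, NOT part of kassner/log-parser.
// Returns true when the quoted request field starts with an escaped byte
// (a literal backslash, "x" and two hex digits) and the status is 400.
function shouldSkipLine(string $line): bool
{
    // '\\\\' in a single-quoted PHP string is a regex-escaped literal backslash.
    return preg_match('/"\\\\x[0-9A-Fa-f]{2}[^"]*" 400 /', $line) === 1;
}

// Sample lines; the first is the one reported in this issue.
$lines = [
    '111.222.333.444 - - [25/Jun/2017:11:51:35 +0200] "\x16\x03\x01" 400 0 "-" "-"',
    '127.0.0.1 - - [25/Jun/2017:11:51:36 +0200] "GET / HTTP/1.1" 200 512 "-" "-"',
];

// Keep only the lines worth handing to the parser.
$kept = array_values(array_filter($lines, function (string $line): bool {
    return !shouldSkipLine($line);
}));
```

In the loop above, the same check could sit just before the parse call and continue to the next line when it returns true. Adjust the pattern to whatever conditions fit your own logs.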

ToeiRei commented 7 years ago

Sounds like a plan. I wasn't aware of try/catch in PHP.