squeeks / glossy

syslog parser and producer
https://npmjs.org/package/glossy
MIT License

Syslog parsing issue since the 1st May #7

Closed: damomurf closed this issue 11 years ago

damomurf commented 12 years ago

I've been using glossy reliably for some time now, but since midnight on the 1st May (local time), the library appears to be mis-parsing syslog packets. The sources are OS X Lion and Ubuntu 11.10.

You can see that the time and host appear to be borked, and this is happening consistently on all logs received.

An example parsed object dumped via console.log:

{ originalMessage: '<5>May  1 20:54:50 imac [0x0-0x63063].com.google.Chrome[676]: [676:-1203687424:8980765111361:ERROR:download_updates_command.cc(107)] PostClientToServerMessage() failed during GetUpdates\n',
  type: 'RFC3164',
  prival: 5,
  facilityID: 0,
  severityID: 5,
  facility: 'kern',
  severity: 'notice',
  time: undefined,
  host: '20:54:50',
  message: 'imac [0x0-0x63063].com.google.Chrome[676]: [676:-1203687424:8980765111361:ERROR:download_updates_command.cc(107)] PostClientToServerMessage() failed during GetUpdates\n' }

I don't believe anything else has changed in terms of dependencies, etc. to cause this. The code I'm using that exhibits the behaviour is below. This is with glossy 0.1.2 and node v0.6.15.

var syslogParser = require('glossy').Parse; // or wherever your glossy libs are

var dgram  = require("dgram");
var server = dgram.createSocket("udp4");

server.on("message", function(rawMessage) {
    syslogParser.parse(rawMessage.toString('utf8', 0), function(parsedMessage) {
        console.log(parsedMessage);
    });
});

server.on("listening", function() {
    var address = server.address();
    console.log("Server now listening at " +
        address.address + ":" + address.port);
});

server.bind(1514);

squeeks commented 12 years ago

The cause: RFC 3164 specifies the timestamp format "Mmm dd hh:mm:ss", and when the day of the month is less than 10 it is written as a space followed by the digit (for example "Aug  7"), not zero-padded. That leaves two spaces between "May" and "1", and the parser breaks at about this point for you, because it assumes a single space between the month, day and time.
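To see where the extra space goes, here is a minimal sketch that assumes the header is split on single spaces and the first three segments are taken as the timestamp, as the snippet later in this thread suggests. The message is shortened, and the variable names are illustrative.

```javascript
// Shortened header from the message in this issue; the day is space-padded ("May  1").
const raw = "<5>May  1 20:54:50 imac [0x0-0x63063].com.google.Chrome[676]: failed";

// Splitting on a single space leaves an empty string between "<5>May" and "1".
const segments = raw.split(" ");
console.log(segments.slice(0, 5));
// [ '<5>May', '', '1', '20:54:50', 'imac' ]

// Treating the first three segments as the timestamp loses the time of day,
// and the time then lands in the host slot, as in the dump above.
const timeStamp = segments.slice(0, 3).join(" ").replace(/^<\d+>/, "");
const host = segments[3];
console.log(JSON.stringify(timeStamp), host); // "May  1" 20:54:50
```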

The (possible) solution: a while back I removed a lot of code that split the message up on /\s+/, because it was painfully slow. I may have to revisit that decision, or think of a smarter way of doing this. I've got a little bit of time on my hands, so hold tight. Suggestions welcome.
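One possible "smarter way" is a single anchored regular expression for the header, so the message body is never split at all. This is a minimal sketch assuming well-formed RFC 3164 input with a hostname present; `parseRfc3164Header` is a hypothetical name, not part of glossy's API. Facility and severity are derived from RFC 3164's PRI = facility * 8 + severity.

```javascript
// Hypothetical helper, not part of glossy. "\s+" between month and day accepts
// both a space-padded day ("May  1") and a two-digit day ("May 11").
const RFC3164_HEADER =
  /^<(\d{1,3})>([A-Z][a-z]{2})\s+(\d{1,2}) (\d{2}:\d{2}:\d{2}) (\S+) ([\s\S]*)$/;

function parseRfc3164Header(raw) {
  const match = RFC3164_HEADER.exec(raw);
  if (match === null) return null; // not an RFC 3164-style header
  const prival = Number(match[1]);
  return {
    prival,
    facilityID: prival >> 3, // PRI / 8, rounded down
    severityID: prival & 7,  // PRI mod 8
    time: `${match[2]} ${match[3]} ${match[4]}`, // normalised to single spaces
    host: match[5],
    message: match[6], // body kept verbatim, including any trailing newline
  };
}

const parsed = parseRfc3164Header(
  "<5>May  1 20:54:50 imac [0x0-0x63063].com.google.Chrome[676]: failed\n"
);
console.log(parsed.host, parsed.time); // imac May 1 20:54:50
```

Whether this is actually faster than splitting on /\s+/ would need measuring; the main gain is that only the fixed-shape header is examined.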

OpenAnswers commented 12 years ago

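How about dropping the empty segments that runs of spaces leave in rawMessage, like so:

// Split on single spaces, then drop the empty strings that consecutive spaces produce.
var segments = rawMessage.toString('utf8').split(' ').filter(function(elem) {
  return elem !== "";
});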

UPDATE: on second thoughts, we'd probably only want to remove the empty segments up to the point where the actual message starts.
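Following that UPDATE, the filtering can be limited to the header by peeling off a fixed number of fields and keeping the remainder verbatim. A minimal sketch; `splitHeader` and the field count of four (month with the PRI still attached, day, time, host) are assumptions for illustration.

```javascript
// Hypothetical helper: take the first `count` whitespace-separated fields and
// return the rest of the line untouched, so extra spaces in the header are
// skipped but the message body is never split or filtered.
function splitHeader(line, count) {
  const fields = [];
  let rest = line;
  for (let i = 0; i < count; i++) {
    const match = /^(\S+)\s+/.exec(rest);
    if (match === null) break; // fewer fields than expected
    fields.push(match[1]);
    rest = rest.slice(match[0].length);
  }
  return { fields, rest };
}

const { fields, rest } = splitHeader(
  "<5>May  1 20:54:50 imac Chrome[676]:  two  spaces kept",
  4
);
console.log(fields); // [ '<5>May', '1', '20:54:50', 'imac' ]
console.log(rest);   // Chrome[676]:  two  spaces kept
```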

damomurf commented 12 years ago

In my case, I could certainly split on just the first instance, so a slight modification to your code:

var segments = rawMessage.split(' ',1).filter( function( elem, index, ar )
{
  return elem != "";
});

Thanks for the workaround!

UPDATE:

Ah, no: I misunderstood split(' ',1). The limit makes it return only the first element rather than splitting once and keeping the rest, so that won't work.
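For reference, JavaScript's `String.prototype.split(separator, limit)` stops after `limit` elements and discards the rest of the string, unlike Python's `str.split(sep, maxsplit)`, which keeps the remainder as a final element. The sketch below shows this, plus one way to split only at the first run of spaces; the variable names are illustrative.

```javascript
const header = "May  1 20:54:50 imac";

// The limit caps the number of elements returned; the remainder is dropped.
console.log(header.split(" ", 1)); // [ 'May' ]
console.log(header.split(" ", 3)); // [ 'May', '', '1' ]

// To split only at the first run of spaces and keep the remainder,
// locate the separator explicitly.
const match = /\s+/.exec(header);
const first = header.slice(0, match.index);
const remainder = header.slice(match.index + match[0].length);
console.log(first, "|", remainder); // May | 1 20:54:50 imac
```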

stancarney commented 11 years ago

Appears to still be an issue.

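// A space-padded day ("May  1") leaves an empty string at segments[1]; drop it.
if(segments[1] == '') segments.splice(1,1);
// The first three segments are month, day and time; strip the leading "<PRI>".
var timeStamp         = segments.splice(0,3).join(' ').replace(/^(<\d+>)/,'');
parsedMessage.time    = parseTimeStamp(timeStamp);
parsedMessage.host    = segments.shift();
// Whatever is left is the message body.
parsedMessage.message = segments.join(' ');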

Remove the bad array element when present for RFC3164 messages?
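A minimal sketch of the proposed fix applied to the message from this issue. `parseHeader` is a hypothetical wrapper around the snippet above, and the time is returned as a plain string because glossy's `parseTimeStamp` is not reproduced here.

```javascript
// Hypothetical wrapper around the proposed fix; not glossy's actual code.
function parseHeader(raw) {
  const segments = raw.split(" ");
  // A space-padded day ("May  1") leaves an empty string at index 1.
  if (segments[1] === "") segments.splice(1, 1);
  const timeStamp = segments.splice(0, 3).join(" ").replace(/^(<\d+>)/, "");
  const host = segments.shift();
  return { timeStamp, host, message: segments.join(" ") };
}

console.log(parseHeader("<5>May  1 20:54:50 imac Chrome[676]: failed"));
console.log(parseHeader("<5>May 11 20:54:50 imac Chrome[676]: failed"));
```

Only index 1 is checked, which covers the single padded day. Runs of spaces inside the message body survive, because `join(' ')` puts the empty segments back.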

squeeks commented 11 years ago

@stancarney as small as it is, would you please make a pull request out of that? Credit should go where it's due.

Apologies for ignoring this; I'm finding it difficult to find time to look after my code as well as I should these days.

stancarney commented 11 years ago

No problem. I just sent the pull request.

squeeks commented 11 years ago

Thank you, Stan.

This is now fixed and available in npm as 0.1.3.