zentures / sequence

(Unmaintained) High performance sequential log analyzer and parser
http://sequencer.io

Unknown token encountered #6

Closed wkrause13 closed 9 years ago

wkrause13 commented 9 years ago

I have a number of log sources containing characters that force the analyzer to stop before completion with an "unknown token encountered" error. Is there a way to run the analyzer in a "best effort" mode, so that when it encounters a line it cannot tokenize, it skips that line and continues?

If not, is there documentation on how one might deal with this type of error? I have no problem pre-processing the logs; I just can't tell from the source code alone which characters result in the TokenUnknown case.

zhenjl commented 9 years ago

Is it possible to share a small subset of the logs so I can take a look?

wkrause13 commented 9 years ago

Here are three log lines which cause the sequencer to return the "unknown token encountered" message. The failure on the last line is understandable, given that it's a log record of a full SQL statement, which doesn't really follow the "proper" log structure this project expects to receive. It would be nice if the sequencer could recover from garbage log lines, but it is certainly something I can work around with some pre-processing.

For the first two lines, though, I was hoping to understand a little better why they cause the sequencer to stop.

2015-01-21 21:41:27 4515 [Note] - '::' resolves to '::';
2015-01-21 21:41:27 4515 [Note] Server socket created on IP: '::'.
2015-01-22 21:12:14 4515 [Warning] Unsafe statement written to the binary log using statement format since BINLOG_FORMAT = STATEMENT. INSERT... ON DUPLICATE KEY UPDATE on a table with more than one UNIQUE KEY is unsafe Statement: INSERT INTO surf_swell_filter_predicates ( surf_id, swell_id, element_id, predicate_type, action, has_extra_data,device_id,plugin_id,object_id,indicator_id,object_type_id,indicator_type_id,device_class_id,device_group_id,object_group_id ) VALUES ( 13,49,206,"flowInterface","=","1",NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL,NULL ) ON DUPLICATE KEY UPDATE predicate_type = VALUES(predicate_type), action = VALUES(action), has_extra_data = VALUES(has_extra_data), device_id = VALUES(device_id), plugin_id = VALUES(plugin_id), object_id = VALUES(object_id), indicator_id = VALUES(indicator_id), object_type_id = VALUES(object_type_id), indicator_type_id = VALUES(indicator_type_id), device_class_id = VALUES(device_class_id), device_group_id = VALUES(device_group_id), object_group_id = VALUES(object_group_id)

zhenjl commented 9 years ago

Thanks @wkrause13. You hit a bug where I was not correctly recognizing "::" as a valid IPv6 address (actually I think I was lazy and completely missed it).
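For reference, "::" is the IPv6 unspecified (all-zeros) address in compressed form, so it is valid IPv6 text. The sketch below shows this with Go's standard `net.ParseIP`; it is only an illustration and does not show how sequence's scanner detects IPv6 tokens. The helper name `looksLikeIPv6` is made up for this example.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// looksLikeIPv6 reports whether s parses as an IP address written in IPv6
// textual form. net.ParseIP accepts both IPv4 and IPv6 text, so a colon is
// required to rule out dotted-quad IPv4 input.
// Hypothetical helper for illustration; not taken from sequence.
func looksLikeIPv6(s string) bool {
	return strings.Contains(s, ":") && net.ParseIP(s) != nil
}

func main() {
	candidates := []string{"::", "::1", "2001:db8::1", "127.0.0.1", ":", ":::"}
	for _, s := range candidates {
		fmt.Printf("%q -> %v\n", s, looksLikeIPv6(s))
	}
}
```

A check like this could also be used in a pre-processing step to spot which lines contain short or unusual IPv6 forms before handing them to a tokenizer.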

I just checked in a fix and the first two logs seem to be tokenizing correctly now. Can you let me know if it works for you?

thx

$ go run ./sequence.go scan -m "2015-01-21 21:41:27 4515 [Note] - '::' resolves to '::';"
#   0: { Field="%funknown%", Type="%time%", Value="2015-01-21 21:41:27", K=false, V=false }
#   1: { Field="%funknown%", Type="%integer%", Value="4515", K=false, V=false }
#   2: { Field="%funknown%", Type="%literal%", Value="[", K=false, V=false }
#   3: { Field="%funknown%", Type="%literal%", Value="Note", K=false, V=false }
#   4: { Field="%funknown%", Type="%literal%", Value="]", K=false, V=false }
#   5: { Field="%funknown%", Type="%literal%", Value="-", K=false, V=false }
#   6: { Field="%funknown%", Type="%literal%", Value="'", K=false, V=false }
#   7: { Field="%funknown%", Type="%ipv6%", Value="::", K=false, V=false }
#   8: { Field="%funknown%", Type="%literal%", Value="'", K=false, V=false }
#   9: { Field="%funknown%", Type="%literal%", Value="resolves", K=false, V=false }
#  10: { Field="%funknown%", Type="%literal%", Value="to", K=false, V=false }
#  11: { Field="%funknown%", Type="%literal%", Value="'", K=false, V=false }
#  12: { Field="%funknown%", Type="%ipv6%", Value="::", K=false, V=false }
#  13: { Field="%funknown%", Type="%literal%", Value="'", K=false, V=false }
#  14: { Field="%funknown%", Type="%literal%", Value=";", K=false, V=false }
$ go run ./sequence.go scan -m "2015-01-21 21:41:27 4515 [Note] Server socket created on IP: '::'."
#   0: { Field="%funknown%", Type="%time%", Value="2015-01-21 21:41:27", K=false, V=false }
#   1: { Field="%funknown%", Type="%integer%", Value="4515", K=false, V=false }
#   2: { Field="%funknown%", Type="%literal%", Value="[", K=false, V=false }
#   3: { Field="%funknown%", Type="%literal%", Value="Note", K=false, V=false }
#   4: { Field="%funknown%", Type="%literal%", Value="]", K=false, V=false }
#   5: { Field="%funknown%", Type="%literal%", Value="Server", K=false, V=false }
#   6: { Field="%funknown%", Type="%literal%", Value="socket", K=false, V=false }
#   7: { Field="%funknown%", Type="%literal%", Value="created", K=false, V=false }
#   8: { Field="%funknown%", Type="%literal%", Value="on", K=false, V=false }
#   9: { Field="%funknown%", Type="%literal%", Value="IP", K=false, V=false }
#  10: { Field="%funknown%", Type="%literal%", Value=":", K=false, V=false }
#  11: { Field="%funknown%", Type="%literal%", Value="'", K=false, V=false }
#  12: { Field="%funknown%", Type="%ipv6%", Value="::", K=false, V=false }
#  13: { Field="%funknown%", Type="%literal%", Value="'", K=false, V=false }
#  14: { Field="%funknown%", Type="%literal%", Value=".", K=false, V=false }

zhenjl commented 9 years ago

@wkrause13 I assume things are working, so I'm closing this. Let me know if you have any questions. Thanks.