radanalyticsio / silex

something to help you spark
Apache License 2.0
65 stars 13 forks source link

Add a basic tokenizer for log messages #39

Closed willb closed 8 years ago

willb commented 8 years ago

@erikerlandson if you can give this a quick look I'll merge after your lgtm (thanks!)

erikerlandson commented 8 years ago

My comments below notwithstanding, this LGTM

A few comments:

erikerlandson commented 8 years ago

Oh, one more: might be good to add unit test that exercises the various filters all at the same time. e.g. tokens that include both punctuation that should be stripped and that should be kept. Technically, should probably also unit test post and pred

erikerlandson commented 8 years ago

LGTM!

willb commented 8 years ago

@erikerlandson Thanks!