willb closed 8 years ago
My comments below notwithstanding, this LGTM
A few comments:
`tokens` should have a short end-to-end description of what happens. Something like: "The log message is split into tokens separated by whitespace; anything that isn't alphanumeric, `_`, or `-` is stripped; then the function `post` is applied to each token; then any token not containing at least one letter is filtered out; then `pred` is applied as a final filter."
Oh, one more: it might be good to add a unit test that exercises the various filters all at the same time, e.g. tokens that include both punctuation that should be stripped and punctuation that should be kept.
Technically, we should probably also unit test `post` and `pred`.
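To make the suggested description and combined test concrete, here is a rough Python sketch of the pipeline as described in the comment above. It is not the PR's code: the names `tokens`, `post`, and `pred` come from the comment, but the regexes, the default arguments, the ASCII-only letter check, and the sample log line are all assumptions for illustration.

```python
import re

# Assumption: "alpha-num plus _ plus -" is read as ASCII letters, digits, '_' and '-'.
_STRIP = re.compile(r"[^A-Za-z0-9_-]")
_HAS_LETTER = re.compile(r"[A-Za-z]")


def tokens(message, post=lambda t: t, pred=lambda t: True):
    """Hypothetical stand-in for the reviewed `tokens` function."""
    out = []
    for raw in message.split():          # 1. split on whitespace
        tok = _STRIP.sub("", raw)        # 2. strip disallowed characters
        tok = post(tok)                  # 3. apply `post` to each token
        if not _HAS_LETTER.search(tok):  # 4. drop tokens with no letter
            continue
        if pred(tok):                    # 5. `pred` is the final filter
            out.append(tok)
    return out


# One combined case: ':' '%' '!' should be stripped, '_' and '-' should be kept,
# "95" and "--" have no letters, and `pred` removes the word "is".
result = tokens(
    "Error: disk_0 is 95% full -- retry-3!",
    post=str.lower,
    pred=lambda t: t != "is",
)
print(result)
```

A single test along these lines would cover stripped versus kept punctuation, the at-least-one-letter rule, and both `post` and `pred` in one assertion, which is roughly what the comment is asking for.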
LGTM!
@erikerlandson Thanks!
@erikerlandson if you can give this a quick look I'll merge after your lgtm (thanks!)