petar-dambovaliev / aho-corasick

Efficient string matching in Golang via the Aho-Corasick algorithm.
MIT License

Fix overlapping patterns #13

Status: Closed (semvis123 closed this 1 year ago)

semvis123 commented 1 year ago

Fixes #12: the findIter position was not correctly updated after a match was found.

petar-dambovaliev commented 1 year ago

Hey, thanks for your contribution. LGTM.

semvis123 commented 1 year ago

I had marked this as a draft because I missed a specific case (see https://github.com/trufflesecurity/aho-corasick/pull/1). So even though it works better than it did before, this isn't a perfect fix.

petar-dambovaliev commented 1 year ago

@semvis123

Do you want to open another PR to improve it further? Otherwise, I can revert this.

semvis123 commented 1 year ago

Yeah, I can do that. Once I have a better fix, I will open a new pull request.

semvis123 commented 1 year ago

Seems like the overlappingIter is working correctly. It is just not used by the findAll + standardMatch combination. For more information see: https://github.com/trufflesecurity/aho-corasick/pull/1

I did, however, notice that my fix caused the LeftMostLongestMatch matchKind to return multiple matches on overlapping patterns (instead of only the leftmost-longest match). I fixed this by reverting the change for normal matches and using the new position calculation only for matches that are cancelled due to MatchOnlyWholeWords. I have opened a new pull request for this.
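
To illustrate the difference in expected results, here is a minimal, naive sketch of the two matching semantics discussed above. It does not use this library or an Aho-Corasick automaton at all; the Match type and the overlappingMatches and leftmostLongestMatches functions are hypothetical names made up for this example, and the brute-force prefix scan is only meant to show which matches each mode should report.

```go
package main

import (
	"fmt"
	"strings"
)

// Match is a hypothetical result type for this sketch, not the library's type.
type Match struct {
	Pattern    string
	Start, End int // byte offsets into the haystack; End is exclusive
}

// overlappingMatches reports every occurrence of every pattern,
// including occurrences that overlap each other.
func overlappingMatches(haystack string, patterns []string) []Match {
	var out []Match
	for start := 0; start < len(haystack); start++ {
		for _, p := range patterns {
			if p != "" && strings.HasPrefix(haystack[start:], p) {
				out = append(out, Match{p, start, start + len(p)})
			}
		}
	}
	return out
}

// leftmostLongestMatches reports non-overlapping matches: at the earliest
// position where any pattern matches, it keeps only the longest pattern
// and then resumes scanning after the end of that match.
func leftmostLongestMatches(haystack string, patterns []string) []Match {
	var out []Match
	for pos := 0; pos < len(haystack); {
		best := ""
		for _, p := range patterns {
			if len(p) > len(best) && strings.HasPrefix(haystack[pos:], p) {
				best = p
			}
		}
		if best == "" {
			pos++ // nothing matches here, try the next byte
			continue
		}
		out = append(out, Match{best, pos, pos + len(best)})
		pos += len(best) // skip past the match so results never overlap
	}
	return out
}

func main() {
	patterns := []string{"ab", "abcd", "bc", "cd"}
	fmt.Println(overlappingMatches("abcd", patterns))     // [{ab 0 2} {abcd 0 4} {bc 1 3} {cd 2 4}]
	fmt.Println(leftmostLongestMatches("abcd", patterns)) // [{abcd 0 4}]
}
```

With these inputs, the overlapping mode yields four matches while the leftmost-longest mode yields a single one, which is the behaviour the regression described above broke.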