rapid7 / recog

Pattern recognition for hosts, services, and content
Other
662 stars 195 forks source link

Simplify a ton of regexps #332

Closed jvoisin closed 2 years ago

jvoisin commented 3 years ago
tsellers-r7 commented 3 years ago

CCing @hdm in case you have any concerns with these changes when used in your projects.

tsellers-r7 commented 3 years ago

@jvoisin - Thanks for the contribution. In general I'm on board with this. I'm going to let a few others weigh in this to ensure that I'm not overlooking something. Massive PRs like this are the reason why we don't allow new fingerprints that don't provide examples to test against. It makes me feel so much better when every fingerprint is tested against changes.

Also, as a note. There are changes to the newline at the end of these files. This may be due to how you have line ending set (windows vs unix) and/or you may need to run ruby bin/recog_cleanup.

hdm commented 3 years ago

I am onboard with this and am really glad the test coverage is great =D

dabdine-r7 commented 3 years ago

I believe this will cause many regressions for anything using recog-java (https://github.com/rapid7/recog-java), since it uses java.util.regex.Pattern, and Java's Pattern requires anchoring...

$ cat Test.java

import java.util.regex.Pattern;

public class Test {
    public static void main(String[] args) {
        System.out.println(Pattern.matches("(iSeries)", "IBM i5/OS iSeries (OS/400)"));
        System.out.println(Pattern.matches(".*(iSeries).*", "IBM i5/OS iSeries (OS/400)"));
    }
}

$ java -cp . Test
false
true
jvoisin commented 3 years ago

Then it has been broken for a long time, because a lot of patterns don't have anchoring :/

dabdine-r7 commented 3 years ago

@tsellers-r7 can we run tests for recog-java on these updated FPs? I'm slammed at the moment, can get around to it in a bit if you can't.

gschneider-r7 commented 3 years ago

It's a pain to test recog-java against branches/PRs, but the tests are only run against fingerprints with examples anyway. So as-is the tests "pass" against this change, but adding an example to one of the modified fingerprints does result in a failure.

If this is a desirable change for recog then there needs to be coordination with recog-java and nexpose to handle the change by adding anchors at runtime or something. So I would hold off on merging this PR until that is sorted out.

I would guess that the existing patterns without anchors are not used by recog-java/nexpose today, or they don't work as expected when used. :disappointed:

tsellers-r7 commented 3 years ago

Thanks for the feedback folks, this is what I was concerned about. There is nothing driving this PR other than just cleaning up the regex a bit. If it's going to break recog-java and/or product then we can cancel it or put it on hold.

I would guess that the existing patterns without anchors are not used by recog-java/nexpose today, or they don't work as expected when used.

This part concerns me. Is there is anything that can be done to improved the testing situation with recog-java?

gschneider-r7 commented 3 years ago

Is there is anything that can be done to improved the testing situation with recog-java?

One thing I've wanted to do is automate recog-java's integration tests against every release of recog, but I don't know how that could be easily triggered or how a failure would notify out to maintainers here. It would probably be easy to have this repo checkout recog-java to run tests. At that point it might as well run on PRs as well. That feels like crossing a weird boundary, but I haven't thought of anything better yet. We might want some meta-recog project to test the entire ecosystem or something. :snail:

That doesn't help for the lack of examples for many fingerprints, though, which is how the tests are currently driven.

How do other consumers/implementations of recog handle it?

hdm commented 3 years ago

Ruby is covered by the main project. For Go, we have a similar implementation of recog_verify that helps sanity check our implementation vs the main project. We also do some sanity checks on the Rumble side with the Recog databases at build time. Recog-GO doesn't do CI yet for this, but that should be easy enough.

dabdine-r7 commented 3 years ago

We should start requiring <example> tags on new PRs for validation of fingerprints. In terms of testing, my knee-jerk thought is to just use submodules to perform testing of the other language support when this project is built or new submissions are made. However, not sure how "fun" that will be -- getting the CI platform configured to run ruby/go/java etc.

ghost commented 3 years ago

@gschneider-r7 @dabdine-r7 One simple (though I don't know how precise) technique to deal with this would be using find() vs matches() in the pattern API for recog-java. Though again we need to rely on coverage to test if this has unintended consequences.

tsellers-r7 commented 3 years ago

@dabdine - We don't have a technical requirement for examples for new fingerprints but I won't knowingly land a new fingerprint w/o an example.

hdm commented 3 years ago

This has been merged into github.com/RumbleDiscovery/recog (main).

tsellers-r7 commented 3 years ago

Status update: We're currently looking at implementing tests for the Ruby, Java, and Go libs. Once we've sorted that out we can revisit this PR.

CC @mkienow-r7

mkienow-r7 commented 2 years ago

@jvoisin With the recently added GitHub Actions you shouldn't see any future PR review delays due to cross language concerns. We appreciate the contribution and it is unfortunate this PR was left open for so long. The merge conflicts are a bit messy since subsequent PRs removed unnecessary capture groups (e.g. #351). I think we have two options to simplify moving forward with these changes:

  1. We close this PR and repeat these changes from the latest on your behalf. They won't be your commits anymore, but you'll get a shout-out in the new PR.
  2. We close this PR, you repeat the process from the latest, and open another PR.

Please let me know how you would like to proceed.

hdm commented 2 years ago

One option may be to PR the Rumble branch, since we merged this PR a while back and managed the conflicts since. The upside is it is a clean merge and maintains the Git history. The wrinkle is that is includes our other work too: https://github.com/rapid7/recog/compare/master...RumbleDiscovery:main

jvoisin commented 2 years ago

Feel free to go with 1. :)

mkienow-r7 commented 2 years ago

@hdm I appreciate the suggestion. I'm going to recreate these changes on our latest, but I'm interested in taking a closer look at merging your updates. That will be next!

@jvoisin Thank you for the confirmation! :)