rapid7 / recog

Pattern recognition for hosts, services, and content
Other
671 stars 199 forks source link

Fix regex which has exponential degree of ambiguity #609

Closed adammcclenaghan closed 7 months ago

adammcclenaghan commented 7 months ago

Description

The Regex changed suffers from an exponential degree of ambiguity.

As a result it is trivial to provide an input string which takes an incredibly long time to match (due to exponential growth by length of input string).

Related: https://owasp.org/www-community/attacks/Regular_expression_Denial_of_Service_-_ReDoS


Here's an example of how to show this with code.

Tested on Java 8 which exhibits the issue. Tested on Java 11 which did not present the issue, some research online suggests that Java 11 featured some improvements to try to mitigate this. Either way, since this is a collection of patterns which are language agnostic we should probably not support patterns like this.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Tests the runtime of matching a regular expression
 * for varying length "prefix-pump-suffix" strings
 *
 * For example:
 * prefix: hello
 * pump: aa
 * suffix: goodbye
 *
 * We can inject the "pump" N times and view how our regular
 * expression matches with increasing length.
 * 0: hellogoodbye
 * 1: helloaagoodbye
 * 2: helloaaaagoodbye
 * 3: helloaaaaaagoodbye
 *
 * And so on...
 */
class RecogAnalysis {
    private static final int MAX_PUMPS = 16;

    public static void main(String[] args) {
        String regex = "^PolycomRealPresenceGroup(\\d+)/([\\d\\._]+)+";
        String prefix = "PolycomRealPresenceGroup0/..";
        String pump = "..";
        String suffix = "up";

        test(regex, prefix, pump, suffix);
    }

    private static void test(String regex, String prefix, String pump, String suffix) {
        System.out.println("Pumps | Match Runtime (ns)");
        Pattern pattern = Pattern.compile(regex);

        /*
        Pump our 'pump' string into prefix-<pump>-suffix MAX_PUMPS times and see how the
        runtime of the regex match changes
         */
        for (int i = 0; i < MAX_PUMPS; i++) {
            StringBuilder sb = new StringBuilder();

            sb.append(prefix);
            // Pump our 'pump' string i items
            for (int j = 0; j < i; j++) {
                sb.append(pump);
            }
            sb.append(suffix);

            System.out.println("Testing: " + sb);
            Matcher matcher = pattern.matcher(sb);
            long startTime = System.nanoTime();
            matcher.matches();
            long endTime= System.nanoTime();
            System.out.println(String.format("%d %d", i, (endTime - startTime)));
        }
    }
}
Some figures from running locally: Pumps Match Runtime (ns)
0 160747
1 60922
2 291289
3 1053442
4 1651997
5 2451640
6 2574233
7 8502378
8 36571165
9 218745623
10 526267047
11 729248556
12 2694826657
13 10619030573
14 42494969087
15 177124636095

image

Match times are fixed by the replacement regex: Pumps Match Runtime (ns)
0 181492
1 25402
2 22455
3 29395
4 19127
5 20802
6 26910
7 45416
8 42026
9 36306
10 36932
11 79138
12 66055
13 94303
14 66187
15 23724

image

Motivation and Context

I ran this library against all of the patterns within Recog, only one showed an exponential degree of ambiguity so I am fixing it.

How Has This Been Tested?

Functional: Tested with java code shown above to prove out that the new regex does not exhibit this issue.

Regression: The existing example test case for this regex passes

Types of changes

Checklist:

adammcclenaghan commented 7 months ago

Removed the possessive quantifiers because they're not supported in golang

Runtime still fixed with this new regex

Pumps Match Runtime (ns)
0 226687
1 17758
2 17034
3 21194
4 21196
5 23365
6 41325
7 28272
8 32467
9 55776
10 59842
11 59216
12 58549
13 64604
14 34011
15 80653
16 120135

image

Can't run verify hooks locally as ruby won't play nice with my system and I don't have enough time today to fix it.

I did test the new pattern locally against a bunch of patterns. If anything the original regex seems likely too permissive as it matches the likes of PolycomRealPresenceGroup123/__..1.2.3 but since I couldn't find any real banner examples from a quick look online I tried not to change the behaviour of which inputs we match with this change.