typetools / checker-framework

Pluggable type-checking for Java
http://checkerframework.org/
Other
1.02k stars 353 forks source link

Utilize `RegexChecker`'s `group.count.invalid` check in `NullnessChecker` #4006

Open Stephan202 opened 3 years ago

Stephan202 commented 3 years ago

Consider the following dummy code:

$ cat -n Dummy.java 
     1  import java.util.regex.Matcher;
     2  import java.util.regex.Pattern;
     3  import org.checkerframework.checker.nullness.qual.NonNull;
     4  
     5  final class Dummy {
     6    private static final Pattern PATTERN = Pattern.compile("^.(.*).$");
     7  
     8    String doSomethingOdd(String input) {
     9      Matcher matcher = PATTERN.matcher(input);
    10      if (!matcher.matches()) {
    11        return "";
    12      }
    13  
    14      @NonNull String a = matcher.group(1);
    15      @NonNull String b = matcher.group(2);
    16      return a + b;
    17    }
    18  }

Compiling this code using both the NullnessChecker and RegexChecker we get the following output:

$ javac \
    -J--add-opens=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED \
    -processorpath ~/.m2/repository/org/checkerframework/checker/3.8.0/checker-3.8.0.jar \
    -cp ~/.m2/repository/org/checkerframework/checker-qual
/3.8.0/checker-qual-3.8.0.jar \
    -Awarns \
    -processor org.checkerframework.checker.nullness.NullnessChecker,org.checkerframework.checker.regex.RegexChecker \
    Dummy.java
Dummy.java:14: warning: [assignment.type.incompatible] incompatible types in assignment.
    @NonNull String a = matcher.group(1);
                                     ^
  found   : @Initialized @Nullable String
  required: @UnknownInitialization @NonNull String
Dummy.java:15: warning: [assignment.type.incompatible] incompatible types in assignment.
    @NonNull String b = matcher.group(2);
                                     ^
  found   : @Initialized @Nullable String
  required: @UnknownInitialization @NonNull String
Dummy.java:15: warning: [group.count.invalid] invalid groups parameter 2. Only 1 groups are guaranteed to exist for matcher.
    @NonNull String b = matcher.group(2);
                                      ^
3 warnings

As can be seen, the RegexChecker is smart enough to realize that matcher.group(1) is safe, while matcher.group(2) is not. On the other hand, the NullnessChecker flags also matcher.group(1) as problematic, while this call will not yield null.

Which leads me to the feature request: would be be possible to "somehow" share the RegexChecker analysis such that the NullnessChecker does not produce this false positive?

mernst commented 3 years ago

This is a reasonable request.

The implementation is not terribly difficult. Make the Nullness Checker run the Regex Checker, as a subchecker, to infer and verify types for local variables such as matcher. This behavior would be enabled by a command-line option.

UtR491 commented 3 years ago

I have come up with this. Currently it works only for the given case. Are there other cases where the Regex Checker analysis can be useful for the Nullness Checker? I did not write any tests because I was not sure if this is the only case to be handled. I will open a PR once I have written the tests and the CI builds pass. Following is the Nullness Checker analysis for the Dummy class with and without the -AenableRegexChecker flag.

utkarsh@hp:~/checker-framework$ which javac
/home/utkarsh/checker-framework/checker/bin-devel/javac
utkarsh@hp:~/checker-framework$ cat -n Dummy.java 
     1  import java.util.regex.Matcher;
     2  import java.util.regex.Pattern;
     3  import org.checkerframework.checker.nullness.qual.NonNull;
     4  
     5  final class Dummy {
     6      private static final Pattern PATTERN = Pattern.compile("^.(.*).$");
     7  
     8      String doSomethingOdd(String input) {
     9          Matcher matcher = PATTERN.matcher(input);
    10          if (!matcher.matches()) {
    11              return "";
    12          }
    13  
    14          @NonNull String a = matcher.group(1);
    15          @NonNull String b = matcher.group(2);
    16          return a + b;
    17      }
    18  }
utkarsh@hp:~/checker-framework$ javac -processor nullness Dummy.java 
Dummy.java:14: error: [assignment.type.incompatible] incompatible types in assignment.
        @NonNull String a = matcher.group(1);
                                         ^
  found   : @Initialized @Nullable String
  required: @UnknownInitialization @NonNull String
Dummy.java:15: error: [assignment.type.incompatible] incompatible types in assignment.
        @NonNull String b = matcher.group(2);
                                         ^
  found   : @Initialized @Nullable String
  required: @UnknownInitialization @NonNull String
2 errors
utkarsh@hp:~/checker-framework$ javac -processor nullness -AenableRegexChecker Dummy.java 
Dummy.java:15: error: [assignment.type.incompatible] incompatible types in assignment.
        @NonNull String b = matcher.group(2);
                                         ^
  found   : @Initialized @Nullable String
  required: @UnknownInitialization @NonNull String
Dummy.java:15: error: [group.count.invalid] invalid groups parameter 2. Only 1 groups are guaranteed to exist for matcher.
        @NonNull String b = matcher.group(2);
                                          ^
2 errors
UtR491 commented 3 years ago

Hi @mernst, I have created a PR (#4199) and the CI builds pass. A review would be appreciated.

FWiesner commented 3 years ago

really cool it would be if the same would also work for named groups

mernst commented 3 years ago

This optimization requires a richer analysis than that done for the current @Regex annotation.

Matcher.group(int) can return null even if its argument is legal (the argument is legal if it is no more than the group count). The Match.group(int) documentation explains this:

If the match was successful but the group specified failed to match any part of the input sequence, then null is returned.

This optimization can be applied, using a richer annotation that lists all the capturing groups that are definitely non-null. (Currently, @Regex(4) says that there are 4 capturing groups, but it does not say which ones of those are definitely non-null if the group matches.)

Currently, the Regex Checker can infer the @Regex annotation when code uses the isRegex(String s, int groups) method. There would need to be a new isRegex method, corresponding to the new richer annotation. The new isRegex method would take a list of group numbers where the current method takes a single number.