uber / NullAway

A tool to help eliminate NullPointerExceptions (NPEs) in your Java code with low build-time overhead
MIT License
3.62k stars 290 forks source link

NPE when updating to checker-framework 3.23.0 #623

Closed ZacSweers closed 2 years ago

ZacSweers commented 2 years ago

When updating checker-framework (checker-qual) to 3.23.0, we see this NPE come from nullaway


/mnt/buildkite-agent/builds/mbr-buildkite-agent-android-iad-lm4j-1/slack/android-pull-request/libraries/foundation/rx/paging2-rx3/src/main/java/slack/rx/paging3/ArchTaskExecutor.java:39: error: An unhandled exception was thrown by the Error Prone static analysis plugin.
--
  | mDelegate = mDefaultTaskExecutor;
  | ^
  | Please report this at https://github.com/google/error-prone/issues/new and include the following:
  |  
  | error-prone version: 2.14.0
  | BugPattern: NullAway
  | Stack Trace:
  | java.lang.NullPointerException: castToNonNull failed!
  | at com.uber.nullaway.NullabilityUtil.castToNonNull(NullabilityUtil.java:352)
  | at com.uber.nullaway.dataflow.AccessPathNullnessPropagation$1.valueOfSubNode(AccessPathNullnessPropagation.java:179)
  | at com.uber.nullaway.dataflow.AccessPathNullnessPropagation.visitAssignment(AccessPathNullnessPropagation.java:516)
  | at com.uber.nullaway.dataflow.AccessPathNullnessPropagation.visitAssignment(AccessPathNullnessPropagation.java:134)
  | at org.checkerframework.nullaway.dataflow.cfg.node.AssignmentNode.accept(AssignmentNode.java:65)
  | at org.checkerframework.nullaway.dataflow.analysis.AbstractAnalysis.callTransferFunction(AbstractAnalysis.java:349)
  | at org.checkerframework.nullaway.dataflow.analysis.ForwardAnalysisImpl.callTransferFunction(ForwardAnalysisImpl.java:372)
  | at org.checkerframework.nullaway.dataflow.analysis.ForwardAnalysisImpl.runAnalysisFor(ForwardAnalysisImpl.java:272)
  | at org.checkerframework.nullaway.dataflow.analysis.AnalysisResult.runAnalysisFor(AnalysisResult.java:448)
  | at org.checkerframework.nullaway.dataflow.analysis.AnalysisResult.runAnalysisFor(AnalysisResult.java:416)
  | at org.checkerframework.nullaway.dataflow.analysis.AnalysisResult.getStoreBefore(AnalysisResult.java:296)
  | at org.checkerframework.nullaway.dataflow.analysis.AnalysisResult.getStoreBefore(AnalysisResult.java:279)

where the source for that failure is the penultimate line in this source

final class ArchTaskExecutor extends TaskExecutor {
  @LazyInit private static volatile ArchTaskExecutor sInstance;

  private TaskExecutor mDelegate;

  private final TaskExecutor mDefaultTaskExecutor;

  private static final Executor sMainThreadExecutor =
      command -> getInstance().postToMainThread(command);

  private static final Executor sIOThreadExecutor =
      command -> getInstance().executeOnDiskIO(command);

  private ArchTaskExecutor() {
    mDefaultTaskExecutor = new DefaultTaskExecutor();
    mDelegate = mDefaultTaskExecutor; // <--- this line
  }
msridhar commented 2 years ago

Thanks, @ZacSweers. We haven't tested with the latest CF release yet. I'm mostly on holiday until August 1 so I won't be able to personally look until then. In the meantime, a workaround may be to keep just the Checker Framework dataflow-nullaway artifact at 3.22.2. Having multiple CF versions floating around can sometimes cause problems, but since dataflow-nullaway is shadowed just for NullAway, it should be fine 🤞

lazaroclapp commented 2 years ago

I can definitely repro this by bumping CF on NullAway's master branch. Will take a look at the root cause, but, either way, until the next NullAway release is out, forcing dataflow-nullaway to be 3.22.2 should be both necessary and sufficient to work around this.

Edit: Fun fact, quick delta debugging shows that this is being caused by this CF commit specifically causing input.getValueOfSubNode(node) here to sometimes return null, which were asserting was non-null (hey, good example of the risks of suppressing NullAway, even when we think we understand the object state preconditions!). Adding this for context in case I get dragged to something else before I have the time to debug this further, but please feel free to ignore this discussion until you are back from vacation, Manu. I'll keep digging if I can.

Basically, the issue is here, somehow in ends up being an UnmodifiableIdentityHashMap.java and this.nodeValues its underlying IdentityHashMap. So, when the code calls this.nodeValues.clear() it also clears the contents of in. One way to fix this is to change UnmodifiableIdentityHashMap.wrap(...) to make a copy if the map being wrapped isn't already immutable. I wonder if that will simply erase the advantages of that optimization, but the only other option is to impose the contract that, once an IdentityHashMap has been wrapped through an immutable reference, the original map can't be used, and this seems hard to enforce in a reliable manner.

msridhar commented 2 years ago

Thanks for digging, @lazaroclapp. That change to introduce UnmodifiableIdentityHashMap was an optimization, but I didn't realize there is a way for the underlying IdentityHashMap to still be modified (all the Checker Framework tests passed). I'll dig into this more when I get some time.

Do the NullAway regression tests fail with this change? Or just @ZacSweers's example? Edit: the NullAway tests also fail. I should have really run them before landing that change on Checker Framework...

msridhar commented 2 years ago

This is now fixed upstream in Checker Framework. Once the next Checker Framework release comes out (early August most likely), the problem should be fixed. To try to avoid getting surprised like this in the future I'll try to set up a daily CI cron job that tests NullAway against the latest Checker Framework main branch.

msridhar commented 2 years ago

FYI CF 3.24.0 is out now and this issue should not appear when updating to that version