Document caching in AnnotatedTypeFactory.getAnnotatedType(Tree)

msridhar commented 5 years ago

This seems to have been covered in some way in previous issues (#130, #498, #601), but I can't quite grok the overall current state of caching in AnnotatedTypeFactory. I saw in a profile of a checker we are building that a significant amount of running time is spent in AnnotatedTypeFactory.getAnnotatedType(Tree):

Further, while debugging I noticed some quadratic behavior for deeply-nested method call expressions, which we deal with often. E.g., if we have this code:

x.setProp1(...).setProp2(...).setProp3(...).setProp4(...)

Invoking getAnnotatedType for each call expression results in calling getAnnotatedType for each of the nested sub-expressions, and the results aren't cached.

It seems from previous discussions that more aggressive caching is quite subtle in terms of correctness. It might be worth having documentation somewhere on the current state of caching and why more aggressive caching is tricky to get right. Also I am wondering if a checker could opt in to more aggressive caching if the trickiness is checker-dependent. Finally, do some checkers add their own more aggressive caching, overriding the framework's default behavior?

smillst commented 5 years ago

Here's some documentation on the current state of caching that we should add somewhere:

Definitions:

fully annotated type: an annotated type mirror that has annotation for each hierarchy and all the AnnotatedTypeMirrors referenced by it also have an annotation for each hierarchy. (Note, uses of type variables are not required to have an annotation.)
partially annotated type: an annotated type mirror that does not have an annotation for each hierarchy.

AnnotatedTypeFactory AnnotatedTypeMirror caches:

the fully annotated types of method, constructor, class declarations
partially annotated types of fields and method declarations where the only annotations are those explicitly written in source or bytecode
partially annotated types corresponding to TypeTrees where the only annotations are those explicitly written in source code.
partially annotated types corresponding to ExpressionTrees where annotations added by addComputedTypeAnnotations are not stored, but all other annotations are.

So calling AnnotatedTypeFactory.getAnnotatedType(Tree) on an ExpressionTree always calls addComputedTypeAnnotations on the type corresponding to the tree. If addComputedTypeAnnotations calls getAnnotatedType on any expressions, their types are (partially) recomputed as well.

So, why not just cache the fully annotated type of expressions? Because their types can change during dataflow. In fact, the current caching of expressions is buggy (see #602). In general, we can cache the fully annotated type, so long as we clear the types if they are changed by dataflow. The "if they are changed" is the tricky part. If this is done too often, then the cache won't be helpful.

So, I don't see a way for a particular checker to cache more than the framework does and as far as I know checkers don't have their own caches. In some cases, checker do implement typing rules as part of their dataflow analysis and so that the store acts as a cache. This won't work in your case because the invoked methods are not side-effect-free so the store is cleared after each call. (Though you might be able to work around that.)

msridhar commented 5 years ago

This is great @smillst thanks so much! It’s really helpful.

typetools / checker-framework

Document caching in AnnotatedTypeFactory.getAnnotatedType(Tree) #2647