TypeIterator for processing Runnable in Thread.start

errt commented 2 weeks ago

Fixes #227

errt commented 2 weeks ago

That is weird behavior indeed. Would be helpful if you could debug why that happens. As a first check, I think it would be interesting what happens in the full RTJar case if you do rewrite invokedynamic. Then again, maybe we should look into whether we can make this make even with invokedynamic not rewritten, since this will probably be a common case for threads and also for other APIs such as doPrivileged.

johannesduesing commented 2 weeks ago

After some more debugging i found what i think is the reason for RTA (and i guess XTA and CFA_1_1) not working when loading the RTJar with interfaces only. The reason is not the "complicated" part of resolving the Thread.start -> Runnable.run relation - that actually works perfectly - it's resolving PrintStream.println.

The typeIterator in org.opalj.tac.fpcs.analyses.cg.CallgraphAnalysis.scala:156 never knows that java/io/PrintStream is instantiateable - which makes sense since its constructor is never called when the RTJar implementations are not loaded. I don't know if there should be some handling for this in place, or if this is desired behavior.

I'll go on looking into the issue when loading the RTJar fully.

errt commented 2 weeks ago

I thought we had things like PrintStream configured in the tac reference.conf, but it seems this is not really the case if the RTJar is not fully loaded. Maybe we need to look into whether and how we should configure more such types in that case. Why does it work without the RTJar present at all, though?

johannesduesing commented 2 weeks ago

  val mResult = if (classHierarchy.isInterface(declType).isYes)
      org.opalj.Result(project.resolveInterfaceMethodReference(declType, call.name, call.descriptor))
  else
      org.opalj.Result(project.resolveMethodReference(
          declType,
          call.name,
          call.descriptor,
          forceLookupInSuperinterfacesOnFailure = true
      ))

  if (mResult.isEmpty) {
      unknownLibraryCall(
          callContext,
          call.name,
          call.descriptor,
          call.declaringClass,
          isStatic = false,
          declType,
          callContext.method.definedMethod.classFile.thisType.packageName,
          pc,
          calleesAndCallers
      )
  } else if (isMethodOverridable(mResult.value).isYesOrUnknown) {
      calleesAndCallers.addIncompleteCallSite(pc)
  }

This piece of code from CallGraphAnalysis.scala is the reason it works without the RTJar present. It triggeres a call to unknownLibraryCall if no declared method can be found for the call, which ultimately adds the println call to the CG. Of course, if the RTJar is loaded (even as iterfaces), mResult will not be empty, and unknownLibraryCall will not be invoked.

errt commented 2 weeks ago

So essentially the situation is this:

The call to the lambda is correctly resolved
When resolving PrintStream.println, the is not resolved for anything but CHA because PrintStream is not considered instantiated
But if the RTJar isn't present, the code above considers it a possibility that there could be a different, unknown, println method and thus adds this kind of fallback call Did I understand that correctly?

If yes, this seems like intended behavior for the most part but we should look into whether further modelling of JDK internals would be sensible in order to make types like PrintStream considered instantiated. I still wonder if it does work with the full RTJar code if invokedynamics are rewritten - does the modelling that we have then cause PrintStream to be considered instantiated?

As a second issue besides potentially better modelling of JDK internals, we might want to look into making these call graph analyses work at least for some cases of invokedynamic.

Do you want to look into these issues?

johannesduesing commented 2 weeks ago

I created #231 and #230 to document these issue, i will look into them and see if i can find a solution. I will also have a look into loading the RTJar with code when dynamics are rewritten and report back here.

You did understand the situation correctly, so i guess everything else is working as intended - i'll approve this PR as soon as i got some insights on loading the full RTJar 👍

johannesduesing commented 2 weeks ago

For these tests, i initialized my project to rewrite invokedynamics by doing this:

    val libFiles = Project.JavaClassFileReader.AllClassFiles(Seq(RTJar))
    val projectFiles = Project.JavaClassFileReader.AllClassFiles(Seq(new java.io.File("Demo.class")))
    val project = Project(projectFiles, libFiles, libraryClassFilesAreInterfacesOnly = false)

I confirmed that the dynamic invocation is in fact rewritten as expected. I found that both CHA and RTA work in this setting (i.e. resolve the lambda call and the call to println), so we do detect that PrintStream is instantiated for RTA.

However, for XTA it again does not find the call to println, regardless of which AI domain is used - it does resolve the lambda invocations correctly though. I honestly did not yet understand what the issue is - i'm having a hard time debugging this because placing conditional breakpoints in the respective methods incurs a huge performance overhead. I'll try some more, but i don't know if i'll find anything today.

I also don't know if CFA_1_1 (or CFA_1_0) work for the full RTJar, my laptop takes way to long computing those CGs... 🥹

errt commented 2 weeks ago

CFA_1_0 at least should work (but maybe not on 'your laptop', it is memory intensive).

If RTA works but XTA doesn't there are two main possible reasons: XTA doesn't properly treat PrintStream as instantiated because its InstantiatedTypesAnalysis works differently or the type PrintStream doesn't reach the type set for the calling method. Both could be checked by inspecting XTA's type sets after running the CG.

If conditional breakpoints incur a large performance penalty, try putting the condition into the actual code, just place an empty println after it and put an unconditional breakpoint on that. That usually doesn't have a noticeable impact on performance.

johannesduesing commented 2 weeks ago

Okay, so here's a few things i found:

XTA does not consider PrintStream to be instantiated when processing the call inside the lambda function, the type is not contained in the respective set
At some point in the JDK, the tac.fpcf.analyses.cg.xta.InstantiatedTypesAnalysis does detect instantiations of PrintStream, for many JDK methods it is in fact contained in the set of instantiated types.
The br.analyses.cg.ConfigurationInstantiatedTypesFinder loads no preconfigured types from the configuration at all - at least for my setup there seem to be no entries for the respective configuration key in the global configuration file.

I didn't yet look into where in the JDK PrintStream is supposed to be initialized, or where RTA found the initialization site, but i think that's what i'll check next.

opalj / opal

TypeIterator for processing Runnable in Thread.start #229