sbt / sbt-remote-control

Create and manage sbt process using unicorns and forks

Compile analysis #190

Closed corruptmemory closed 10 years ago

corruptmemory commented 10 years ago

This PR adds support for fairly robust serialization of Compile Analysis results for clients.

A few things to note:

This is a first step in enabling Play projects to be forked off (they will fork off and use SbtClient to communicate back to the parent)

jsuereth commented 10 years ago

cc @gkossakowski for his opinions on what we need in the final API.

havocp commented 10 years ago

Commits here don't look super meaningful so maybe we should just squash the whole thing after we review / before merge

corruptmemory commented 10 years ago

RE: commits - agreed. Will squash

gkossakowski commented 10 years ago

I'm not familiar with sbt-remote-control. Why would you like to transfer the contents of the whole Analysis object across processes? For example, why would a remote client want to look at the details of API objects that are very specific to the incremental compiler algorithm?

corruptmemory commented 10 years ago

@gkossakowski at the moment, which fields are in the "protocol" version of Analysis is driven by what's in Analysis and the referenced interfaces. Why? Because they are there, and a client may want access to that data. For instance, the SbtClient API enables watching arbitrary tasks for changes (for example: recompilation). Sending Analysis across the wire is a natural by-product of saying you want to watch compilation - it produces Analysis as a result. The general principle being that the use of SbtClient ought to be as "data robust" as writing an in-proc plugin.

The more concrete example is that the Play sbt plugin uses analysis results at the moment and we are working to make Play forkable in run. The mechanism is to have the forked process connect back to the parent process (i.e. become a client and use the SbtClient interface) and watch for recompilations. The forked process monitors the file-system (like the plugin does now) and instead of using sbt APIs directly to execute the compilation task on file changes it will send a command to the parent via the SbtClient API. The results of compilation will be propagated back to the child (forked) process where the Analysis or Incomplete (on error) results will be examined as if the plugin were in proc (modulo appropriate abstractions to cover the in/out-of-proc data access).

havocp commented 10 years ago

sbt-remote-control should just export anything that's logically public in the sbt API. Of course, right now a lot of things in sbt are illogically public. So I think the question is what in Analysis could possibly be useful to any sbt plugin, task, IDE, Activator; and what would only be used internally by sbt itself.

gkossakowski commented 10 years ago

> The more concrete example is that the Play sbt plugin uses analysis results at the moment and we are working to make Play forkable in run. The mechanism is to have the forked process connect back to the parent process (i.e. become a client and use the SbtClient interface) and watch for recompilations. The forked process monitors the file-system (like the plugin does now) and instead of using sbt APIs directly to execute the compilation task on file changes it will send a command to the parent via the SbtClient API. The results of compilation will be propagated back to the child (forked) process where the Analysis or Incomplete (on error) results will be examined as if the plugin were in proc (modulo appropriate abstractions to cover the in/out-of-proc data access).

Do I understand correctly that you want to run sbt and the Play plugin in separate processes? This doesn't compute, because the Play plugin is an sbt plugin, which means it depends on a ton of sbt's APIs.

Analysis consists of:

  • stamps
  • apis
  • relations
  • infos
  • compilations

I crossed out apis because it is the first to be excluded from sharing. It contains the incremental compiler's internal representation of some aspects of compiled source files. These data structures are optimized for one use case: deciding which files the incremental compiler needs to recompile. They are also used to discover classes that define main methods, but we should have a dedicated API for that. I have a half-baked branch for that.

In relations, there are direct and publicInherited, which are specific to the old incremental compilation algorithm and are replaced by some new relations when name hashing is enabled. In order to ease the migration pain we should avoid sharing those too.

The rest is ok to share.
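
A rough sketch of that split, using simplified stand-in case classes (these are not the real sbt Analysis/Relations types, just toy models for illustration) to show a wire format that drops apis and the legacy direct/publicInherited relations while keeping the rest:

```scala
// Toy stand-ins for illustration only; the real sbt Analysis, Relations,
// Stamps, etc. are far richer than these case classes.
case class Stamps(hashes: Map[String, String])
case class Relations(
  srcDeps: Map[String, Set[String]],        // new-style relations: ok to share
  direct: Map[String, Set[String]],         // old incremental algorithm: drop
  publicInherited: Map[String, Set[String]] // old incremental algorithm: drop
)
case class Analysis(
  stamps: Stamps,
  apis: Map[String, String], // compiler-internal API representation: drop
  relations: Relations,
  infos: Map[String, String],
  compilations: List[Long]
)

// What would cross the wire: everything except apis and the legacy relations.
case class SharedAnalysis(
  stamps: Stamps,
  srcDeps: Map[String, Set[String]],
  infos: Map[String, String],
  compilations: List[Long]
)

def toShared(a: Analysis): SharedAnalysis =
  SharedAnalysis(a.stamps, a.relations.srcDeps, a.infos, a.compilations)
```

The point of the sketch is only the field selection; the real serializers in this PR would need to mirror whichever subset is finally agreed on.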

corruptmemory commented 10 years ago

> Do I understand correctly that you want to run sbt and the Play plugin in separate processes? This doesn't compute, because the Play plugin is an sbt plugin, which means it depends on a ton of sbt's APIs.

It can/will remain a "plugin", with parts running in the sbt server and parts running as a client. I do understand that in practice teasing the pieces apart will have some challenges. Some of those challenges are addressed by the ability to watch compilation results.

> Analysis consists of:
>
>   • stamps
>   • apis
>   • relations
>   • infos
>   • compilations
>
> I crossed out apis because it is the first to be excluded from sharing. It contains the incremental compiler's internal representation of some aspects of compiled source files. These data structures are optimized for one use case: deciding which files the incremental compiler needs to recompile. They are also used to discover classes that define main methods, but we should have a dedicated API for that. I have a half-baked branch for that.
>
> In relations, there are direct and publicInherited, which are specific to the old incremental compilation algorithm and are replaced by some new relations when name hashing is enabled. In order to ease the migration pain we should avoid sharing those too.
>
> The rest is ok to share.

Cool. At the moment we actually have everything. :-)

havocp commented 10 years ago

Jim, it sounds like we should do a pass through here removing unnecessary stuff from the protocol version of Analysis. Thanks Grzegorz. (The background of sbt-remote-control is that it's a feature to be merged into sbt itself, which allows any number of UIs to use the same instance of sbt, because running multiple instances at once on the same project is broken. UIs include the Play http server in dev mode, the interactive sbt prompt, IDEs, and Activator.)

jsuereth commented 10 years ago

@corruptmemory Would you mind rebasing against master? This is no longer mergeable.

Additionally, I feel like some of this may need to be split into separate PRs; it's kinda hard to review. I think I identified three components:

corruptmemory commented 10 years ago

I am going to rebase on master tonight (can't right now), squash, and finish up the serialization tests.

RE: reducing Analysis - don't get too involved in that yet. The Play plugin is using a surprising amount of the Analysis result, including the supposedly "internal" apis values. For now I'm leaving things as they are. As we make progress on the Play fork run I'll have some better data on what can be jettisoned from the serialized version.

For clarification though: the results of tasks should contain values that in principle are potentially or actually useful to some client. There may be nicer/saner ways to encode the information in Analysis (or other sbt task results), but I would caution on being overzealous.

havocp commented 10 years ago

I split out https://github.com/sbt/sbt-remote-control/pull/191 and maybe got a few more of them (not sure)

corruptmemory commented 10 years ago

I rebased and squashed. I've not finished work on the protocol test (as you can see, some work has been done there). In the meantime, feel free to review the changes.

havocp commented 10 years ago

cool. instead of commenting I think I'm just going to submit patches and try to get it done. do we want to "lazy all the formatters"? that is another thing, like final, that would be nice to split out of here. In general I think we only want lazy if we expect a lot of the individual vals to be unused; the object the vals are inside is lazy anyway, so the question is whether we will use some of these but not most of them.

corruptmemory commented 10 years ago

Lazy on the formatters was done to avoid NPEs and endless moving around of vals.
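
The initialization-order hazard being avoided looks roughly like this (a generic Scala illustration, not the actual formatter code): vals in an object initialize top to bottom, so a strict val that refers to one defined below it observes null, while lazy vals defer evaluation until first access and make definition order irrelevant:

```scala
object StrictFormatters {
  // fooFormat is initialized first and captures barFormat while it is still null.
  val fooFormat: String = "foo uses: " + barFormat
  val barFormat: String = "bar"
}

object LazyFormatters {
  // lazy vals initialize on first access, so definition order no longer matters.
  lazy val fooFormat: String = "foo uses: " + barFormat
  lazy val barFormat: String = "bar"
}
```

This is the "endless moving around of vals" being avoided: with strict vals every formatter has to be topologically sorted by its dependencies, and a mutual reference has no valid order at all.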

RE: patches - go for it!

gkossakowski commented 10 years ago

I still fail to see how you can split sbt and the Play plugin into two processes. Do you guys have a design doc which explains this somewhere?

> For clarification though: the results of tasks should contain values that in principle are potentially or actually useful to some client. There may be nicer/saner ways to encode the information in Analysis (or other sbt task results), but I would caution on being overzealous.

I apologize if I sounded overzealous. My advice is coming from the experience working with incremental compiler.

corruptmemory commented 10 years ago

@gkossakowski, no there is no design doc. There is simply an informal analysis of what the current plugin is doing. Generally speaking, the flow is anticipated to work like this:

Forked process:

  1. Connects back to parent
  2. Sets up a watch on the compile task
  3. Sends a compile request to the parent
  4. In the watch body, examine the results of the compilation:
       4a. (success) - build the appropriate classpath information and (re)load the Play application
       4b. (failure) - examine the Incomplete information and build an error page
  5. Monitors file system for relevant changes
  6. When relevant changes are detected go to step 3

The parent (server) can run additional tasks in addition to the compile task and have "native" in-proc access to sbt's API. The task(s) actually executed by the client can be compile in conjunction with other tasks. sbt-server and sbt-client will return all of the task outputs to the client in one consistent "batch" (transaction) so if we need to get more information than Analysis then we can request multiple tasks be run as part of one execution request.

I'm omitting a lot of detail, but at least on initial inspection this approach looks like it will work.
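
The steps above can be sketched as a loop over stubbed-out interfaces (every name here - Client, CompileResult, requestExecution, onCompileResult - is hypothetical, standing in for the real SbtClient and watch machinery, not an actual sbt-remote-control API):

```scala
// Hypothetical result types for the compile task.
trait CompileResult
case class CompileSuccess(classpath: List[String]) extends CompileResult
case class CompileFailure(errors: List[String]) extends CompileResult

// Hypothetical client interface standing in for SbtClient.
trait Client {
  def requestExecution(command: String): Unit           // step 3: ask the parent to compile
  def onCompileResult(handler: CompileResult => Unit): Unit // step 2: watch the compile task
}

// Steps 1-6: register the watch, request a compile, handle 4a/4b in the
// watch body, and re-request whenever the file monitor reports a change.
def runDevLoop(client: Client, nextChange: () => Boolean): Unit = {
  client.onCompileResult {
    case CompileSuccess(cp) => println(s"(re)load app with classpath $cp") // 4a
    case CompileFailure(es) => println(s"render error page for $es")       // 4b
  }
  client.requestExecution("compile")                        // initial compile (3)
  while (nextChange()) client.requestExecution("compile")   // (5) and (6)
}
```

In the real design the watch callback and the file monitor would of course run asynchronously rather than in a blocking loop; the sketch only shows the control flow of the numbered steps.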

havocp commented 10 years ago

We have a lot of text somewhere on sbt-remote-control and sbt server, but I don't know if we have an easy to consume doc that just answers your question. One process is the http server (in dev mode) that listens for http requests and triggers recompilation. That process uses this API: https://github.com/sbt/sbt-remote-control/blob/master/client/src/main/scala/sbt/client/SbtClient.scala (activator and Eclipse also use that API). The other process is sbt itself (the "sbt server"), which still has a play plugin in it which can do anything we want to do in sbt itself, including export custom tasks purely for the use of the http server process. In principle, the http server process could be refactored to have very little smarts and even pull the html to display from the sbt process. However I think the natural current "breakpoint" that the codebase is set up for now keeps more smarts in the http server.

We have to keep in mind though that sbt-remote-control is not just for the play plugin, it's for Eclipse and Activator and eventually the interactive prompt as well. In general the goal is to expose the full sbt task engine (all tasks and their results) in a generic way. For example Activator and Eclipse might also want to display the results of a compilation. SbtClient is the connection between clients (clients = Activator, Eclipse, the interactive command prompt, and the play http server) and the sbt task engine, so what we have to communicate over the protocol is anything a user might want to see or do, or that might be needed to compute something a user might want to see or do.

This is all working already more or less, except for Play which isn't broken into two processes yet, so you can always look at the code here in this project.

havocp commented 10 years ago

@corruptmemory I'm trying to shrink this PR down so it's only the new serializers and converters. https://github.com/sbt/sbt-remote-control/pull/192 is another PR to add a conversions facility. I want to keep this separate from DynamicSerialization because DynamicSerialization is relevant on the client side but DynamicConversion is not. Figured it would be easier to code this than explain it.

I will port this PR over to the new stuff. Going to look for any other refactors to split out, first.

corruptmemory commented 10 years ago

No problem. Ignore my comment on #192 then. Thanks

gkossakowski commented 10 years ago

Thanks guys for the background info. I'd suggest exposing more and more of the incremental compiler details gradually, driven by specific use cases coming from Play, Activator, or an IDE.

If you consider Play, I found two places that do depend on Analysis:

https://github.com/playframework/playframework/blob/a38d8d126ad137afaac127400f3ff4df32fc11cc/framework/src/sbt-plugin/src/main/scala/PlayCommands.scala#L94
https://github.com/playframework/playframework/blob/02e05755753407bc238da6f57f551fa455d820cb/framework/src/sbt-plugin/src/main/scala/PlayReloader.scala#L140

The first use of Analysis is the implementation of the bytecode enhancement capability, which has to update Analysis so it's in sync with class files on disk. I doubt you want to expose such functionality through the remote interface. Moreover, a hook for bytecode enhancement should be implemented by sbt so Play doesn't need to mess with the internals of sbt. See sbt/sbt#1362 for initial work going in that direction.

The second use is less clear to me, but it appears it's about generating the error pages we see in Play when compilation fails. The apis is being used to find a source file for a given class name. There's a dedicated API that Play should use: Relations.definesClass

See sbt/sbt#1268 for an in-depth description of issues related to Analysis.apis.
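
The shape of that lookup is roughly as follows (an illustrative toy, not the real sbt implementation, which exposes definesClass as a Relation rather than a plain Map): given the relation from source files to the classes they define, finding the source for a class name is a reverse lookup.

```scala
// Toy model of the source-file -> defined-classes relation.
// File paths and class names here are made up for the example.
val definesClass: Map[String, Set[String]] = Map(
  "app/controllers/Application.scala" -> Set("controllers.Application"),
  "app/models/User.scala"             -> Set("models.User", "models.User$")
)

// Reverse lookup: the source file that defines a given class name.
def sourceFor(className: String): Option[String] =
  definesClass.collectFirst { case (src, classes) if classes(className) => src }
```

Using a relation like this instead of digging through apis keeps the error-page code independent of the incremental compiler's internal source representations.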

corruptmemory commented 10 years ago

@gkossakowski, this is very helpful. I'll read up on the issues you pointed out.

corruptmemory commented 10 years ago

@gkossakowski So, I looked (again) at those places in the Play plugin you are referring to. The second one looks safe to process via the client and the first one looks like it stays in the server.

jroper commented 10 years ago

I would have thought that the aspects of the Play plugin that require Analysis can run in process, while the forked process doesn't need any dependency on Analysis. The Play plugin would watch the filesystem from in the sbt server process, and send events to the forked process to tell it when it needs to reload. Then, when the forked Play process gets a new request, if it's received one of these reload required events, it sends a request back to sbt-server, the play plugin handles it, maybe recompiles, and then sends the result back (an error, a status to say nothing needs to be changed, or some description of the new classpath).

As for implementing the findSource method, again, this would run in process, and the remote server would call back to SBT to execute that.

Play has an interface called BuildLink, this is what the Play dev server uses to communicate with SBT. From a types point of view, it only uses very basic (mostly primitive) types. It was created to allow SBT and the Play dev server to use different versions of Scala, but I think it also works here as a perfect remote interface, so the Play server will use an implementation that is a remote interface (backed by SBT client), and the play plugin will expose SBT tasks that the remote interface implementation can invoke through the SBT client, and these tasks will simply delegate to a local implementation of BuildLink, as currently defined in PlayReloader.

So for example, the most complex method, reload, would look something like this (note: for simplicity I'm caching the current running application in a global static var; in the final implementation it should be attached to the SBT state somehow):

```scala
val playReloadTask = TaskKey[PlayReloadResult]("play-reload-task")
val playRunTask = TaskKey[Unit]("play-run-task")

object PlayCurrentBuildLink {
  var current: Option[BuildLink] = None
}

playRunTask := {
  backgroundJobService.start {
    val buildLink = ... // build link created, process forked, file monitoring threads created etc. here, as Play currently does
    PlayCurrentBuildLink.current = Some(buildLink)
  }
}

playReloadTask := {
  PlayCurrentBuildLink.current match {
    case Some(buildLink) => buildLink.reload()
    case None => throw new RuntimeException("No current running application to reload")
  }
}
```

So that all runs on the server side in process, meanwhile in the forked Play process:

```scala
class RemoteBuildLink extends BuildLink {
  var fileChanged: Boolean = false // this will be set by a listen event sent back from the server
  def reload() = {
    if (fileChanged) {
      val result = sbtClient.runTask("play-reload-task")
      // do whatever needs to be done to transform the result into what BuildLink expects...
    }
  }
}
```

corruptmemory commented 10 years ago

That's more-or-less what I expect to happen.

corruptmemory commented 10 years ago

Pretty sure we can close the PR. Any objections?

havocp commented 10 years ago

nope