Provide the same IDE experince when project can't be compiled

dos65 commented 1 year ago

Is your feature request related to a problem? Please describe.

There is known issue related to difference in experience when you work with project that successfully compiled errors vs when there are some compilation errors. Most of Metals feature rely on having .class and semanticdb directories. So if compilation fails they can't be updated and we unable to provide the actual state of sources. This affects all important features like completions, hover, go to definition and many others.

Examples that I would like to see working are:

it will be possible to clone a fresh project with compilation error but able to work with it immediately, instead of manually suppressing errors (commenting code or putting ???)

it will be possible break the code but continue entering new symbols and they will be visible for PC:

// initial st
/a/src/main/scala/A.scala
object A {
def foo: Int = 42
}
/a/src/main/scala/B.scala 
object B {
 val x = A.foo
}

// introduce an error
/a/src/main/scala/A.scala
object A {
def foo: Int = "sadsad"  // error
}
/a/src/main/scala/B.scala 
object B {
 val x = A.foo
}

// lets' say you are doing dome refactoring and decided that 
// to fix  `B.x` you need a new object C
/a/src/main/scala/A.scala
object A {
def foo: Int = "sadsad" // error
}
/a/src/main/scala/C.scala 
object C {
 def newFoo: Int = ???
}
/a/src/main/scala/B.scala 
object B {
 val x = C.@@ // <- here we should get completions `newFoo` 
}

the same example above should also support the case when there are several build targets and the error is in upstream one

Describe the solution you'd like

I've already did some investigation and PoC for fixing the thing I described. However, now I don't have enough time to complete it. @tgodzik asked me to share the existing state.

So, in general the idea is that we have to compile as much as possible and produce signatures instead of classfiles + semanticdb to a special directory. It was never a case for existing scala-tooling but it probably it's the only option that might fit the existing Metals architecture.

Here I will describe how the solution of this might work and share my PoC branches.

1) New flag to compiler - produce signatures and cleanup errors from tress where it's possible

Scala 3 PoC changes - https://github.com/lampepfl/dotty/commit/428256d42d23eea1901ddea8d3526300f9288c30 For Scala3 I added -Yinteractive-trees-path compiler flag which:
- forces the compiler to continue phases until pickler ignoring errors
- at pickler phase cleanup trees from errors
- writes these trees into .tasty files in a special directory (value of -Yinteractive-trees-path)
Also, it allows to load classpath only from .tasty files. Right now, compiler loads them only if there is .class file.
Scala 2 Before experimenting with scala 3 For I did a similar thing for Scala 2 but using compiler-plugin - https://github.com/dos65/metals/tree/sigcompiler. There I was using .sig files instead of tasty but overall the approach was similar to above one.

2) Bsp - a special directory for signatures + continue compilation using signatures dir

Here is a commit that adds --interactive flag for CompileRequest in bloop. It allows, to continue compilation even is upstream project has errors. Also, in case of Metals, bloop adds signatures which will be passed as -Y-interactive-trees-path to compiler. Changes into build-server-protocol - https://github.com/build-server-protocol/build-server-protocol/commit/de19a057dea508f68e04912688264cb5d69d39c6

3) Last part - changes in Metals

Metals should be adapted to use this signaturesDir directory instead of outputDirectory as classpath for PC. Also, it should reload PC every time compilation completes even in case of having errors. I haven't completed PoC for metals but here is my wip changes - https://github.com/dos65/metals/commit/376f890b0aaac5115697d225869fd59e22ff7dfb

I also, should notice, that reloading PC every time might cause performance issues and probably this approach would require some incremental PC reloading. However, for this thing I have no ideas how it might be done and which changes will be required (probably something inside compiler for symbol table reloading)

Describe alternatives you've considered

As some of us might remember, in some issues there were mentions of -Youtline compiler option and that this one might solve this issue. I've tried it but doesn't fully fit for fixing this issue because compiler won't be forced to dump .sig files if there were error at parser stage + it only ignores blocks, and bodies.

// signature ok - `A` + `A.foo`  will be provided
class A {
   def foo: Int = "asdsada" // error
}

// no signatures - typer error
class B extends Unknown { // error: not found Unknown
   def foo: Int = ???
}

Additional context

-

Search terms

compiler, tasty, semanticdb, bsp, presentation compiler

odersky commented 1 year ago

I am not sure what signatures are, and what they are needed for. For the use cases considered, we should first evaluate whether the existing tools are not sufficient. if I understand correctly, Metals uses semanticdb for hyperlinking and the presentation compiler for completion. Here, the presentation compiler is just the normal compiler invoked through a specific API. Correct so far?

Now, if we look at a program with errors, I see a problem with hyperlinking. It seems semanticdb extraction in dotty requires a compiling program to work. But this can be easily changed, as far as I can see /cc @bishabosha for confirmation.

What about completion and the PC? It turns out that the compiler will load files from source as long as there is no class or tasty file, or the source is newer than the class file, which would be the case when formerly compiling sources get edited and the editing introduces errors. So the behavior of the compiler already takes the latest version of the source info account.

In summary it looks to me that we just have to enable semanticdb generation for sources with errors and we are set. Or am I missing something?

dos65 commented 1 year ago

@odersky

Here, the presentation compiler is just the normal compiler invoked through a specific API. Correct so far What about completion and the PC? It turns out that the compiler will load files from source as long as there is no class or tasty file, or the source is newer than the class file, which would be the case when formerly compiling sources get edited and the editing introduces errors. So the behavior of the compiler already takes the latest version of the source info account.

PC works a bit different. It doesn't works directly with all sources of project, it operates only with current one. Also, there is a difference between PC and normal compiler in classpath.

When we start PC in Metals and ask for completion, we don't not compile all sources of project, we do typecheck only for the current file which we edit. PC has the current target directory of BSP server for this project as an extra classpath item. When we save file in editor, Metals triggers BSP compilation, and we refresh PC if complation pass.

This leads to situation, that we won't see new symbols from other files until the compilation on BSP side won't pass.

Example:

// --- existing project ---
// a/src/main/scala/A,scala
object A:
  val foo: Int = 42
// a/src/main/scala/B,scala
object B

// --- ask for completion ---
// a/src/main/scala/B,scala
object B:
  val bar = A.@@

// Metals triggers BSP compilation for `a`
// Metals starts PC where classpath is:
  [  scala-library_*.jar,  // <- dependencies
    .bloop/a/classes-Metals-****/ // <- extra-item - current project classpath
  ]
// PC typechecks only content of `B.scala` and evaluates completions
// The completion result `A.foo` will be provided only if `.bloop/a/classes-Metals-****`  has `A.class`  file

// -- enter an error and save A + add new class -- 
// a/src/main/scala/A,scala
object A:
  val foo: Int = "asdasd" // error

// a/src/main/scala/AFix,scala
object AFix:
  val fixedFoo: Int = 42
// Metals triggers BSP compilation for `a` but it fails
// PC won't be restarted

// --- ask for completion ---
// a/src/main/scala/B,scala
object B:
  val bar = AFix.@@  // <- no completions there is no `AFix.class` in  `.bloop/a/classes-Metals-****`

I am not sure what signatures are, and what they are needed for.

By signatures I mean some kind of format that might be used instead of .class and won't require full compilation. In scala2 there is -Ypickle-write setting that produces .sig files which are used instead of .class in case of pipeline compilation.

In Scala3 it we can use .tasty in a similar way but it require some cleaning from errored trees.

About PC and loading all sources. I was thinking about this approach where we force PC to compile all project files at first but it won't solve an issue with multiply projects. If we have an error in upstream module, we can't compile it, then downstream PC anyway won't be able to see its symbols. Also, it would require Metals to deal with incremental compilation (source might be modified not only by editor - git checkout, delete from fs, ...) and it feels like it would be better to keep on BSP side.

What about semanticdb, yep it's needed to produce it in case of errors too but having only it won't change much in experience.

tgodzik commented 1 year ago

About PC and loading all sources. I was thinking about this approach where we force PC to compile all project files at first but it won't solve an issue with multiply projects. If we have an error in upstream module, we can't compile it, then downstream PC anyway won't be able to see its symbols. Also, it would require Metals to deal with incremental compilation (source might be modified not only by editor - git checkout, delete from fs, ...) and it feels like it would be better to keep on BSP side.

So if understand right, compiler will automatically use sources if the classfiles are older and if we provide the compiler with both classpath and source files? If so this should work in some case, but as @dos65 mentions we don't really know how to deal with incremental compilation.

Let's say we have files A.scala, B.scala and C.scala. They depend on each other: A uses symbols from B and B uses symbols from C.

object A:
  def a = B.b

object B:
  def b = C.c

object C:
  def c = java.nio.Paths.get(".")

If we we change C to return a different type in one of it's methods, but break another thing:

object C:
  def c = ""
  val other : Int = ""

Compiler will correctly try and compile C, but it wouldn't recompile B, right? And B.b has a different type now, so the completions in A should change.

This is an issue we were trying to solve with outline compilation or with producing tasty files on error. Incremental compilation would be solved by Zinc within the build tool. No need to use that logic in the compiler or Metals.

So essentially we thought we can have two ways of working around that problem:

produce compilation artifacts needed by the presentation compiler even if we have errors
if there is an error in one of the module, use outline compilation on all the sources from the module and dependent to produce the needed symbols (public ones)

Or is the asumption wrong and the compiler will deal with incremental compilation correctly?

dos65 commented 1 year ago

@tgodzik thx for the addition. I forgot about invalidation problem.

odersky commented 1 year ago

Compiler will correctly try and compile C, but it wouldn't recompile B, right? And B.b has a different type now, so the completions in A should change.

It's actually debatable what the right outcome here would be. B has a perfectly good tasty signature. Should it change already when C changes even though C has errors? One might also argue that holding on to B's last signature is preferable. And it's anyway a marginal use case, since everyone recommends that public methods such as B.b should have an explicit result type.

The other (and I believe more important) case is where B does not have a tasty file at all, or the tasty file is outdated wrt source. In that case, with the right classpath setup, B will be loaded from source, and reflect C's changes.

I am not sure what should happen with multiple projects. If project A has errors and project B relies on it, then project A will probably have a last good build state, right? In that case it also seems to be better to rely on that good state rather than trying to outline compile project A on the fly. Outline compiling arbitrary dependent projects might consume huge amounts of resources.

I am actually quite wary of outline compilation. I fear it just introduces another thing that might break, maybe by hanging the IDE. Also the idea of somehow healing the compilation so that Tasty can be generated feels extremely ad hoc. I don't think this can work reliably.

In any case, before doing anything, we should make sure that semanticdb gets generated also for erroneous sources. That can be done with much less risk, and it's arguably the right thing to do.

odersky commented 1 year ago

One thing to correct: People on the tooling side have somehow this expression that main source of power is zinc, and that the compiler is just a subtask of that. This is completely wrong. zinc is just a very thin veneer over the compiler. The compiler itself figures out what has changed and what depends on what. It also is perfectly capable of incrementally compiling a changed set of sources. All that zinc does is take the dependency info from the compiler, figure out what set of files should be compiled next, and pass them to the main compiler's command line. So it's the main compiler that's "incremental", not zinc.

I have talked about this in detail in my "Compilers are Databases" presentation.

tgodzik commented 1 year ago

The compiler itself figures out what has changed and what depends on what. It also is perfectly capable of incrementally compiling a changed set of sources.

Does it work on the levele of methods? My understanding was that Zinc is able to detect for example if a method that changed is actually used in downstream source files, if it's not then nothing is recompiled.

dos65 commented 1 year ago

@odersky

I am actually quite wary of outline compilation. I fear it just introduces another thing that might break, maybe by hanging the IDE. Also the idea of somehow healing the compilation so that Tasty can be generated feels extremely ad hoc. I don't think this can work reliably.

In any case, before doing anything, we should make sure that semanticdb gets generated also for erroneous sources. That can be done with much less risk, and it's arguably the right thing to do.

For me it feels like that the core of implementation for errored tree -> semanticdb won't be much different from this kind of outline compilation. It should include some kind of healing and skipping broken trees and that the main thing that might break.

Also, from Metals side, having only semanticdb in case of errors won't bring much. Most scala projects have more than one module and we will hit the same case: upstream module is broken but has semanticdb, downstream modules doesn't have any compiler output because we can't compile it having only semanticdb from upstream.

odersky commented 1 year ago

Does it work on the levele of methods? My understanding was that Zinc is able to detect for example if a method that changed is actually used in downstream source files, if it's not then nothing is recompiled.

Yes, it gets this info from phase ExtractDependencies in the dotty compiler.

Speaking of the outline compiler, how is that supposed to work? Is it a separate compiler or some changes to current dotty? When is the outline compiler supposed to run? During builds, in the background, or on demand?

In my mind it would be easiest if we just taught the normal build compiler to generate semanticdb and tasty files even under errors. Is that the idea?

tgodzik commented 1 year ago

In my mind it would be easiest if we just taught the normal build compiler to generate semanticdb and tasty files even under errors. Is that the idea?

That's what we were thinking to do here. Outline compiler could be an option to the the compiler similar to Scala 2, but if anything it would only be an optimization when the approach we take turns out too heavy..

odersky commented 1 year ago

So is the outline compiler going to take the form of a PR against dotty?

tgodzik commented 1 year ago

So is the outline compiler going to take the form of a PR against dotty?

If we decide to work on that yeah, but might not be needed at all. If we just compile and output the tasty files plus semanticdb even in case of errors I think that should be enough.

odersky commented 1 year ago

Yes, but that needs changes in the compiler, right? Small changes, to be sure, but still...

tgodzik commented 1 year ago

Yes, for sure we need to somehow change the compiler to output let's call it best-effort compilation.

odersky commented 1 year ago

And the Tasty and semanticDb formats to allow an ErrorType tag. That should be it. Should not take more than an afternoon.

The only question is how to treat erroneous tasty. Would that be a normal Tasty file? Should the header of the Tasty file say that it is erroneous? Or should the file have a separate extension? (but then we need added work to clean up error tasty files once the project compiles).

dos65 commented 1 year ago

I thought that it would be better to put such erroneous tasty into separate directory from usual classes. Maybe with a separate extension.

Just to keep the existing behavior - compiler output is produced only if project compiles. There might be a rare case if this magic setting will be enabled by Metals in SBT-bsp or some other build-tool and someone might decide to to take target/scala3*/classes and run/package/publish them.

odersky commented 1 year ago

I thought that it would be better to put such erroneous tasty into separate directory from usual classes. Maybe with a separate extension.

Yes, maybe. I don't really know how the classpath handling would work for dependent projects, but it seems doable.

armanbilge commented 1 year ago

Sorry, this is off-topic, but the comment about zinc grabbed my attention. @odersky thank you for highlighting it, I indeed had the wrong impression about how it was working.

The compiler itself figures out what has changed and what depends on what. It also is perfectly capable of incrementally compiling a changed set of sources.

I was able to find the ExtractDependencies phase in the compiler, and its implementation of the xsbti interfaces. I understand that the compiler itself figures out what depends on what, and reports it via that interface.

However, I am less clear about the part where the compiler itself figures out what changed, and does incremental compilation all on its own. If I understand correctly, does that mean that figuring out the right files to pass to the compiler's command line (as zinc currently does) is not actually necessary, since it can figure this out on its own?

If this is the case, how is this incremental mode enabled? I tried running scalac -verbose on some sources and it appeared to be recompiling everything (or at the very least, rewriting the class files each time).

Thanks!

odersky commented 1 year ago

However, I am less clear about the part where the compiler itself figures out what changed, and does incremental compilation all on its own. If I understand correctly, does that mean that figuring out the right files to pass to the compiler's command line (as zinc currently does) is not actually necessary, since it can figure this out on its own?

No, that's the missing step. The compiler contains no algorithm that determines which set of files to give next. zinc does that, based on the info it gets from the compiler. What the compiler can do, however, is compile a set of files on top of what it already has without recompiling the info of the other files, which is still kept in memory. In that sense, it is incremental.

jchyb commented 1 year ago

@odersky I wanted to take this feature on, so I want to initially confirm how we should proceed. To me the two above approaches of "outline" and "best-effort" seem basically the same with a minor difference.

In "outline", after erroring we force through until the Pickler phase, where we also remove erroneous subtrees, outputting "correct" tasty not requiring any changes to its format (like proposed by @dos65 with his poc).
In "best-effort-compilation", after erroring we force through until the Pickler phase and output unfiltered tasty, requiring changes to the tasty format like adding ErrorTag.

I would like to have both behind flags outputting to a separate directory (rather than outputting to a directory with regular, successful output), for reasons @dos65 mentioned.

Which approach would be better to implement? Both accomplish our goals, but I will admit the "outline" option does seem easier to me - no changes to the tasty format + it's difficult for me to predict whether things would break if we would use those errored tasty directly on the classpath for the compiler (though we can always heal those separately).

As a side-note, In my tests (of the "outline" method, though this should apply to both), the most problematic part (though it shouldn't be difficult to fix) was forcing errored trees through the compilation until Pickler - there were some typer errors which would crash the compiler in PostTyper due to things like missing match cases - I am unsure yet if it would be easier/better to go through those one by one adding correct cases, or skip those phases altogether, but for now I was planning to try out both.

tgodzik commented 1 year ago

One comment about outline compilation (which I think is a source of confusion here) - outline compilation by default wouldn't try to compile tree bodies if not needed and this might actually be complex to achieve, which we don't want to touch yet..

jchyb commented 1 year ago

Right, this is not a behoviour I took into account when trying to describe what I called "outline" in my post above (nor do I think it would be of any benefit here)

smarter commented 1 year ago

If we're willing to consider completely different approaches, it's maybe worth having a look at https://github.com/github/stack-graphs/ which is what github is starting to use to provide semantic go-to-definition/find-all-references. It's not well documented but my understanding is that it records for each source file a graph reifying the scope lookup logic: this is done per-file to be incremental, and the file graphs are fused together to answer queries. Since the graphs are generated from syntactic ASTs produced by tree-sitter it can return too many results (e.g., with overloads). This might be good enough for projects that don't compile anyway, and the results could be further filtered using the presentation compiler.

dos65 commented 1 year ago

@smarter Stack-graph is an interesting thing but I don't think that it might solve the described issue. It might solve some part but there is no way to provide completions and hovers which are the more important from IDE viewpoint.

However, I think it would be nice I there would be stack-graph impl for Scala from the perspective that it will be used by Github.

odersky commented 1 year ago

@jchyb I would suggest to go the best-effort route with explicit error tags. Advantages:

Least effort: the compiler already knows how to handle error types. So all we need is to serialize them.
Highest precision: Trying to heal a piece of code that contains errors is itself error-prone. The healing might introduce new errors or remove large pieces of definitions.
Keeping things honest: Tasty containing errors is semantically quote different from regular Tasty. For instance we can generate code only from regular Tasty. So it's actually good if this difference is made explicit in more than just where a Tasty file is stored.

sjrd commented 1 year ago

Least effort: the compiler already knows how to handle error types. So all we need is to serialize them.

And specify how to handle them. The compiler is not the only consumer of tasty files. There's at least the tasty reader in Scala 2 and tasty-query.

odersky commented 1 year ago

I don't think Tasty reader or Tasty query would ever need to see Tasty with errors in them. So, technically, this is a Tasty+ format for exclusive consumption in IDEs.

sjrd commented 1 year ago

IDEs may use tasty-query for consumption of tasty files, for some features. Currently it's only used in the Metals debugger, which indeed only makes sense for code that compiles, but it could later be used for more stuff.

scalameta / metals-feature-requests