sourcegraph / scip-java

SCIP Code Intelligence Protocol generator for Java
https://sourcegraph.github.io/scip-java/
Apache License 2.0

scip-java for Bazel does not handle .srcjar inputs #736

Open JohnnyMorganz opened 1 month ago

JohnnyMorganz commented 1 month ago

Hey, thanks for this tool. We just started using it to index Java code in our Bazel monorepo.

Running the Bazel aspect fails with compilation errors for some of our code, for example:

info: /tmp/scip-java3739267713397442642/s/com/x/y/z/File.java:33: error: cannot find symbol
info:     @MissingClass private final String some_variable;
info:      ^
info:   symbol:   class MissingClass
info:   location: class AnotherClass

Digging deeper, it turns out that MissingClass is defined in a Java file stored in a .srcjar that is passed as an input to a Java target. We use the .srcjar mechanism to generate some Java files ahead of time and pass them to a Java target. (For those unfamiliar with .srcjar in Bazel: it is a normal jar of .java files, which Bazel unpacks before passing the sources to javac.)

It looks like scip-java calls javac directly but does not unpack the .srcjar contents beforehand, so compilation fails.
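To illustrate the unpacking step that appears to be missing, here is a minimal Python sketch. It is not scip-java's code: the function name `expand_source_files` and the `srcjar-N` directory layout are hypothetical, and it only assumes that a .srcjar is an ordinary zip archive of .java files, as described above.

```python
import zipfile
from pathlib import Path


def expand_source_files(source_files, extract_root):
    """Return a list of source paths, unpacking any .srcjar entries first.

    Hypothetical helper for illustration; not part of scip-java.
    """
    extract_root = Path(extract_root)
    expanded = []
    for index, entry in enumerate(source_files):
        path = Path(entry)
        if path.suffix != ".srcjar":
            # Plain source files are passed through unchanged.
            expanded.append(str(path))
            continue
        # A .srcjar is an ordinary zip archive, so zipfile can read it.
        # Each archive gets its own directory to avoid name clashes.
        target_dir = extract_root / f"srcjar-{index}"
        target_dir.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(path) as archive:
            for name in sorted(archive.namelist()):
                if name.endswith(".java"):
                    # extract() returns the path of the file it wrote.
                    expanded.append(archive.extract(name, target_dir))
    return expanded
```

The returned list could then be handed to javac in place of the original inputs, so that classes defined inside the archive are visible during compilation.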

Here is a small repro. Interestingly, the scip-java run succeeds for this small repro, but we get the message "Some SemanticDB files got generated even if there were compile errors. In most cases, this means that scip-java managed to index everything except the locations that had compile errors and you can ignore the compile errors.". In our larger private codebase, however, the compilation failure appears to be blocking. Running the command that the aspect runs shows the compilation failure:

# testing/BUILD.bazel
genrule(
    name = "generated-srcjar",
    outs = ["sources.srcjar"],
    cmd = "echo 'package com.testing; public class Bar {};' > Bar.java && jar cf $(@) Bar.java",
)

java_library(
    name = "testing",
    srcs = [
        "Foo.java",
        ":generated-srcjar",
    ],
)

// testing/Foo.java
package com.testing;

public class Foo {
    public Bar foo(Bar value) {
        return value;
    }
}

$ bazel build //testing # successful
$ "/home/code/scip-java" index --no-cleanup --index-semanticdb.allow-empty-index --cwd "/home/code" --targetroot bazel-out/k8-dbg--cd/bin/testing/testing.semanticdb --scip-config "bazel-out/k8-dbg--cd/bin/testing/testing.scip.json" --output "bazel-out/k8-dbg--cd/bin/testing/testing.scip"
info: $ /opt/.../jdk/bin/javac @/home/code/bazel-out/k8-dbg--cd/bin/testing/testing.semanticdb/javacopts.txt
info: /home/code/testing/Foo.java:4: error: cannot find symbol
info:     Bar value;
info:     ^
info:   symbol:   class Bar
info:   location: class Foo
info: 1 error
info: Some SemanticDB files got generated even if there were compile errors. In most cases, this means that scip-java managed to index everything except the locations that had compile errors and you can ignore the compile errors.
info: Result of /opt/.../jdk/bin/javac…: 1

info: /home/code/bazel-out/k8-dbg--cd/bin/testing/testing.scip

Also, inside testing.scip.json, the srcjar is listed in sourceFiles:

{
  ...
  "sourceFiles": [
    "testing/Foo.java",
    "bazel-out/k8-dbg--cd/bin/testing/sources.srcjar"
  ]
}

but it no longer appears in the generated javacopts.txt:

"-encoding"
"utf8"
"-nowarn"
"-d"
"/tmp/scip-java8260898732520975292/d"
"-s"
"/tmp/scip-java8260898732520975292/s"
"-h"
"/tmp/scip-java8260898732520975292/h"
"-classpath"
"/tmp/scip-java8260898732520975292/semanticdb-plugin.jar"
"-Xplugin:semanticdb -targetroot:/home/code/bazel-out/k8-dbg--cd/bin/testing/testing.semanticdb -sourceroot:/home/code"
"-source"
"11"
"-target"
"11"
"-g"
"-parameters"
"/home/code/testing/Foo.java"
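
The comparison above can be automated with a small diagnostic. This is a rough Python sketch: the helper names `read_javacopts` and `dropped_source_files` are hypothetical, and it assumes javacopts.txt holds one double-quoted argument per line (as in the listing above) and that sourceFiles entries are relative to the --cwd directory.

```python
import json
from pathlib import Path


def read_javacopts(path):
    """Parse an argument file with one double-quoted argument per line.

    Simplification: escape sequences inside the quotes are not handled.
    """
    args = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            args.append(line.strip('"'))
    return args


def dropped_source_files(scip_config_path, javacopts_path, cwd):
    """List sourceFiles entries from the config that never reach javac."""
    config = json.loads(Path(scip_config_path).read_text())
    passed = set(read_javacopts(javacopts_path))
    dropped = []
    for entry in config.get("sourceFiles", []):
        # javacopts.txt lists absolute paths, the config lists relative ones.
        absolute = str(Path(cwd) / entry)
        if entry not in passed and absolute not in passed:
            dropped.append(entry)
    return dropped
```

For the repro in this issue, calling `dropped_source_files` with testing.scip.json, the generated javacopts.txt, and `/home/code` should report the sources.srcjar entry.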

JohnnyMorganz commented 1 month ago

As a side note, I noticed that scip-java calls javac directly from the provided JAVA_HOME. Bazel has a wrapper around javac that you'll see in the "Javac" action. Two points on this:

1) It seems sub-optimal to rely on the JAVA_HOME variable set on the system, as it can differ from what is configured for Bazel. Maybe the aspect could read java_home from the toolchain instead, so that this environment variable is no longer needed?

This could be done by adding an attribute to the aspect's attrs, then accessing it like so:

attrs = {
    "_jdk": attr.label(default = Label("@bazel_tools//tools/jdk:current_host_java_runtime")),
},

...
"javaHome": ctx.attr._jdk[java_common.JavaRuntimeInfo].java_home,

# also need to include 'ctx.attr._jdk[java_common.JavaRuntimeInfo].files' as an input to the ScipJavaIndex action

2) I tried modifying the command line that Bazel uses for the Javac action (i.e., adding -Xplugin:semanticdb ... to javacopts and semanticdb-plugin.jar to the classpath). This did seem to work: it handled the .srcjar appropriately and generated the SemanticDB data. It might be an avenue to explore, but I'm unsure how complex it would be to integrate.

varungandhi-src commented 1 month ago

@JohnnyMorganz are you using scip-java as a Sourcegraph customer? If so, would you mind reaching out through your account's assigned Support Engineer for help?

JohnnyMorganz commented 1 month ago

Yes, we are. I can forward it to them, thanks.