uber-go / nilaway

Static analysis tool to detect potential nil panics in Go code
Apache License 2.0

Internal panic (conflicting position information on upstream object) with latest rules_go #149

Open kpaulisse opened 10 months ago

kpaulisse commented 10 months ago

After updating to the latest rules_go and the latest version of NilAway, I am now getting the following panic:

<path redacted>/create.go:1:1: error: INTERNAL PANIC: conflicting position information on upstream object go.mongodb.org/mongo-driver/mongo.Cursor.M7: existing: external/org_mongodb_go_mongo_driver/mongo/cursor.go:292:18, got: external/org_mongodb_go_mongo_driver/mongo/cursor.go:292:1
goroutine 76 [running]:
runtime/debug.Stack()
    GOROOT/src/runtime/debug/stack.go:24 +0x64
go.uber.org/nilaway/accumulation.run.func1()
    external/org_uber_go_nilaway/accumulation/analyzer.go:85 +0x44
panic({0x102a6ec40?, 0x140023e73e0?})
    GOROOT/src/runtime/panic.go:914 +0x218
go.uber.org/nilaway/inference.newPrimitivizer.func1({{{0x14000f34c00, 0x34}, 0x123, 0x124, 0x1}, {0x1400116a150, 0x21}, {0x14000efd338, 0x16}, 0x0, ...}, ...)
    external/org_uber_go_nilaway/inference/primitive.go:157 +0x218
go.uber.org/nilaway/inference.(*InferredMap).OrderedRange(...)
    external/org_uber_go_nilaway/inference/inferred_map.go:97
go.uber.org/nilaway/inference.newPrimitivizer(0x14001b0ec30)
    external/org_uber_go_nilaway/inference/primitive.go:150 +0x274
go.uber.org/nilaway/inference.NewEngine(0x14001b0ec30, {0x102ae65c0?, 0x140022ec810})
    external/org_uber_go_nilaway/inference/engine.go:61 +0x2c
go.uber.org/nilaway/accumulation.run(0x14001b0ec30)
    external/org_uber_go_nilaway/accumulation/analyzer.go:118 +0x2e4
main.(*action).execOnce(0x140001e74a0)
    external/io_bazel_rules_go/go/tools/builders/nogo_main.go:353 +0x784
sync.(*Once).doSlow(0x6d2f782f72657669?, 0x6972642f6f676e6f?)
    GOROOT/src/sync/once.go:74 +0x100
sync.(*Once).Do(...)
    GOROOT/src/sync/once.go:65
main.(*action).exec(...)
    external/io_bazel_rules_go/go/tools/builders/nogo_main.go:280
main.execAll.func1(0x2e6e6f6974636173?)
    external/io_bazel_rules_go/go/tools/builders/nogo_main.go:274 +0x74
created by main.execAll in goroutine 74
    external/io_bazel_rules_go/go/tools/builders/nogo_main.go:272 +0x4c
 (nilaway)

✅ nilaway v0.0.0-20231204220708-2f6a74d7c0e2 works correctly with:

❌ nilaway v0.0.0-20231204220708-2f6a74d7c0e2 panics with:

✅ nilaway v0.0.0-20231204220708-2f6a74d7c0e2 works correctly with:

✅ nilaway v0.0.0-20231204220708-2f6a74d7c0e2 works correctly with latest master of rules_go:

Finally, here's the permalink to the line in the mongo-driver package that is apparently causing the panic: https://github.com/mongodb/mongo-go-driver/blob/1f344bd232e6dd3eb70f9d1df3e82cb532191200/mongo/cursor.go#L292

yuxincs commented 10 months ago

I can confirm this problem reproduces, and it seems to be related to the x/tools upgrade (the problem exists from v0.11.1 through the latest version).

For now, you can pin x/tools to v0.11.0 in your go.mod file (see https://go.dev/ref/mod#go-mod-file-replace) to avoid this panic. We will investigate further and raise a PR with a fix (contributions welcome!)
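
For reference, a minimal sketch of that pin as a replace directive in go.mod:

// go.mod: pin golang.org/x/tools to the last version that still
// exports facts for transitive dependencies.
replace golang.org/x/tools => golang.org/x/tools v0.11.0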

philipuvarov commented 6 months ago

Hey there! I encountered the same issue today. Is there a possible workaround for it? I'm on x/tools v0.18.0.

philipuvarov commented 6 months ago

Checked it on 0.19.0 and rules_go 0.46.0 with the same results. I also went through the rules_go source a bit, and there does not seem to be an easy way to patch it. We tried running NilAway as a standalone checker, but the overhead becomes prohibitively large (somehow I was hitting 50GB+ RAM usage from NilAway on my machine, hehe).

@yuxincs @sonalmahajan15 do you currently have any plans to look into this one?

PS / off-topic: it would be nice to have some guidelines on how (if possible) to run the plugin locally; I think it would simplify investigations and make contributions easier. I am wondering whether something similar to --override_repository= is possible.

yuxincs commented 6 months ago

Hi @philipuvarov, we have investigated the issue, and the root cause is that the x/tools library applied an optimization to no longer export facts of transitive dependencies; it now exports only the facts about a package's direct dependencies. This is a nice optimization for performance and storage. However, it broke NilAway's assumption, since we carefully implemented export logic that stores only the incremental part (i.e., the new knowledge gained after analyzing a particular package).

Say we have a dependency chain foo -> bar -> baz, where bar depends on foo and baz depends on bar. When NilAway analyzes package baz, it can no longer see facts about foo, yet the facts we store for bar are only incremental knowledge on top of foo's facts.
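
For illustration, a minimal sketch of the x/tools fact mechanism involved (the analyzer name and fact payload are hypothetical; ImportPackageFact/ExportPackageFact are the actual golang.org/x/tools/go/analysis API):

package nilsketch

import "golang.org/x/tools/go/analysis"

// nilabilityFact is a stand-in for NilAway's per-package knowledge.
// Fact types must be pointer types implementing analysis.Fact.
type nilabilityFact struct {
	Sites []string // e.g., fields or returns inferred to be nilable
}

func (*nilabilityFact) AFact() {}

var Analyzer = &analysis.Analyzer{
	Name:      "nilsketch",
	Doc:       "sketch of incremental fact export across packages",
	FactTypes: []analysis.Fact{new(nilabilityFact)},
	Run: func(pass *analysis.Pass) (any, error) {
		// Read facts exported by dependencies. After the x/tools
		// optimization, only facts of direct imports are visible:
		// analyzing baz sees bar's fact, but foo's fact is gone.
		for _, imp := range pass.Pkg.Imports() {
			var f nilabilityFact
			_ = pass.ImportPackageFact(imp, &f)
		}
		// Export only the delta for this package; downstream packages
		// were assumed to still see the upstream facts as well.
		pass.ExportPackageFact(&nilabilityFact{})
		return nil, nil
	},
}

Since bar's exported fact encodes only the delta over foo's facts, dropping foo's facts leaves downstream packages with inconsistent views of the same upstream object, which is what trips NilAway's safeguard.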

Naively storing all facts proved too expensive in our experiments, and we're investigating smarter ways to tackle this 😃

In the meantime, you can apply the patch below, which disables the "panic" you're seeing: it is not a real panic, but an artificial safeguard we placed in NilAway for cases where facts from upstream are incomplete (i.e., cases like this).

diff --git a/inference/primitive.go b/inference/primitive.go
index 495dec6..ca360e0 100644
--- a/inference/primitive.go
+++ b/inference/primitive.go
@@ -154,10 +154,11 @@ func newPrimitivizer(pass *analysis.Pass) *primitivizer {

            objRepr := site.PkgPath + "." + string(site.ObjectPath)
            if existing, ok := upstreamObjPositions[objRepr]; ok && existing != site.Position {
-               panic(fmt.Sprintf(
-                   "conflicting position information on upstream object %q: existing: %v, got: %v",
-                   objRepr, existing, site.Position,
-               ))
+               // panic(fmt.Sprintf(
+               //  "conflicting position information on upstream object %q: existing: %v, got: %v",
+               //  objRepr, existing, site.Position,
+               // ))
+               return true
            }
            upstreamObjPositions[objRepr] = site.Position
            return true

This disables the panic, but due to the incomplete knowledge, false negatives might occur. It's definitely less than ideal (which is why we didn't put this patch on the main branch), and we will try to find a better approach soon.
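
If you pull NilAway in through gazelle's go_repository, one way to apply such a patch is via its patches attribute (the patch file label below is hypothetical; fill in the version/sum from your existing setup):

load("@bazel_gazelle//:deps.bzl", "go_repository")

go_repository(
    name = "org_uber_go_nilaway",
    importpath = "go.uber.org/nilaway",
    # Patch that comments out the position-conflict panic, as above.
    patches = ["//patches:nilaway_disable_position_panic.patch"],
    patch_args = ["-p1"],
)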

somehow I was hitting 50GB+ RAM usage from nilaway

If you restrict the analysis to only first-party code, would it still behave that way? (see instructions)
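
For reference, a sketch of that restriction when running the standalone checker (the -include-pkgs flag is from NilAway's README; the module path is hypothetical):

nilaway -include-pkgs="github.com/your-org/your-repo" ./...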

PS / off-topic: it would be nice to have some guidelines on how (if possible) to run the plugin locally; I think it would simplify investigations and make contributions easier. I am wondering whether something similar to --override_repository= is possible.

Yeah, this is a bit tricky. I tried this some time ago, but to no avail. --override_repository didn't seem to work, since it expects the repository to be managed by bazel (e.g., rules_go), but for plain Go repositories we need gazelle to also play along, which didn't seem to be the case (happy to be proved wrong 😃). I also tried adding a replace directive in go.mod and letting gazelle pick it up, but didn't succeed for reasons I can't remember.

PRs more than welcome if you've got better ideas here, as always.

philipuvarov commented 5 months ago

@yuxincs thanks a lot and sorry for the late response!

If you restrict the analysis to only first-party code, would it still behave that way? (see instructions)

So, funny thing about this: it was actually generated code (protobuf and gRPC) that was causing RAM consumption to go absolutely bonkers, even when running with nogo. I did remove the proto packages with the exclude-pkgs flag, which made the RAM consumption more reasonable; however, I noticed that the build was still taking a long time (i.e., a target that used to take seconds was taking ~3-5 minutes).
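
For context, a sketch of passing that flag through nogo's per-analyzer JSON config (this assumes rules_go's analyzer_flags support; the package prefix is hypothetical):

{
  "nilaway": {
    "analyzer_flags": {
      "exclude-pkgs": "github.com/your-org/your-repo/gen/proto"
    }
  }
}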

So after that, I removed the generated-code packages from the nogo instrumentation entirely, something like this:

go_register_nogo(
    excludes = [
+        "@//my/package/proto:__subpackages__",
    ],
    includes = [
+        "@//my/package:__subpackages__",
    ],
)

This brought build times back to normal. And after excluding the generated code from nogo, the upstream panic went away as well.

Not sure what or why, but @yuxincs, if you think it might be valuable to document this nogo behavior (if not for the panic, then for performance reasons), I can make a small PR to mention it in the docs (although it is more about nogo than NilAway).