Open capnspacehook opened 1 year ago
Thanks for the report!
This is one of the current limitations of the analysis. As you said, we use the golang.org/x/tools module's callgraph generators, which find possible calls between pairs of functions. Stitching these calls together can produce stacks of calls that don't happen in practice -- if function A can call function B in one part of a program, and B can call C in another part of the program, that doesn't mean that the path A->B->C can occur, as you've found.
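To make the stitching problem concrete, here is a minimal sketch using only the standard library. It is not capslock's code: the `callGraph` type, `findPath`, and every function name in the graph are hypothetical. Two unrelated callers share a helper, and the helper has two possible callees, so a plain graph search reports a path that may never occur at runtime.

```go
package main

import "fmt"

// callGraph maps a caller to the callees that a conservative analysis
// considers possible. All names used with it here are hypothetical.
type callGraph map[string][]string

// findPath returns one chain of calls from src to dst found by
// breadth-first search over the edges, or nil if dst is unreachable.
// It only stitches edges together; it knows nothing about which
// concrete types actually flow along each edge.
func findPath(g callGraph, src, dst string) []string {
	prev := map[string]string{}
	seen := map[string]bool{src: true}
	queue := []string{src}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == dst {
			// Walk the predecessor links back to src.
			path := []string{dst}
			for cur != src {
				cur = prev[cur]
				path = append([]string{cur}, path...)
			}
			return path
		}
		for _, next := range g[cur] {
			if !seen[next] {
				seen[next] = true
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	g := callGraph{
		// decode only ever passes config values to stringify.
		"decode": {"stringify"},
		// dial only ever passes network addresses to stringify.
		"dial": {"stringify"},
		// stringify calls String() through an interface, so both
		// implementations are recorded as possible callees.
		"stringify": {"config.String", "netAddr.String"},
	}
	// The search reports decode -> stringify -> netAddr.String even though,
	// by assumption, decode never hands stringify a network address.
	fmt.Println(findPath(g, "decode", "netAddr.String"))
}
```

Each edge in the graph is individually plausible; the false positive only appears once the edges are combined into a stack.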
We have some workarounds for this in limited cases, and we also have plans for more general improvements to the callgraph analysis to tackle this problem in the future!
Thanks for the detailed explanation! I'm really curious what your plans for improving this are. I started researching different call graph analysis algorithms and discovered CHA is guaranteed to produce a sound but not very precise graph. Running the graph through VTA to prune it helps, but there are still a lot of superfluous edges, as you said.
Since I was analyzing a program with an entrypoint instead of a library, I tried using RTA + VTA to create a more precise callgraph. I've read conflicting information about whether RTA produces a sound callgraph, but I found that it doesn't. There are some false negatives compared to using CHA, but fewer false positives.
Because this is a security tool I understand why you aim to avoid false negatives as much as possible. I do think RTA could be used alongside CHA when main packages are being analyzed to help users find false positives. Any capabilities found from the RTA callgraph or in both callgraphs would be considered reliable, and any capabilities solely found by CHA would be marked as a possible false positive in the output.
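A rough sketch of that labeling rule, under stated assumptions: `classify` is a hypothetical helper, not part of capslock, and the capability lists passed to it are placeholder inputs rather than real tool output.

```go
package main

import (
	"fmt"
	"sort"
)

// classify is a hypothetical helper. It takes the capabilities found via
// the CHA-based callgraph and via the RTA-based callgraph and labels each:
// anything RTA found (alone or together with CHA) is "reliable", and
// anything only CHA found is a "possible false positive".
func classify(cha, rta []string) map[string]string {
	inRTA := make(map[string]bool, len(rta))
	labels := make(map[string]string)
	for _, c := range rta {
		inRTA[c] = true
		labels[c] = "reliable"
	}
	for _, c := range cha {
		if !inRTA[c] {
			labels[c] = "possible false positive"
		}
	}
	return labels
}

func main() {
	// Placeholder inputs for illustration only.
	cha := []string{"CAPABILITY_FILES", "CAPABILITY_NETWORK"}
	rta := []string{"CAPABILITY_FILES"}

	labels := classify(cha, rta)
	names := make([]string, 0, len(labels))
	for name := range labels {
		names = append(names, name)
	}
	sort.Strings(names) // stable output order
	for _, name := range names {
		fmt.Printf("%s: %s\n", name, labels[name])
	}
}
```

Nothing is dropped from the report this way: CHA-only findings stay visible, so the output remains as conservative as before and only gains a hint about which findings deserve a second look.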
Callgraph analysis is very new to me so I'm sure whatever ideas you have in mind to improve it are better than what I proposed, but I figured it wouldn't hurt to lay out my thought process.
When running capslock against one of my projects, I noticed some of the `CAPABILITY_NETWORK` classifications didn't seem to make sense. Digging into it further revealed that they were incorrect.

Running capslock at 29c2da02ab5d3ab22f0745478e6c6d72fd80ab8e against https://github.com/capnspacehook/egress-eddie/tree/faa23e15384d4a7f148e3bcb9fa30f3ab4d37d4c with

capslock -packages github.com/capnspacehook/egress-eddie -output j

displayed a few classifications like this:

capslock seems to think `toml.Decode` is calling `(net.pipeAddr).String` eventually, but digging into the source reveals this is unlikely. `(*github.com/BurntSushi/toml.MetaData).unifyText` uses a type switch to create a string from an argument of type `any`. In the `fmt.Stringer` case capslock thinks that the now known `fmt.Stringer` type is the type `net.pipeAddr`. Source of the final call in the stack: https://github.com/BurntSushi/toml/blob/v1.2.1/decode.go#L513.

I understand that `fmt.Stringer` is an interface and apparently `net.pipeAddr` satisfies it, but it seems like capslock is assuming the concrete type of the `fmt.Stringer` here.

EDIT: after looking into this a bit more it seems this is just what `golang.org/x/tools/go/ssa` and `golang.org/x/tools/go/callgraph` report, and I'm not sure how difficult detecting this situation would be.

I tried to create a minimal reproducer that just called `toml.Decode` and some `net` functions so they would be loaded, but couldn't reproduce the same behavior unfortunately.

Thanks for building and open sourcing this tool, I've wanted something like this for a long time!
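For readers unfamiliar with the pattern described in this report, here is a small standalone sketch of a type switch with a `fmt.Stringer` case. It is not the toml source: `textOf`, `level`, `fakeAddr`, and `addrCalls` are all made up for illustration. Every type with a `String` method satisfies the interface, so a static analysis has to treat each of them as a possible callee, even when only one of them ever reaches the switch at runtime.

```go
package main

import "fmt"

// textOf is a hypothetical stand-in for a function that turns an
// arbitrary value into a string using a type switch.
func textOf(v any) string {
	switch s := v.(type) {
	case string:
		return s
	case fmt.Stringer:
		// The concrete type behind s is only known at runtime.
		return s.String()
	default:
		return fmt.Sprint(v)
	}
}

// level is a harmless type that implements fmt.Stringer.
type level int

func (l level) String() string { return fmt.Sprintf("level-%d", int(l)) }

// addrCalls counts how often fakeAddr.String actually runs.
var addrCalls int

// fakeAddr plays the role of an unrelated network type that also
// happens to implement fmt.Stringer.
type fakeAddr struct{}

func (fakeAddr) String() string {
	addrCalls++
	return "pipe"
}

func main() {
	// Only a level value is ever passed in, so fakeAddr.String never runs,
	// yet both String methods are valid targets of s.String() statically.
	fmt.Println(textOf(level(3)))
	fmt.Println("fakeAddr.String calls:", addrCalls)
}
```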