google / capslock

BSD 3-Clause "New" or "Revised" License

incorrect CAPABILITY_NETWORK classification #37

Open capnspacehook opened 1 year ago

capnspacehook commented 1 year ago

When running capslock against one of my projects, I noticed some of the CAPABILITY_NETWORK classifications didn't seem to make sense. Digging into it further revealed that they were incorrect.

I ran capslock at commit 29c2da02ab5d3ab22f0745478e6c6d72fd80ab8e against https://github.com/capnspacehook/egress-eddie/tree/faa23e15384d4a7f148e3bcb9fa30f3ab4d37d4c using the command capslock -packages github.com/capnspacehook/egress-eddie -output j, which produced a few classifications like this:

{
  "packageName": "egresseddie",
  "capability": "CAPABILITY_NETWORK",
  "depPath": "github.com/capnspacehook/egress-eddie.parseConfigBytes github.com/BurntSushi/toml.Decode (*github.com/BurntSushi/toml.Decoder).Decode (*github.com/BurntSushi/toml.MetaData).unify (*github.com/BurntSushi/toml.MetaData).unifyText (net.pipeAddr).String",
  "path": [
    {
      "name": "github.com/capnspacehook/egress-eddie.parseConfigBytes"
    },
    {
      "name": "github.com/BurntSushi/toml.Decode",
      "site": {
        "filename": "config.go",
        "line": "90",
        "column": "24"
      }
    },
    {
      "name": "(*github.com/BurntSushi/toml.Decoder).Decode",
      "site": {
        "filename": "decode.go",
        "line": "36",
        "column": "51"
      }
    },
    {
      "name": "(*github.com/BurntSushi/toml.MetaData).unify",
      "site": {
        "filename": "decode.go",
        "line": "169",
        "column": "21"
      }
    },
    {
      "name": "(*github.com/BurntSushi/toml.MetaData).unifyText",
      "site": {
        "filename": "decode.go",
        "line": "213",
        "column": "22"
      }
    },
    {
      "name": "(net.pipeAddr).String",
      "site": {
        "filename": "decode.go",
        "line": "513",
        "column": "19"
      }
    }
  ],
  "packageDir": "github.com/capnspacehook/egress-eddie",
  "capabilityType": "CAPABILITY_TYPE_TRANSITIVE"
}

capslock reports that toml.Decode eventually calls (net.pipeAddr).String, but the source suggests this is unlikely to happen. (*github.com/BurntSushi/toml.MetaData).unifyText uses a type switch to build a string from an argument of type any. In the fmt.Stringer case, capslock treats net.pipeAddr as the concrete type behind the fmt.Stringer value. Source of the final call in the stack: https://github.com/BurntSushi/toml/blob/v1.2.1/decode.go#L513.

I understand that fmt.Stringer is an interface and apparently net.pipeAddr satisfies it, but it seems like capslock is assuming the concrete type of the fmt.Stringer here.
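To illustrate what I mean, here is a minimal sketch of the pattern, not the actual toml code. The names toText, pipeAddr, and level are hypothetical stand-ins I made up for this example. The point is that the v.String() call is dynamically dispatched, so a static analysis that only looks at method sets can treat every type with a String() method as a possible callee, whether or not a value of that type ever reaches the call.

```go
package main

import "fmt"

// pipeAddr is a hypothetical stand-in for an unrelated type (such as
// net.pipeAddr) that happens to satisfy fmt.Stringer.
type pipeAddr struct{}

func (pipeAddr) String() string { return "pipe" }

// level is a hypothetical type that a config decoder might realistically see.
type level int

func (l level) String() string { return fmt.Sprintf("level-%d", int(l)) }

// toText mimics the shape of a type switch over an `any` argument.
// It is an illustration only, not the implementation used by the toml package.
func toText(v any) (string, bool) {
	switch v := v.(type) {
	case string:
		return v, true
	case fmt.Stringer:
		// Statically, the callee here could be any type in the program with
		// a String() method, including pipeAddr, even if no caller passes one.
		return v.String(), true
	default:
		return "", false
	}
}

func main() {
	// Only a level value is ever passed in this program, yet pipeAddr.String
	// remains a candidate callee from a method-set point of view.
	s, _ := toText(level(3))
	fmt.Println(s)
}
```

In this program nothing ever passes a pipeAddr to toText, which is the situation I believe is happening with toml.Decode and net.pipeAddr.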

EDIT: after looking into this a bit more, it seems this is just what golang.org/x/tools/go/ssa and golang.org/x/tools/go/callgraph report, and I'm not sure how difficult detecting this situation would be.

I tried to create a minimal reproducer that just called toml.Decode and some net functions so they would be loaded, but unfortunately couldn't reproduce the same behavior.

Thanks for building and open sourcing this tool, I've wanted something like this for a long time!

jcd2 commented 1 year ago

Thanks for the report!

This is one of the current limitations of the analysis. As you said, we use the golang.org/x/tools module's callgraph generators, which find possible calls between pairs of functions. Stitching these calls together can produce stacks of calls that don't happen in practice -- if function A can call function B in one part of a program, and B can call C in another part of the program, that doesn't mean that the path A->B->C can occur, as you've found.
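As a rough illustration of the stitching problem, here is a toy sketch that is not capslock's actual implementation. The graph, the function names in it, and the findPath helper are all hypothetical. Each edge is individually possible, but a plain graph search has no notion of which values flow along which edge, so it happily reports a path from parseConfig to pipeAddr.String.

```go
package main

import (
	"fmt"
	"strings"
)

// graph is a hypothetical call graph: caller -> possible callees.
// unifyText can reach pipeAddr.String only when logAddr passes it a network
// address, but the edge itself carries no record of that condition.
var graph = map[string][]string{
	"parseConfig": {"unifyText"},
	"logAddr":     {"unifyText"},
	"unifyText":   {"level.String", "pipeAddr.String"},
}

// findPath does a breadth-first search and returns one shortest path of
// function names from `from` to `to`, or nil if none exists.
func findPath(g map[string][]string, from, to string) []string {
	prev := map[string]string{from: ""} // predecessor of each visited node
	queue := []string{from}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur == to {
			var path []string
			for n := to; n != ""; n = prev[n] {
				path = append([]string{n}, path...)
			}
			return path
		}
		for _, next := range g[cur] {
			if _, seen := prev[next]; !seen {
				prev[next] = cur
				queue = append(queue, next)
			}
		}
	}
	return nil
}

func main() {
	// The search stitches parseConfig->unifyText and unifyText->pipeAddr.String
	// together, even though only logAddr would ever trigger the second edge.
	path := findPath(graph, "parseConfig", "pipeAddr.String")
	fmt.Println(strings.Join(path, " -> "))
}
```

Ruling out a path like this requires tracking which callers supply which concrete types, which is more than a per-edge call graph records.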

We have some workarounds for this in limited cases, and we also have plans for more general improvements to the callgraph analysis to tackle this problem in the future!

capnspacehook commented 1 year ago

Thanks for the detailed explanation! I'm really curious what your plans for improving this are. I started researching different call graph analysis algorithms and discovered CHA is guaranteed to produce a sound but not very precise graph. Running the graph through VTA to prune it helps, but there are still a lot of superfluous edges, as you said.

Since I was analyzing a program with an entrypoint instead of a library, I tried using RTA + VTA to create a more precise callgraph. I've read conflicting information about whether RTA produces a sound callgraph, but in my testing it doesn't. There are some false negatives compared to using CHA, but fewer false positives.
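Here is a toy model of the difference as I understand it, heavily simplified and not how golang.org/x/tools implements either algorithm. The type names and the resolveCHA and resolveRTA helpers are hypothetical. CHA resolves an interface call to every type that implements the interface, while RTA (roughly) only considers types the reachable program actually creates.

```go
package main

import (
	"fmt"
	"sort"
)

// implementers is a hypothetical list of types that satisfy fmt.Stringer.
var implementers = []string{"net.pipeAddr", "config.Level", "time.Duration"}

// resolveCHA models class hierarchy analysis: every implementer of the
// interface is a possible callee of a dynamically dispatched call.
func resolveCHA(impls []string) []string {
	out := append([]string(nil), impls...)
	sort.Strings(out)
	return out
}

// resolveRTA models rapid type analysis in a simplified way: only types
// that reachable code instantiates are kept as possible callees.
func resolveRTA(impls []string, instantiated map[string]bool) []string {
	var out []string
	for _, t := range impls {
		if instantiated[t] {
			out = append(out, t)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Assumption for this sketch: reachable code only creates config.Level.
	instantiated := map[string]bool{"config.Level": true}

	fmt.Println("CHA:", resolveCHA(implementers))
	fmt.Println("RTA:", resolveRTA(implementers, instantiated))
}
```

The smaller RTA set is where the precision comes from. It is also where false negatives can creep in, if the set of instantiated types is incomplete for any reason.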

Because this is a security tool I understand why you aim to avoid false negatives as much as possible. I do think RTA could be used alongside CHA when main packages are being analyzed to help users find false positives. Any capabilities found from the RTA callgraph or in both callgraphs would be considered reliable, and any capabilities solely found by CHA would be marked as a possible false positive in the output.
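A minimal sketch of that labeling step, assuming each analysis yields a flat list of findings. The finding type, the classify function, and the label strings are all hypothetical and are not part of capslock's output format.

```go
package main

import (
	"fmt"
	"sort"
)

// finding is a hypothetical (function, capability) pair reported by an analysis.
type finding struct {
	Function   string
	Capability string
}

// classify labels each finding: anything the RTA callgraph reports is treated
// as reliable, and anything only the CHA callgraph reports is flagged as a
// possible false positive.
func classify(cha, rta []finding) map[finding]string {
	labels := make(map[finding]string)
	for _, f := range rta {
		labels[f] = "reliable"
	}
	for _, f := range cha {
		if _, ok := labels[f]; !ok {
			labels[f] = "possible false positive"
		}
	}
	return labels
}

func main() {
	// Hypothetical results from the two analyses.
	cha := []finding{
		{"main.dial", "CAPABILITY_NETWORK"},
		{"main.parseConfigBytes", "CAPABILITY_NETWORK"},
	}
	rta := []finding{
		{"main.dial", "CAPABILITY_NETWORK"},
	}

	labels := classify(cha, rta)

	// Sort by function name so the output order is deterministic.
	keys := make([]finding, 0, len(labels))
	for f := range labels {
		keys = append(keys, f)
	}
	sort.Slice(keys, func(i, j int) bool { return keys[i].Function < keys[j].Function })
	for _, f := range keys {
		fmt.Printf("%s %s: %s\n", f.Function, f.Capability, labels[f])
	}
}
```

Nothing is dropped from the CHA results, so the output stays as conservative as it is today. The label only helps users decide which findings to look at first.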

Callgraph analysis is very new to me so I'm sure whatever ideas you have in mind to improve it are better than what I proposed, but I figured it wouldn't hurt to lay out my thought process.