phylum-dev / cli

Command line interface for the Phylum API
https://phylum.io
GNU General Public License v3.0
99 stars 10 forks source link

Improve `Go` dependency parsing #1396

Closed ejortega closed 2 months ago

ejortega commented 3 months ago

Overview

The current lockfile generator creates a go.sum file and parses the output for dependencies. However the go.sum contents is not really a lockfile and analysis results are skewed due to these additional versions even if they aren't used to build the application.

Update the go.sum parser to ignore hashes for /go.mod. These only indicate that a package was considered and not that the package is actually used. There will still be too many dependencies but should have improved results.

Newer go projects supporting go directive 1.17 and higher can use the go.mod parser.

Acceptance Criteria

cd-work commented 3 months ago

How do you envision this to be done? Our lockfile analysis cannot run arbitrary commands. While we could use this for lockfile generation, I don't see us being able to use this for lockfile analysis.

Generally speaking it doesn't sound like this output is ever intended to land in a file, which is a prerequisite for lockfile parsing.

kylewillmon commented 3 months ago

While we could use this for lockfile generation, I don't see us being able to use this for lockfile analysis.

That's exactly the plan. The Go lockfile generator currently uses go get -d. The plan is to update that generator to use go list instead.

Generally speaking it doesn't sound like this output is ever intended to land in a file

That is true, but the output contains enough information to generate a virtual go.sum file, just like our pip generator does with pip install --dry-run to requirements.txt

cd-work commented 3 months ago

Yeah I apologize I must have been completely blind when reading this yesterday. It states clearly multiple times that this refers to lockfile generation, so not sure why I thought this had anything to do with our normal lockfile analysis.

kylewillmon commented 2 months ago

I've been doing some research on this... Go is complicated and the behavior has changed a lot... Even go list has issues because it does not respect module pruning.

Proposal

To improve our handling, I think we can do a few things:

  1. Update the go.sum parser to ignore hashes for /go.mod. These only indicate that a package was considered and not that the package is actually used.
  2. Replace the Go lockfile generator with a go.mod parser which only works for files with a go directive of 1.17 or higher.

The go.sum parser may still include too many dependencies, but it should be much better. And for projects that can use the go.mod parser instead, they should get a dependency list that accurately reflects go's lazy module loading and module pruning behavior