rust-secure-code / cargo-auditable

Make production Rust binaries auditable
Apache License 2.0
656 stars 30 forks source link

Make cyclic dependencies impossible to represent in the data format #63

Closed Shnatsel closed 2 years ago

Shnatsel commented 2 years ago

Right now it is possible to represent cyclic dependencies in the data format. For example:

{"packages":[
{"name":"foo","version":"0.3.1","source":"local","dependencies":[1]},
{"name":"bar","version":"0.2.1","source":"crates.io","dependencies":[0]}
]}

But Cargo does not actually allow cyclic dependencies.

This quirk of the format requires any consumer of this data to check for cyclic dependencies first, or risk an infinite loop during traversal.

It would be nice to make cyclic dependencies impossible to represent, for example:

{"packages":[{"name":"foo","version":"0.3.1","source":"local"},{"name":"bar","version":"0.2.1","source":"crates.io"}]}
{"dependencies":{"0": [{"1":{[]}]}}

Or something along those lines. The idea is to encode the relationships in a JSON tree that is guaranteed to be acyclic.

bjorn3 commented 2 years ago

Cargo does allow cyclic dependencies between packages through for example [dev-dependencies]. What is not possible is cyclic dependencies between crates.

Shnatsel commented 2 years ago

We do not record [dev-dependencies], so AFAIK we should be fine.

Shnatsel commented 2 years ago

Okay, so I have a prototype in the acyclic-dependencies branch. It encodes the relationships in the structure of the JSON itself, so it's impossible to express cycles.

Before: {"packages":[{"name":"adler","version":"1.0.2","source":"crates.io"},{"name":"auditable-extract","version":"0.3.1","source":"crates.io","dependencies":[2]},{"name":"binfarce","version":"0.2.1","source":"crates.io"},{"name":"miniz_oxide","version":"0.5.3","source":"crates.io","dependencies":[0],"features":["default"]},{"name":"rust-audit-info","version":"0.4.0","source":"local","dependencies":[1,3]}]}

After: {"packages":[{"name":"adler","version":"1.0.2","source":"crates.io"},{"name":"auditable-extract","version":"0.3.1","source":"local"},{"name":"binfarce","version":"0.2.1","source":"crates.io"},{"name":"miniz_oxide","version":"0.5.3","source":"crates.io"},{"name":"rust-audit-info","version":"0.4.0","source":"local"}],"dependency_tree":{"4":[{"1":[{"2":[]}]},{"3":[{"0":[]}]}]}}

Note the added "dependency_tree" bit at the end, and the absence of dependencies and root fields in the packages array.

@knqyf263 any thoughts about this?

Shnatsel commented 2 years ago

Sadly this change increased the audit data size for cargo-geiger from 1.2Kb to 3.2Kb, although I'm not exactly sure why. Perhaps the lack of subtree deduplication is to blame.

Also, this change may not be a win for Trivy. Quoting Tom Fay:

It looks like the existing approach better maps to the data structure trivy uses, as trivy ultimately puts package information into this struct, https://github.com/aquasecurity/trivy/blob/e848e6d009af45cda073133c7c8d3220a082335c/pkg/fanal/types/artifact.go#L43, which stores dependencies as an array field

knqyf263 commented 2 years ago

I think the consumer has the responsibility of checking for cyclic dependencies.

For example, CycloneDX represents the dependency graph as below.

  "dependencies": [
    {
      "ref": "acme-app",
      "dependsOn": [
        "pkg:maven/org.acme/web-framework@1.0.0",
        "pkg:maven/org.acme/persistence@3.1.0"
      ]
    },
    {
      "ref": "pkg:maven/org.acme/web-framework@1.0.0",
      "dependsOn": [
        "pkg:maven/org.acme/common-util@3.0.0"
      ]
    },
    {
      "ref": "pkg:maven/org.acme/persistence@3.1.0",
      "dependsOn": [
        "pkg:maven/org.acme/common-util@3.0.0"
      ]
    },
    {
      "ref": "pkg:maven/org.acme/common-util@3.0.0",
      "dependsOn": []
    }
  ]

https://cyclonedx.org/use-cases/#dependency-graph

This format is possible to have cyclic dependencies.

Also, this change may not be a win for Trivy. Quoting Tom Fay:

We can convert the new map format into an array, so it doesn't matter

Shnatsel commented 2 years ago

In that case I'll keep the old format that allows cyclic dependencies, since it's a lot more compact, and appear to be easier to map to existing structures.

Thank you for your input!