pypa / pip-audit

Audits Python environments, requirements files and dependency trees for known security vulnerabilities, and can automatically fix them
https://pypi.org/project/pip-audit/
Apache License 2.0

Make more information available in the reports #207

Open StefanFl opened 2 years ago

StefanFl commented 2 years ago

Is your feature request related to a problem? Please describe.

The Python Packaging Advisory Database contains more information than pip-audit's reports currently expose, in particular the references and the aliases.

Describe the solution you'd like

References and aliases should be available in the JSON and other reports. I am currently writing a parser to import pip-audit reports in DefectDojo (https://github.com/DefectDojo/django-DefectDojo) and it would be great if users could see as much information as possible there to be able to assess the vulnerability.

A severity would be great as well, but that doesn't seem to be part of the information in the database.

Describe alternatives you've considered

The only alternative I see is to include a reference to the vulnerability in the Python Packaging Advisory Database in DefectDojo findings, but it would be more convenient for users to get the information directly.

woodruffw commented 2 years ago

Thanks for the request! I'm aware of the aliases key in the JSON response provided by the PyPA advisory DB, but not the "references" you mentioned. Are you talking about the link key?

Either way, this is indeed something we can and should support in the JSON output format for pip-audit.

StefanFl commented 2 years ago

Thanks for your quick response! I found the references in the YAML files of the PyPA advisory DB, e.g. https://github.com/pypa/advisory-db/blob/8aa52f490ff7a87026814b5634808f5824d4018a/vulns/aiohttp-session/PYSEC-2018-35.yaml#L41. I don't know what they are called in their JSON response.

woodruffw commented 2 years ago

Gotcha, thanks for the link. It looks like that key isn't currently exposed via PyPA's JSON API.

For example, here's the vulnerability object produced by requesting aiohttp-session==2.6.0:

  "vulnerabilities": [
    {
      "aliases": [
        "CVE-2018-1000814"
      ],
      "details": "aio-libs aiohttp-session version 2.6.0 and earlier contains a Other/Unknown vulnerability in EncryptedCookieStorage and NaClCookieStorage that can result in Non-expiring sessions / Infinite lifespan. This attack appear to be exploitable via Recreation of a cookie post-expiry with the same value.",
      "fixed_in": [
        "2.7.0"
      ],
      "id": "PYSEC-2018-35",
      "link": "https://osv.dev/vulnerability/PYSEC-2018-35",
      "source": "osv"
    }
  ]

So this will need upstream changes to Warehouse first.
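
To make the gap concrete, here is a minimal Python sketch of what a consumer can pull out of a vulnerability object shaped like the one above. The `summarize` helper and the `SAMPLE` constant are hypothetical names used only for illustration, and the sample omits the long `details` string for brevity.

```python
# Trimmed copy of the vulnerability object shown above (details omitted).
SAMPLE = {
    "vulnerabilities": [
        {
            "aliases": ["CVE-2018-1000814"],
            "fixed_in": ["2.7.0"],
            "id": "PYSEC-2018-35",
            "link": "https://osv.dev/vulnerability/PYSEC-2018-35",
            "source": "osv",
        }
    ]
}


def summarize(response):
    """Return one summary dict per vulnerability, defaulting missing keys."""
    summaries = []
    for vuln in response.get("vulnerabilities", []):
        summaries.append(
            {
                "id": vuln["id"],
                "aliases": vuln.get("aliases", []),
                "link": vuln.get("link"),
                "fixed_in": vuln.get("fixed_in", []),
                # Absent from this response, so this stays empty
                # until the key is exposed upstream.
                "references": vuln.get("references", []),
            }
        )
    return summaries


if __name__ == "__main__":
    for row in summarize(SAMPLE):
        print(row)
```

The aliases and the link come through, but `references` is always empty for this response, which is the part that depends on upstream.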

StefanFl commented 2 years ago

How do you query the API? Using

curl -X POST -d \
          '{"version": "2.6.0", "package": {"name": "aiohttp-session", "ecosystem": "PyPI"}}' \
          "https://api.osv.dev/v1/query"

I get this result, including the references:

{
    "vulns": [
        {
            "id": "PYSEC-2018-35",
            "details": "aio-libs aiohttp-session version 2.6.0 and earlier contains a Other/Unknown vulnerability in EncryptedCookieStorage and NaClCookieStorage that can result in Non-expiring sessions / Infinite lifespan. This attack appear to be exploitable via Recreation of a cookie post-expiry with the same value.",
            "aliases": [
                "CVE-2018-1000814",
                "GHSA-mr4x-c4v9-x729"
            ],
            "modified": "2021-07-02T02:41:32.834524Z",
            "published": "2018-12-20T15:29:00Z",
            "references": [
                {
                    "type": "WEB",
                    "url": "https://github.com/aio-libs/aiohttp-session/pull/331"
                },
                {
                    "type": "REPORT",
                    "url": "https://github.com/aio-libs/aiohttp-session/issues/325"
                },
                {
                    "type": "ADVISORY",
                    "url": "https://github.com/advisories/GHSA-mr4x-c4v9-x729"
                }
            ],
            "affected": [
                {
                    "package": {
                        "name": "aiohttp-session",
                        "ecosystem": "PyPI",
                        "purl": "pkg:pypi:aiohttp-session"
                    },
                    "ranges": [
                        {
                            "type": "ECOSYSTEM",
                            "events": [
                                {
                                    "introduced": "0"
                                },
                                {
                                    "fixed": "2.7.0"
                                }
                            ]
                        }
                    ],
                    "versions": [
                        "0.0.1",
                        "0.1.0",
                        "0.1.1",
                        "0.1.2",
                        "0.2.0",
                        "0.3.0",
                        "0.4.0",
                        "0.5.0",
                        "0.7.0",
                        "0.7.1",
                        "0.8.0",
                        "1.0.0",
                        "1.0.1",
                        "1.1.0",
                        "1.2.0",
                        "1.2.1",
                        "2.0.0",
                        "2.0.1",
                        "2.1.0",
                        "2.2.0",
                        "2.3.0",
                        "2.4.0",
                        "2.5.1",
                        "2.6.0"
                    ],
                    "database_specific": {
                        "source": "https://github.com/pypa/advisory-db/blob/main/vulns/aiohttp-session/PYSEC-2018-35.yaml"
                    }
                }
            ]
        }
    ]
}

It does contain information about the version where the vulnerability has been fixed as well, which would be very useful.
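
For anyone who wants to script this instead of using curl, here is a rough Python equivalent using only the standard library. The helper names (`build_query`, `reference_urls`, `query_osv`) are made up for this sketch; the endpoint and the request body are the ones from the curl command above.

```python
import json
import urllib.request

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def build_query(name, version, ecosystem="PyPI"):
    """Build the same JSON body that the curl command above sends."""
    return {"version": version, "package": {"name": name, "ecosystem": ecosystem}}


def reference_urls(response):
    """Map each vulnerability ID to the URLs in its `references` list."""
    return {
        vuln["id"]: [ref["url"] for ref in vuln.get("references", [])]
        for vuln in response.get("vulns", [])
    }


def query_osv(name, version):
    """POST the query to OSV and decode the JSON reply (needs network access)."""
    body = json.dumps(build_query(name, version)).encode("utf-8")
    request = urllib.request.Request(
        OSV_QUERY_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Calling `reference_urls(query_osv("aiohttp-session", "2.6.0"))` should give the reference URLs per vulnerability ID, assuming the API still answers as shown above.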

tetsuo-cpp commented 2 years ago

Hey @StefanFl, I believe the example that @woodruffw posted was a response from the PyPI API here. pip-audit can query for vulnerabilities from either the PyPI or OSV APIs via the -s flag so in order to support this, both APIs will have to expose the references key (as you noticed, OSV already exposes this). So that's why this will require a patch to Warehouse.
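
As a rough illustration of the two code paths, the sketch below builds a pip-audit invocation for each service and parses the JSON report. `audit_command` and `run_audit` are hypothetical helper names; `-s`, `-f json` and `-r` are the pip-audit flags for the vulnerability service, the output format and a requirements file.

```python
import json
import subprocess

SERVICES = ("pypi", "osv")  # values accepted by pip-audit's -s flag


def audit_command(service, requirements=None):
    """Build a pip-audit invocation for one vulnerability service."""
    if service not in SERVICES:
        raise ValueError(f"unknown service: {service!r}")
    command = ["pip-audit", "-s", service, "-f", "json"]
    if requirements is not None:
        command += ["-r", requirements]
    return command


def run_audit(service, requirements=None):
    """Run pip-audit and parse its JSON report (requires pip-audit on PATH)."""
    # pip-audit exits non-zero when it finds vulnerabilities,
    # so the return code is deliberately not checked here.
    result = subprocess.run(
        audit_command(service, requirements), capture_output=True, text=True
    )
    return json.loads(result.stdout)
```

Any field that should appear in the report regardless of `-s` has to be available from both backends, which is why `references` is blocked on the PyPI side.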

It does contain information about the version where the vulnerability has been fixed as well, which would be very useful.

I believe pip-audit should already be showing fix versions for each vulnerability.

StefanFl commented 2 years ago

Of course the fix versions are already there, my fault.

Having the aliases and the link would already be a great start.

bestis commented 1 year ago

I find it puzzling that there's no severity information, so one can't decide which severity level is unacceptable.

As I understand it, this is because neither OSV nor PyPI provides severity information, which I find even more puzzling.

But these have links to advisories that do have severity information, for example: https://github.com/advisories/GHSA-mr4x-c4v9-x729 https://github.com/advisories/GHSA-r9hx-vwmv-q579

Why not download the advisories to get the severity? And if not everything has it, then just assume the worst, but it would be much nicer to know the severity and even be able to skip low-severity issues.

woodruffw commented 1 year ago

I find it puzzling that there's no severity information, so one can't decide which severity level is unacceptable.

First, as a gentle reminder: this project is not in control of any vulnerability reports. All pip-audit does is consume vulnerability APIs and match them against dependency lists; if you want severity metadata, then you should raise a feature request upstream with either OSV or PYSEC.

Matching against GHSA would work when the vulnerabilities in question are in the GHSA DB, but there's no formal guarantee of this: we support IDs from all kinds of vulnerability DBs, and in fact don't even prefer GHSA by default (we prefer PYSEC, since it's curated for the Python community's needs). Attempting to "merge" results from separate feeds also poses problems: these kinds of reports are updated surprisingly frequently, and merging means turning a simple data retrieval problem into a deconflicting/matching problem between two potentially contradictory reports.

TL;DR: If you want severity metadata, please work with our upstreams! It's something we won't be able to accomplish on our own.

And if not everything has it, then just assume the worst, but it would be much nicer to know the severity and even be able to skip low-severity issues.

Independent of the above: I want to advise against taking this kind of approach (and state that we'll probably never assume the worst):

  1. Vulnerability scoring is hard, and context sensitive: reducing it to a single number from an API means that you lack the context for how exploitable it is within your codebase, which is the metric that determines actual priority. In other words: something that's scored as a 2 might actually be a 9 in your code, while something scored a 9 might be a 2 in my code.
  2. Tools like pip-audit need to be very careful to avoid producing security fatigue, and assuming the worst is fatigue-inducing: users quickly learn that "worst" really means "not so bad," and they begin to ignore things they shouldn't.

bestis commented 1 year ago

I'm comparing this to similar audit tools in other languages, which usually seem to have severity implemented, e.g. ci-audit in Node. What puzzles me is why OSV or PYSEC would consider severity irrelevant data. I wasn't blaming this project for it; I just find it puzzling that they don't have that information.

Scoring might be hard in general, but the ReDoS ones are not so hard, e.g. PYSEC-2022-42969, which you know very well.

Having to skip vulns like that is not nice either. Lately there have been so many of these ReDoS reports, which are usually pretty pointless, like that one. I think vulnerabilities like this blocking pipelines is what causes fatigue.

woodruffw commented 1 year ago

What puzzles me is why OSV or PYSEC would consider severity irrelevant data. I wasn't blaming this project for it; I just find it puzzling that they don't have that information.

Please raise it with them! If we can get this data in a consistent manner, we will consider exposing it.

Having to skip vulns like that is not nice either. Lately there have been so many of these ReDoS reports, which are usually pretty pointless, like that one. I think vulnerabilities like this blocking pipelines is what causes fatigue.

I agree completely. That being said, I don't think that severities solve the problem here: https://github.com/advisories/GHSA-r9hx-vwmv-q579 for example has a score of 7.5, despite having basically no attack profile/value. The incentives here are what's broken: if pip-audit provides scores, then the people who spam feeds with ReDoS vulnerabilities will just raise their scores to get them in front of more eyes.

The correct (IMO) approach here is to have curated feeds, with vulnerabilities that get removed (or aggressively pre-filtered) by trusted maintainers. I believe PYSEC attempts to provide this, although it's also an open question as to how best to scale it.

bestis commented 1 year ago

Oh, I didn't notice it had such a high score. Probably because the attack vector is network. Which is then like, yeah, technically. Maybe then there should be an ability to skip vulns that mention given keywords :trollface:

I'm guessing severities would at least lessen the effect, and the less severe problems could be fixed once in a while rather than immediately. But as the ecosystem seems to be what it is, here we are.

Of course one could take the output of pip-audit and crawl the vulns to find the severity, then decide whether each one is a blocker or not, but granted, it would be better if the data were just available from those sources. I need to think about whether I have the energy to find the right places to nag about it, but thank you for confirming that the problem lies with the curated lists pip-audit uses. I understand if pip-audit doesn't want to start crawling those itself.

woodruffw commented 1 year ago

Oh, I didn't notice it had such a high score. Probably because the attack vector is network. Which is then like, yeah, technically. Maybe then there should be an ability to skip vulns that mention given keywords :trollface:

Yeah, this gets to the core of it: these scoring schemes have dimensions like "network," when the network context here is "a package index that you probably already trust and can send you ZIP bombs anyways."

I think filtering by keyword is probably a good idea here, but IMO is best done by downstream users of pip-audit via the --format=json output: supporting it directly would mean needing to decide whether to support regexes, what subset to support, etc. Users will probably all want slightly different things, so punting to them makes sense to me 🙂
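
As a starting point, a downstream filter along these lines might work. This is only a sketch: it assumes the JSON report is either an object with a `dependencies` list or (in older releases) a bare list, and that each entry carries a `vulns` list whose items have `id` and `description` keys. The function names and the skip pattern are hypothetical.

```python
import re


def iter_vulns(report):
    """Yield (package name, vuln dict) pairs from a pip-audit JSON report."""
    # Assumption: newer reports wrap entries in {"dependencies": [...]},
    # while older ones are a bare list of dependency entries.
    deps = report.get("dependencies", []) if isinstance(report, dict) else report
    for dep in deps:
        # Skipped dependencies may have no "vulns" key at all.
        for vuln in dep.get("vulns", []):
            yield dep.get("name"), vuln


def filter_vulns(report, skip_pattern):
    """Split findings into (kept, skipped) by matching the description text."""
    pattern = re.compile(skip_pattern, re.IGNORECASE)
    kept, skipped = [], []
    for name, vuln in iter_vulns(report):
        text = vuln.get("description") or ""
        (skipped if pattern.search(text) else kept).append((name, vuln["id"]))
    return kept, skipped
```

A CI wrapper could load the output of `pip-audit --format=json` with `json.load`, call `filter_vulns(report, r"redos")`, and fail the job only when `kept` is non-empty. The regex dialect and matching rules are then entirely the user's choice.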

I need to think about whether I have the energy to find the right places to nag about it, but thank you for confirming that the problem lies with the curated lists pip-audit uses. I understand if pip-audit doesn't want to start crawling those itself.

Just in case it helps: the right OSV issue tracker is probably this one: https://github.com/google/osv.dev/issues

woodruffw commented 1 year ago

#654 concerns vulnerability ratings/scores specifically, so I'll move that part of this discussion to that issue.