pypa / pip-audit

Audits Python environments, requirements files and dependency trees for known security vulnerabilities, and can automatically fix them
https://pypi.org/project/pip-audit/
Apache License 2.0
958 stars 63 forks source link

Feature: stable API to add additional VulnerabilityService instances for private repositories #805

Open davidjmemmett opened 1 month ago

davidjmemmett commented 1 month ago

Pre-submission checks

What's the problem this feature will solve?

I'm switching to pip-audit instead of safety (as safety v3 doesn't support private repositories), and I've written a wrapper around the pip-audit library to add an additional VulnerabilityService instance. In order to accomplish this, I've had to recreate the VulnerabilityServiceChoice enum and override it in my wrapper. As I've also had to import a bunch of other packages from within pip-audit, I've noticed that they're all prefixed with _, which in Python typically means private/"don't rely on this as a stable API".

Describe the solution you'd like

At some point in the future, the private API of pip-audit might change and break my wrapper (described above). A stable API to be able to add additional VulnerabilityService instances would allow organisations with private pip repositories to be able to track and flag vulnerabilities in their private repositories without fear that the tooling may suddenly break without notice.

Additional context

No response

woodruffw commented 1 month ago

Thanks for opening this @davidjmemmett!

Indeed, all of pip-audit's current APIs are considered private -- we don't make any stability guarantees around them, and users are expected to use the CLI for SemVer guarantees.

With that being said, I think we'd be interested in exposing a subset (probably a small subset, to get started) of APIs for public use, to make programmatic use easier. These kinds of use cases are valuable for us to hear about, since they inform which and how we'll expose those APIs 🙂

Providing support for custom VulnerabilityService implementations is an interesting challenge: as you've noticed, the current set is baked into CLI assumptions (like `VulnerabilityServiceChoice``) in ways that wouldn't be trivial to make dynamic (although I don't think it would be impossible). So, I'm curious if there's another layer of abstraction we could tackle this issue at:

CC @di as well, since he'll almost certainly have thoughts about the propriety/scope of a public API for pip-audit 🙂

di commented 1 month ago

Agreed. Some more details about your use case and examples of how you're using pip-audit and how you'd prefer to be using it would be really helpful!

davidjmemmett commented 1 month ago

Hi both, thank you for responding swiftly.

My current implementation is a little bit hacky, as it overrides the definition of VulnerabilityServiceChoice:

from enum import Enum
from pathlib import Path

from pip_audit import _cli as audit_cli
from pip_audit._service.interface import VulnerabilityService
from pip_audit._service.osv import OsvService
from pip_audit._service.pypi import PyPIService
from pip._vendor.typing_extensions import assert_never

from custom_pip_audit.service.custom import CustomService

class CustomVulnerabilityServiceChoice(str, Enum):

    Osv = "osv"
    Pypi = "pypi"
    Custom = "custom"

    def to_service(self, timeout: int, cache_dir: Path | None) -> VulnerabilityService:
        if self is CustomVulnerabilityServiceChoice.Custom:
            return CustomService()
        elif self is CustomVulnerabilityServiceChoice.Osv:
            return OsvService(cache_dir, timeout)
        elif self is CustomVulnerabilityServiceChoice.Pypi:
            return PyPIService(cache_dir, timeout)
        else:
            assert_never(self)  # pragma: no cover

    def __str__(self) -> str:
        return self.value

audit_cli.VulnerabilityServiceChoice = CustomVulnerabilityServiceChoice

def audit() -> None:
    audit_cli.audit()

I then override VulnerabilityService.query and parse a JSON file which we maintain manually of the format, I couldn't find any existing logic to do the version comparisons, so I wrote a whole heap of code to do that too (available upon request):

{
    "a-custom-package": [
        {
            "id": "custom-1",
            "description": "A description of the vulnerability",
            "specs": ["<1.2.3"],
            "fix_versions": ["1.2.3"]
        }
    ]
}

I would definitely be open to changing the format of the JSON file to match something official, which may already be available to export/fetch from industry standard tooling.

I hope this provides a little bit more insight into what I'm trying to achieve.

woodruffw commented 1 month ago

I couldn't find any existing logic to do the version comparisons, so I wrote a whole heap of code to do that too (available upon request):

Hmm, I think you can use the packaging.version APIs for this, unless I'm misunderstanding 🙂: https://packaging.pypa.io/en/stable/version.html

I would definitely be open to changing the format of the JSON file to match something official, which may already be available to export/fetch from industry standard tooling.

Yep! I think the OSV Schema is probably the best fit here -- we already support it via the osv service, so if your internal service/source could support that schema then we could probably make your integration simpler by allowing the API to override the OSV API endpoint.

davidjmemmett commented 1 month ago

Yeah, I was using the Version class and it's built-in parsing and comparators, I meant the logic such as:

                    ...
                    if vulnerability_spec.startswith("<="):
                        version = Version(vulnerability_spec[2:])
                        if spec.version <= version:
                            num_spec_matches += 1

                    elif vulnerability_spec.startswith("=>"):
                    ... etc

I completely agree with moving to the OSV schema being the simplest way forward, and using vanilla pip-audit, along the lines of: pip-audit --vulnerability-service osv --osv-vulnerabilities-path ./custom-vulnerability-list.json .... Would you like me to look into implementing this and submitting a GitHub Merge Request?

woodruffw commented 1 month ago

Ahh, gotcha. Yeah, I'm unfortunately not familiar with anything that's immediately a good fit for that (there might be some other packaging APIs around marker evaluation that might help, but I don't know them off the top of my head).

In terms of next steps here: supporting this via a JSON input on the CLI will make this pretty close to https://github.com/pypa/pip-audit/issues/698, so I want to make sure we get the interface right here 🙂. In particular: would it be a significant problem for your use case if we required OSV loading via a URL instead? For example:

pip-audit --vulnerability-service osv --osv-url http://your-local-service.example

That would reduce the amount of special casing we'd need to do within pip-audit itself, since the current OSV service could be redirected to your host rather than the default public one. But if this would be a significant obstacle for you, I think we can think about loading directly from JSON instead (with the caveat that we'll probably want to add some warnings/language reminding users that loading from a local source might be "stale" compared to an online service).

davidjmemmett commented 1 month ago

I don't mind at all, it saves me an extra step of fetching a remote file. Thank you.

woodruffw commented 1 month ago

No problem! I'll let @di confirm that he's okay with this approach as well, but assuming he is then I'd be happy to review a PR for this.

(I suspect there might be some hurdles, since the OSV API isn't just a static JSON document API -- it receives a posted query. But osv.py only requires a very small amount of that API surface, so it should be easy for a small internal service to replicate 🙂)

davidjmemmett commented 1 month ago

That complicates matters, but I will investigate and see if there's a way that it can accept a static file.

davidjmemmett commented 1 month ago

I have opened the Pull Request https://github.com/pypa/pip-audit/pull/810 in order to allow the OSV URL to be overridden; please let me know if there is anything further required in order for this to be merged in.