oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Improve resolution of Python / PIP dependencies #4637

Closed sschuberth closed 1 year ago

sschuberth commented 2 years ago

ORT's analyzer has various problems with resolving Python / PIP dependencies

sschuberth commented 2 years ago

Possible solutions to the above include @pombredanne's proposal for an ACT-funded "Project-Multi Python-version dependencies resolver", or leveraging / extending existing tools like https://github.com/ddelange/pipgrip.

sschuberth commented 2 years ago

or leveraging / extending existing tools like https://github.com/ddelange/pipgrip.

See in particular https://github.com/ddelange/pipgrip/issues/40.

sschuberth commented 2 years ago

Also maybe worth a look as a helper tool is https://github.com/trailofbits/it-depends which claims to

Finds native dependencies for high level languages like Python

pombredanne commented 2 years ago

@sschuberth re:

Also maybe worth a look as a helper tool is https://github.com/trailofbits/it-depends which claims to

Finds native dependencies for high level languages like Python

From a quick look they seem to:

  1. create a docker image in https://github.com/trailofbits/it-depends/blob/8f8988330239c6d3eb39f05988fdbe6802f4bbbe/it_depends/pip.py#L35
  2. run pip directly https://github.com/trailofbits/it-depends/blob/8f8988330239c6d3eb39f05988fdbe6802f4bbbe/it_depends/pip.py#L176 or through https://github.com/wimglenn/johnnydep/blob/master/johnnydep/pipper.py

sschuberth commented 2 years ago

Also see the difficulties in finding Python 2 example projects.

sschuberth commented 2 years ago

We could also take a deeper look at component-detection's approach for PIP.

sschuberth commented 2 years ago

Some interesting insights on the general topic from a Python maintainer, and a possible solution.

sschuberth commented 2 years ago

And yet another interesting discussion with links to:

pombredanne commented 2 years ago

@sschuberth FWIW, ScanCode does parse requirements files, setup.py, setup.cfg, pyproject.toml, Pipfile and Pipfile.lock and a few more, and has what is likely the best requirements parser around, https://github.com/nexB/pip-requirements-parser, which is also used in CycloneDX. You can see the code in action in https://github.com/nexB/scancode-toolkit/blob/syspacfiles/src/packagedcode/pypi.py. We also parse various Python metadata files and detect packages in various installed, archive and extracted layouts. We maintain https://github.com/nexB/dparse2 and https://github.com/nexB/pkginfo2 for additional manifest formats, and https://github.com/nexB/univers to parse all versions, including all Python package versions. We also built utilities to resolve, collect and download actual package archives based on these. And we are continuously adding support for new formats as they come.

sschuberth commented 2 years ago

ScanCode does parse requirements files, setup.py, setup.cfg, pyproject.toml, Pipfile and Pipfile.lock and a few more

Can you clarify on what "parse" means here exactly? I assume in the context of ScanCode only declared license data is parsed, but not declared direct and implied transitive dependencies, incl. resolution of version ranges to concrete versions. Correct?

pombredanne commented 2 years ago

Can you clarify on what "parse" means here exactly? I assume in the context of ScanCode only declared license data is parsed, but not declared direct and implied transitive dependencies, incl. resolution of version ranges to concrete versions. Correct?

By "parse" I mean collecting the data as it is found locally, without making any network calls, e.g. this means:

This does not mean resolving dependencies and getting extra data for these dependencies yet: for Python and PyPI proper that's been the essence of the proposal I had put forward to the ACT project.
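
To illustrate the distinction, here is a minimal sketch of "parsing" in this sense: reading declared requirements from local text and recording them as written, with no network access and no resolution. It is not ScanCode's or pip-requirements-parser's code; the `ParsedRequirement` type, the `parse_requirements` function and the simplified regular expression are assumptions made for this example, and real requirements files have many more forms (URLs, options, extras, line continuations) that this ignores.

```python
import re
from typing import List, NamedTuple


class ParsedRequirement(NamedTuple):
    """A declared requirement exactly as written (hypothetical type)."""
    name: str
    specifier: str  # e.g. ">=2.0,<3"; empty if no version constraint is declared


# Simplified pattern: a project name, optional extras, then the version specifier.
_REQUIREMENT = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)\s*(\[[^\]]*\])?\s*(.*)$")


def parse_requirements(text: str) -> List[ParsedRequirement]:
    """Collect declared requirements from requirements-file text, offline."""
    parsed = []
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):      # skip blanks and options like "-r"
            continue
        line = line.split(";", 1)[0].strip()      # drop environment markers
        match = _REQUIREMENT.match(line)
        if match:
            specifier = match.group(3).replace(" ", "")
            parsed.append(ParsedRequirement(match.group(1), specifier))
    return parsed


SAMPLE = """\
requests>=2.0,<3
# a comment
-r other.txt
flask
numpy == 1.21.0 ; python_version < '3.11'
"""

requirements = parse_requirements(SAMPLE)
```

The result keeps version ranges such as `>=2.0,<3` as declared; turning them into concrete versions and finding transitive dependencies is the separate resolution step.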

Now this will eventually happen, as all parts are mostly in place:

The last step will be to bring these together: as it is, this could already be used to resolve transitive dependencies using a simple strategy such as getting the latest version. It would later benefit from adding extra version resolvers to emulate the behaviour of package managers, such as the pip solver (this was the ACT proposal), the PubGrub solver, the Maven solver, etc.
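
As a rough sketch of the "simple strategy" mentioned above, the following walks transitive dependencies and always picks the latest available version of each package. The `INDEX` data, the `version_key` helper and the `resolve_latest` function are hypothetical and exist only for this example; a real implementation would fetch metadata from a package index and, unlike this sketch, would have to honour version constraints and handle conflicts, which is what solvers such as pip's are for.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory index: package name -> version -> dependency names.
INDEX: Dict[str, Dict[str, List[str]]] = {
    "app": {"1.0": ["web", "utils"]},
    "web": {"2.0": ["utils"], "2.1": ["utils", "http"]},
    "utils": {"0.9": [], "1.0": []},
    "http": {"3.0": []},
}


def version_key(version: str) -> Tuple[int, ...]:
    """Sort key for purely numeric dotted versions (a simplification)."""
    return tuple(int(part) for part in version.split("."))


def resolve_latest(root: str, index: Dict[str, Dict[str, List[str]]]) -> Dict[str, str]:
    """Resolve the transitive closure of `root`, always taking the latest version."""
    resolved: Dict[str, str] = {}
    pending = [root]
    while pending:
        name = pending.pop()
        if name in resolved:
            continue  # already pinned; constraints are ignored in this sketch
        latest = max(index[name], key=version_key)
        resolved[name] = latest
        pending.extend(index[name][latest])
    return resolved


resolution = resolve_latest("app", INDEX)
```

Here `web` resolves to 2.1, which in turn pulls in `http`; choosing 2.0 instead would yield a different dependency set, which is why emulating each package manager's own solver matters for accurate results.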

pombredanne commented 2 years ago

See also: https://github.com/oss-review-toolkit/ort/issues/3671#issuecomment-1203248523

Some updates that are likely relevant here: https://github.com/nexB/python-inspector is now out and has been designed specifically to be integrated in ORT and to resolve pip dependencies without the constraints of running pip. See https://github.com/nexB/ort/pull/1 for the working ORT integration that we are refining there first before submitting it to ORT proper.

python-inspector does resolve transitive dependencies.