sschuberth closed this issue 1 year ago.
Possible solutions to the above include @pombredanne's proposal for an ACT-funded "Project-Multi Python-version dependencies resolver", or leveraging or extending existing tools like https://github.com/ddelange/pipgrip.
See in particular https://github.com/ddelange/pipgrip/issues/40.
Also maybe worth a look as a helper tool is https://github.com/trailofbits/it-depends, which claims that it "finds native dependencies for high level languages like Python".
@sschuberth re: it-depends, from a quick look they seem to:
Also see the difficulties in finding Python 2 example projects.
We could also take a deeper look at component-detection's approach for PIP.
Some interesting insights on the general topic from a Python maintainer, and a possible solution.
And yet another interesting discussion with links to:
@sschuberth FWIW, ScanCode does parse requirements files, setup.py, setup.cfg, pyproject.toml, Pipfile, Pipfile.lock and a few more formats, and it has what is likely the best requirements parser around, https://github.com/nexB/pip-requirements-parser, which is also used in CycloneDX. You can see the code in action in https://github.com/nexB/scancode-toolkit/blob/syspacfiles/src/packagedcode/pypi.py. We also parse various Python metadata files and detect packages in various installed, archive and extracted layouts. We maintain https://github.com/nexB/dparse2 and https://github.com/nexB/pkginfo2 for additional manifest formats, and https://github.com/nexB/univers to parse all kinds of versions, including all Python package versions. We also built utilities to resolve, collect and download actual package archives based on these, and we are continuously adding support for new formats as they come.
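To make "parsing" in the sense used here concrete, below is a minimal sketch of reading one requirements line offline: the declared name, extras, version specifiers and environment marker are collected as-is, with no network access and no resolution. It is a hypothetical, heavily simplified stand-in (the `Requirement`, `normalize_name` and `parse_requirement_line` names are assumptions for illustration), not the API of pip-requirements-parser or ScanCode, which cover far more of the format (URLs, hashes, options, continuations).

```python
import re
from typing import NamedTuple, Optional, Tuple


class Requirement(NamedTuple):
    name: str                                  # PEP 503-normalized project name
    extras: Tuple[str, ...]                    # e.g. ("security",)
    specifiers: Tuple[Tuple[str, str], ...]    # e.g. ((">=", "2.0"), ("<", "3"))
    marker: Optional[str]                      # environment marker, kept verbatim


# Simplified grammar (assumption): name, optional [extras], optional
# comma-separated version specifiers, optional "; marker", optional comment.
# URLs, hashes, line continuations and pip options are not supported.
_REQ_RE = re.compile(
    r"""^
    (?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)   # project name
    \s*(?:\[(?P<extras>[^\]]*)\])?         # optional [extra1, extra2]
    \s*(?P<specs>[^;#]*?)                  # version specifiers, unparsed
    \s*(?:;\s*(?P<marker>[^#]*?))?         # optional environment marker
    \s*(?:\#.*)?$                          # optional trailing comment
    """,
    re.VERBOSE,
)
# "===" must precede "==" so the longer operator wins.
_SPEC_RE = re.compile(r"^\s*(===|~=|==|!=|<=|>=|<|>)\s*(\S+)\s*$")


def normalize_name(name: str) -> str:
    """PEP 503 normalization: runs of '-', '_' and '.' become one '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_requirement_line(line: str) -> Optional[Requirement]:
    """Collect what one line declares, without resolving anything."""
    stripped = line.strip()
    # Skip blank lines, comments and pip options such as "-r other.txt".
    if not stripped or stripped.startswith(("#", "-")):
        return None
    match = _REQ_RE.match(stripped)
    if match is None:
        raise ValueError(f"unsupported requirement line: {line!r}")
    extras = tuple(e.strip() for e in (match["extras"] or "").split(",") if e.strip())
    specifiers = []
    for part in match["specs"].split(","):
        if not part.strip():
            continue
        spec = _SPEC_RE.match(part)
        if spec is None:
            raise ValueError(f"unsupported version specifier: {part!r}")
        specifiers.append((spec[1], spec[2]))
    return Requirement(
        normalize_name(match["name"]), extras, tuple(specifiers), match["marker"] or None
    )


print(parse_requirement_line("Foo_Bar >= 1.0"))
```

Note that the result still holds a version *range*; picking a concrete version is a separate step.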
Can you clarify what "parse" means here exactly? I assume that in the context of ScanCode only declared license data is parsed, but not declared direct and implied transitive dependencies, including the resolution of version ranges to concrete versions. Correct?
By "parse" I mean collecting the data as-is, as found locally, without making any network call. For example, this means:
This does not yet mean resolving dependencies and fetching extra data for these dependencies: for Python and PyPI proper, that was the essence of the proposal I had put forward to the ACT project.
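Resolving a declared range to one concrete version is the step that offline parsing leaves out. Here is a minimal sketch of that step for a single requirement, assuming plain dotted-integer release versions (no epochs, pre-, post- or dev-releases, and no `~=` or wildcards, all of which real PEP 440 handling needs) and specifiers given as `(operator, version)` pairs. The helper names are hypothetical, and choosing the highest match is just one common policy.

```python
import operator
from typing import Iterable, Optional, Sequence, Tuple

Version = Tuple[int, ...]
Specifier = Tuple[str, str]  # (operator, version), e.g. (">=", "2.0")

_OPS = {
    "==": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}


def parse_version(text: str) -> Version:
    """Parse a plain release version like "2.31.0" (assumption: digits and dots only)."""
    return tuple(int(part) for part in text.split("."))


def _pad(a: Version, b: Version) -> Tuple[Version, Version]:
    # Missing release segments count as zero, so "2" compares equal to "2.0.0".
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version: str, specifiers: Iterable[Specifier]) -> bool:
    """True if the version meets every (operator, bound) pair."""
    v = parse_version(version)
    for op, bound in specifiers:
        left, right = _pad(v, parse_version(bound))
        if not _OPS[op](left, right):
            return False
    return True


def pick_version(available: Sequence[str], specifiers: Sequence[Specifier]) -> Optional[str]:
    """Resolve a declared range to the highest matching concrete version, or None."""
    matching = [v for v in available if satisfies(v, specifiers)]
    return max(matching, key=parse_version, default=None)


available = ["1.9.0", "2.0.0", "2.25.1", "2.31.0", "3.0.0"]
print(pick_version(available, [(">=", "2.0"), ("<", "3")]))  # 2.31.0
```

This only handles direct dependencies in isolation; once the chosen versions declare their own requirements, the problem becomes transitive resolution.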
This will eventually happen, as most of the parts are now in place:
The last step will be to bring these together. As it is, this could already be used to resolve transitive dependencies using a simple strategy such as picking the latest version. It would later benefit from additional version resolvers that emulate the behaviour of package managers, such as the pip solver (this was the ACT proposal), the PubGrub solver, the Maven solver, etc.
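The "simple strategy" of getting the latest version can be sketched as a breadth-first walk that pins every dependency to its newest available version. This is a minimal illustration under stated assumptions: the index is a hypothetical in-memory mapping with made-up package names rather than real PyPI data, versions are plain dotted integers, and declared version constraints are deliberately ignored, which is exactly the gap a pip- or PubGrub-style solver would fill.

```python
from collections import deque
from typing import Dict, List, Mapping

# Hypothetical index: project name -> version -> names of its direct
# dependencies. A real tool would build this from PyPI or local metadata.
Index = Mapping[str, Mapping[str, List[str]]]


def _version_key(version: str):
    # Assumption: plain numeric release versions, compared numerically so
    # that "2.10" sorts after "2.9".
    return tuple(int(part) for part in version.split("."))


def resolve_latest(roots: List[str], index: Index) -> Dict[str, str]:
    """Pin every direct and transitive dependency to its latest version.

    Declared version constraints are deliberately ignored, so the result
    can be inconsistent; a pip- or PubGrub-style solver would instead
    backtrack when constraints conflict.
    """
    pinned: Dict[str, str] = {}
    queue = deque(roots)
    while queue:
        name = queue.popleft()
        if name in pinned:
            continue  # already pinned; this also stops dependency cycles
        latest = max(index[name], key=_version_key)
        pinned[name] = latest
        queue.extend(index[name][latest])  # visit the chosen version's deps
    return pinned


# Made-up packages for illustration only.
EXAMPLE_INDEX = {
    "pkg-a": {"1.0": ["pkg-b"], "1.1": ["pkg-b", "pkg-c"]},
    "pkg-b": {"2.9": ["pkg-d"], "2.10": ["pkg-d", "pkg-a"]},  # cycle back to pkg-a
    "pkg-c": {"0.9": []},
    "pkg-d": {"1.0.0": [], "1.2.0": []},
}

print(resolve_latest(["pkg-a"], EXAMPLE_INDEX))
# {'pkg-a': '1.1', 'pkg-b': '2.10', 'pkg-c': '0.9', 'pkg-d': '1.2.0'}
```

Because each package is pinned the first time it is seen and never revisited, this strategy is fast and deterministic, but it cannot notice when one dependent's range excludes the version another path already chose.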
See also: https://github.com/oss-review-toolkit/ort/issues/3671#issuecomment-1203248523
Some updates that are likely relevant here: https://github.com/nexB/python-inspector is now out and has been designed specifically to be integrated into ORT and to resolve pip dependencies without the constraints of running pip. See also https://github.com/nexB/ort/pull/1 for the working ORT integration, which we are refining there first before submitting it to ORT proper.
python-inspector does resolve transitive dependencies.
ORT's analyzer has various problems with resolving Python / PIP dependencies