oss-review-toolkit / ort

A suite of tools to automate software compliance checks.
https://oss-review-toolkit.org
Apache License 2.0

Improve analysis of python based projects #7964

Open netomi opened 7 months ago

netomi commented 7 months ago

I was testing ORT on a couple of projects and noticed that it was rather slow to analyse a Poetry-based Python project (https://github.com/netomi/otterdog). The project has a lock file, so I assumed that dependency resolution would be fast.

After some debugging, it turns out that ORT calls python-inspector on the requirements file that is exported from poetry:

13:42:35.163 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'python-inspector --python-version 311 --operating-system linux --json-pdt /tmp/ort-PythonInspector5612839292643848503/python-inspector5965944598659711246.json --analyze-setup-py-insecurely --requirement /tmp/ort-Poetry1934851817638935981/requirements.txt16944187604928218322.tmp --verbose' in '/tmp/ort-Poetry1934851817638935981'...
13:43:10.194 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.plugins.packagemanagers.python.Poetry - Generating 'requirements.txt7902172682984572843.tmp' file in '/home/tn/workspace/eclipse/otterdog' directory...

but as you can see from the log, python-inspector takes about 35 seconds to resolve the dependencies, although they are all pinned (from the lock file).

Digging into the code of python-inspector, I found several performance improvements that could be applied relatively easily. I created a PR at https://github.com/nexB/python-inspector/pull/163.

Running this version of python-inspector with ORT on the same project leads to far better results (see timestamps):

13:45:06.460 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.utils.common.ProcessCapture - Running 'python-inspector --python-version 311 --operating-system linux --json-pdt /tmp/ort-PythonInspector13905845346354005296/python-inspector9105449330839367212.json --analyze-setup-py-insecurely --requirement /tmp/ort-Poetry17813648676616128042/requirements.txt17803415363049372639.tmp --verbose' in '/tmp/ort-Poetry17813648676616128042'...
13:45:17.402 [DefaultDispatcher-worker-1] INFO  org.ossreviewtoolkit.plugins.packagemanagers.python.Poetry - Generating 'requirements.txt13258899092414060651.tmp' file in '/home/tn/workspace/eclipse/otterdog' directory...

with the exact same output. When you have multiple scopes defined in your project (in my case I have 5), this improvement really adds up: I could bring the analysis down from 2 min to 30 s.
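To put numbers on the two log excerpts above, here is a minimal Python sketch that computes the elapsed time between the quoted timestamps. It assumes that the log line following each python-inspector invocation marks roughly when that invocation finished; the helper name `elapsed_seconds` is made up for illustration and is not part of ORT or python-inspector.

```python
from datetime import datetime


def elapsed_seconds(start: str, end: str) -> float:
    """Seconds between two HH:MM:SS.mmm log timestamps from the same day."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


# Timestamps copied from the two log excerpts (assumption: the next log line
# approximates the end of the python-inspector run).
before = elapsed_seconds("13:42:35.163", "13:43:10.194")
after = elapsed_seconds("13:45:06.460", "13:45:17.402")

print(f"before: {before:.3f} s, after: {after:.3f} s, speedup: {before / after:.1f}x")
```

For this single scope, that works out to roughly 35 s before and 11 s after the change.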

I would be happy if there is some feedback on the PR so we can get that into ORT asap, as I am currently investigating whether ORT can run license checks automatically via GitHub Actions, and anything that speeds up the analysis is greatly appreciated.

sschuberth commented 7 months ago

> I created a PR at nexB/python-inspector#163.

❤️ for that!

> I would be happy if there is some feedback on the PR so we can get that into ORT asap

Sure, we usually upgrade to newly released python-inspector versions quickly. Let's keep this issue open as a reminder to do that.