scanner runs very slow / no status

oss-review-toolkit / ort

A suite of tools to automate software compliance checks.

https://oss-review-toolkit.org

Apache License 2.0


scanner runs very slow / no status #3296

Closed nupis-FrankO closed 3 months ago

nupis-FrankO commented 3 years ago

Hi,

We are currently testing ORT to scan/analyze a Maven project.

The scan runs very slowly (> 1 day) and we don't see any status for the current execution. Is it possible to implement a "progress bar"?

Or is there a possibility to optimize the execution of the scan process?

Regards,

Frank

sschuberth commented 3 years ago

I have also seen bad performance when scanning large Maven projects recently, and we're planning to investigate this, as even with debug output sometimes nothing seems to happen for minutes. Meanwhile, can you try increasing the max heap size for ORT to something like 16 GB (-Xmx16g) to see if that helps?
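For readers wondering how to pass that heap setting, here is a minimal sketch. It assumes the ORT launcher script honors the JAVA_OPTS environment variable, as Gradle-generated start scripts usually do; the thread does not confirm this. The commented invocation is hypothetical and its paths are placeholders.

```shell
# Assumption: the ORT start script picks up JAVA_OPTS (not confirmed in this thread).
# 16 GB is the value suggested in the comment above.
export JAVA_OPTS="-Xmx16g"

# Hypothetical invocation; adjust the input and output paths to your setup:
# ort --info scan -i analyzer-result.yml -o scanner-result

# Show what the JVM would be started with.
echo "JAVA_OPTS=$JAVA_OPTS"
```

When running ORT in a container, the variable would have to be set inside the container (for example with Docker's -e option), and the host needs enough free memory for the requested heap.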

sachinshaji commented 2 years ago

Hi, we are also facing the same problem. The scanner takes a lot of time.

sschuberth commented 2 years ago

The scanner (i.e. ScanCode, by default) taking a lot of time is nothing ORT can change. However, we could probably do a better job at reporting progress (although you actually do get per-package progress with --info).

In any case, you should set up a scan storage to benefit from existing scan results. Giving the ClearlyDefined scan storage a try is something you could do, for example.
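To get a rough idea of whether ClearlyDefined already knows a given dependency, one could query its public definitions API by hand. The sketch below only builds the URL for a Maven Central artifact. The helper name cd_definition_url and the example coordinates are made up for illustration, and this is not necessarily the endpoint that ORT's ClearlyDefined scan storage uses internally.

```shell
# Build a ClearlyDefined definitions URL for a Maven Central artifact.
# Arguments: group ID, artifact ID, version.
# cd_definition_url is a hypothetical helper, not part of ORT.
cd_definition_url() {
  printf 'https://api.clearlydefined.io/definitions/maven/mavencentral/%s/%s/%s\n' "$1" "$2" "$3"
}

# Example coordinates, chosen for illustration only.
url=$(cd_definition_url org.apache.commons commons-lang3 3.9)
echo "$url"

# Uncomment to query the service (needs network access and curl):
# curl -fsS "$url"
```

A response that contains license information for the package suggests that ClearlyDefined has processed it, although that alone does not guarantee ORT can reuse a stored scan result.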

sachinshaji commented 2 years ago

Thanks for the reply. I have built a Docker image from the Dockerfile given in the repo. I am running this command to scan the code:

sudo docker run -v $PWD/:/project --info scan -i /project/analyse/analyzer-result.yml -o /project/scanner/scanner-result.json

We are not defining any storage backend; I guess this will default to the filesystem. Any suggestion to improve the scanning time? It took around 3 to 4 hours to scan the entire code.

sschuberth commented 2 years ago

We are not defining any storage backend; I guess this will default to the filesystem.

Correct. And this only speeds up subsequent scans that involve mostly the same packages, as you're populating our file-based scan storage yourself.

Any suggestion to improve the scanning time?

The idea is that you either quickly build up a company-internal (Postgres-based) scan storage yourself to speed up future scans, or use an existing public scan storage, like the one from ClearlyDefined mentioned above. However, it might be that we currently have an issue there.

sachinshaji commented 2 years ago

Thanks a lot for your help.

sschuberth commented 3 months ago

Closed as part of backlog grooming. Feel free to comment if you would like to contribute to this.