whitesource-ps / ws-nexus-integration

WhiteSource Nexus integration tool
Apache License 2.0

[FR] [ws-nexus-integration] Pull & Scan in batches #39

Closed danielnbalasoiu closed 2 years ago

danielnbalasoiu commented 2 years ago

Is your feature request related to a problem? Please describe.
It's almost impossible to scan large Nexus repositories without a huge amount of disk space and bandwidth. Context: there is a Nexus container registry that stores more than 10k images. In the current implementation, a list of all containers is created, and then the containers are downloaded and scanned. The problems are:

  1. listing all containers takes a very long time.
  2. downloading ALL the containers consumes a lot of resources (storage, bandwidth).
  3. the scanning process takes forever.

Describe the solution you'd like
A possible option would be to scan in batches and then delete/clean up the scanned containers (see FR #38).
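
The batching idea above can be sketched roughly as follows. This is only an illustration of the request, not the tool's actual code: `batched`, `scan_in_batches`, and the `pull`, `scan`, and `remove` callables are hypothetical names introduced here, and the caller would supply whatever really pulls, scans, and deletes an image.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield lists of at most `size` items without materializing the whole input."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def scan_in_batches(
    images: Iterable[str],
    size: int,
    pull: Callable[[str], None],    # hypothetical: downloads one image
    scan: Callable[[str], None],    # hypothetical: scans one local image
    remove: Callable[[str], None],  # hypothetical: deletes one local image
) -> int:
    """Pull, scan, and clean up one batch at a time; return the number of images processed."""
    processed = 0
    for batch in batched(images, size):
        pulled: List[str] = []
        try:
            for image in batch:
                pull(image)
                pulled.append(image)
            for image in pulled:
                scan(image)
        finally:
            # Clean up whatever was pulled, even if a pull or scan failed.
            for image in pulled:
                remove(image)
        processed += len(pulled)
    return processed
```

With this shape, disk usage is bounded by the batch size rather than by the size of the whole registry, and the image listing can be consumed lazily instead of being built up front.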

The official UA supports scanning container images hosted in AWS ECR and supports configurable parameters:

docker.scanImages=true
docker.pull.enable=true
# --> images to be pulled from registry <--
docker.pull.images=.*alpine.*
#docker.pull.images=.*.*
# --> images to be scanned <--
docker.includes=.*mysql.*
docker.pull.tags=.*.*
docker.aws.enable=true
docker.aws.registryIds=REDACTED
# --> maximum number of container images to be pulled  <--
docker.pull.maxImages=15

Additional context
(image attachment not preserved)

danielnbalasoiu commented 2 years ago

Hi! Can you please give me an estimate of when I can test a possible fix? Thanks!

danielnbalasoiu commented 2 years ago

Hi! Please let me know if you need me to test this. I'll happily provide debug information.

danielnbalasoiu commented 2 years ago

Hi @rammatzkvosky

Do you have any updates?

Thank you!

danielnbalasoiu commented 2 years ago

ping @rammatzkvosky

danielnbalasoiu commented 2 years ago

@NatalyaDalid, can you please help me with a status on this matter? Thank you!

rammatzkvosky commented 2 years ago

Hi @danielnbalasoiu, sorry for the late reply. We have started working on a feature that will also cover this issue.

At the moment I still don't have an estimate of when we will be able to deliver it, but the work is in progress and we will do our best to provide it as soon as possible.

danielnbalasoiu commented 2 years ago

Hi @rammatzkvosky

This is great news! Please keep me in the loop and let me know if you need any alpha implementation feedback.

danielnbalasoiu commented 2 years ago

Hi! Any progress on this issue?

NatalyaDalid commented 2 years ago

Hi @danielnbalasoiu,

The pre-release version with the fix will be available by the end of this week. We will keep you updated.

Thanks, WS PS Team

rammatzkvosky commented 2 years ago

Hi @danielnbalasoiu,

Please check the pre-release version: https://pypi.org/project/ws-nexus-integration/0.3a1/

In this version:

  1. Each Docker image is pulled, scanned, and removed (unless it already existed before the tool was run).
  2. The NexusDockerReposImagesIncludes parameter applies to Docker-type repositories. It is a comma-separated list of regular expressions for the images to include. For example, NexusDockerReposImagesIncludes=.*3.12 will only pull and scan the images which end in 3.12
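
To illustrate how a comma-separated regex include list like this can behave, here is a minimal sketch. It is not taken from the tool's source: `parse_includes` and `filter_images` are hypothetical helpers, and full-string matching is an assumption chosen to agree with the "end in 3.12" description above. Treating an empty list as "include everything" is also an assumption.

```python
import re
from typing import Iterable, List


def parse_includes(value: str) -> List[re.Pattern]:
    """Split a comma-separated setting into compiled regexes, skipping empty parts."""
    return [re.compile(part.strip()) for part in value.split(",") if part.strip()]


def filter_images(images: Iterable[str], includes_value: str) -> List[str]:
    """Keep the images that fully match at least one include pattern."""
    patterns = parse_includes(includes_value)
    if not patterns:
        # Assumption: no include patterns means nothing is filtered out.
        return list(images)
    return [
        image
        for image in images
        if any(pattern.fullmatch(image) for pattern in patterns)
    ]


images = ["alpine:3.12", "alpine:3.13", "mysql:8.0", "busybox:3x12"]
print(filter_images(images, ".*3.12"))
print(filter_images(images, r".*3\.12, mysql.*"))
```

Note that an unescaped `.` matches any character, so `.*3.12` also accepts a name such as `busybox:3x12`; writing `.*3\.12` restricts the match to a literal dot. Splitting on commas also means a pattern that itself contains a comma (for example a `{1,3}` quantifier) cannot be expressed in this sketch.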

danielnbalasoiu commented 2 years ago

Hi,

I have tested version 0.3a1 with the NexusDockerReposImagesIncludes parameter and it works as expected! 🥳 👏

In combination with the ThreadCount parameter, the scan time improves drastically.
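
For readers wondering why a thread count helps here, below is a rough sketch of per-image pull, scan, and remove running on a bounded thread pool. It assumes ThreadCount caps the number of concurrent workers, which this thread does not confirm; `process_image`, `scan_concurrently`, and the `pull`, `scan`, and `remove` callables are hypothetical names, not the tool's API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def process_image(
    image: str,
    pull: Callable[[str], None],
    scan: Callable[[str], object],
    remove: Callable[[str], None],
) -> object:
    """Pull one image, scan it, and always remove it afterwards."""
    pull(image)
    try:
        return scan(image)
    finally:
        remove(image)


def scan_concurrently(
    images: Iterable[str],
    thread_count: int,  # assumed to play the role of the ThreadCount setting
    pull: Callable[[str], None],
    scan: Callable[[str], object],
    remove: Callable[[str], None],
) -> Dict[str, object]:
    """Process images on at most `thread_count` worker threads and collect scan results."""
    with ThreadPoolExecutor(max_workers=thread_count) as pool:
        futures = {
            image: pool.submit(process_image, image, pull, scan, remove)
            for image in images
        }
        # result() re-raises any exception raised inside the worker.
        return {image: future.result() for image, future in futures.items()}
```

Because pulling and scanning are mostly I/O-bound, several images can be in flight at once, while at most `thread_count` images sit on disk at any moment.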

rammatzkvosky commented 2 years ago

Hi @danielnbalasoiu,

That's great! Thanks for letting us know.

I will keep this issue open until we provide an official release.