guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

[feature] Continuous, unattended Google Cloud Storage collection #1005

Open migmartri opened 1 year ago

migmartri commented 1 year ago

Hi,

In this PR, https://github.com/guacsec/guac/pull/989, we exposed the GCS collector via the guacone CLI. This means that a user can collect SBOMs and other pieces of metadata from a GCS bucket on demand.

This issue is about being able to configure that process so that it runs periodically and unattended.

Describe the solution you'd like

I want to be able to configure Guac with tuples of bucket + credentials that the system could use to periodically fetch data from those data sources.

Describe alternatives you've considered

I've considered using guacone itself with a cron-like daemon, but I wanted to explore whether this could become a first-class feature, since some of the foundations seem to be there (oci+git datasources).

Additional context

Our goal is to allow Chainloop users to be able to send SBOMs end to end automatically.

The first leg of the journey (CI -> GCS bucket) is fully automated, but the last leg (GCS -> Guac) requires manual intervention via guacone collect #989. It is this last leg that we want to automate too.


Note: this feature may already exist, and I may just not have figured out how to configure it.

Thanks!

Refs https://github.com/chainloop-dev/chainloop/issues/209

lumjjb commented 1 year ago

Ah yes - we have collectors that can run as daemons - which I believe should do exactly what you're asking for.

We already do this for files. Would something like this work? https://github.com/guacsec/guac/blob/main/cmd/guaccollect/cmd/files.go

$ bin/guaccollect files --help
take a folder of files and create a GUAC graph utilizing Nats pubsub

Usage:
  guaccollect files [flags] file_path

Flags:
  -h, --help   help for files

Global Flags:
      --csub-addr string   address to connect to collect-sub service (default "localhost:2782")
      --nats-addr string   address to connect to NATs Server (default "nats://127.0.0.1:4222")
      --service-poll       sets the collector or certifier to polling mode (default true)
      --use-csub           use collectsub server for datasource (default true)

The one caveat (for now) is a known issue with large document files, #731, which I will be working on in the coming weeks.

pxp928 commented 1 year ago

+1 to @lumjjb: the GCS collector (like all the other collectors) is already set up to poll, so it fetches periodically.