guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0
1.29k stars 179 forks source link

[ingestion bug] Bulk Ingestion does not fail when a predicate fails #1812

Open pxp928 opened 8 months ago

pxp928 commented 8 months ago

Describe the bug Currently when using bulk ingestion, when part of the ingestion of a document fails it is not reported. This should error and be ingested again (placed back in the queue).

In the example below both the occurrences and hasSBOM failed, this should be reported as a failure to ingest the document:

time go run ./cmd/guacone collect files ../guac-data/docs/spdx/spdx_vuln.json
{"level":"info","ts":1712269407.5400379,"caller":"logging/logger.go:70","msg":"Logging at info level"}
{"level":"info","ts":1712269407.541861,"caller":"process/process.go:254","msg":"Decoding document with encoding:  "}
{"level":"info","ts":1712269407.764471,"caller":"parser/parser.go:81","msg":"parsing document tree with root type: SPDX"}
{"level":"info","ts":1712269407.87801,"caller":"ingestor/ingestor.go:59","msg":"unable to create entries in collectsub server, but continuing: unable to add collect entries: rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing: dial tcp [::1]:2782: connect: connection refused\""}
{"level":"info","ts":1712269407.891953,"caller":"helpers/bulk.go:39","msg":"assembling Package: 3981"}
{"level":"info","ts":1712269408.0499442,"caller":"helpers/bulk.go:55","msg":"assembling Source: 0"}
{"level":"info","ts":1712269408.0545542,"caller":"helpers/bulk.go:65","msg":"assembling Artifact: 3384"}
{"level":"info","ts":1712269408.097975,"caller":"helpers/bulk.go:80","msg":"assembling Materials (Artifact): 0"}
{"level":"info","ts":1712269408.100332,"caller":"helpers/bulk.go:89","msg":"assembling Builder: 0"}
{"level":"info","ts":1712269408.101753,"caller":"helpers/bulk.go:98","msg":"assembling Vulnerability: 0"}
{"level":"info","ts":1712269408.1031601,"caller":"helpers/bulk.go:107","msg":"assembling Licenses: 29"}
{"level":"info","ts":1712269408.106025,"caller":"helpers/bulk.go:114","msg":"assembling CertifyScorecard: 0"}
{"level":"info","ts":1712269408.106059,"caller":"helpers/bulk.go:119","msg":"assembling IsDependency: 7161"}
{"level":"info","ts":1712269408.4749792,"caller":"helpers/bulk.go:127","msg":"assembling IsOccurrence: 3878"}
{"level":"error","ts":1712269408.6642299,"caller":"helpers/bulk.go:130","msg":"ingestIsOccurrences failed with error: isOccurrencesPkg failed with error: input: ingestOccurrences IngestOccurrences :: bulk upsert Occurrence node: ent: constraint failed: insert nodes to table \"occurrences\": pq: insert or update on table \"occurrences\" violates foreign key constraint \"occurrences_artifacts_artifact\"\n","stacktrace":"github.com/guacsec/guac/pkg/assembler/clients/helpers.GetBulkAssembler.func1\n\t/Users/parth/Documents/pxp928/artifact-ff/pkg/assembler/clients/helpers/bulk.go:130\ngithub.com/guacsec/guac/pkg/ingestor.Ingest\n\t/Users/parth/Documents/pxp928/artifact-ff/pkg/ingestor/ingestor.go:62\ngithub.com/guacsec/guac/cmd/guacone/cmd.init.func6.1\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/cmd/files.go:125\ngithub.com/guacsec/guac/pkg/handler/collector.Collect\n\t/Users/parth/Documents/pxp928/artifact-ff/pkg/handler/collector/collector.go:97\ngithub.com/guacsec/guac/cmd/guacone/cmd.init.func6\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/cmd/files.go:145\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/parth/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:987\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/parth/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/parth/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\ngithub.com/guacsec/guac/cmd/guacone/cmd.Execute\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/cmd/root.go:56\nmain.main\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/main.go:23\nruntime.main\n\t/opt/homebrew/Cellar/go/1.22.1/libexec/src/runtime/proc.go:271"}
{"level":"info","ts":1712269408.664423,"caller":"helpers/bulk.go:135","msg":"assembling HasSLSA: 0"}
{"level":"info","ts":1712269408.6644301,"caller":"helpers/bulk.go:140","msg":"assembling CertifyVuln: 0"}
{"level":"info","ts":1712269408.664434,"caller":"helpers/bulk.go:145","msg":"assembling VulnMetadata: 0"}
{"level":"info","ts":1712269408.664439,"caller":"helpers/bulk.go:150","msg":"assembling VulnEqual: 0"}
{"level":"info","ts":1712269408.664444,"caller":"helpers/bulk.go:156","msg":"assembling HasSourceAt: 0"}
{"level":"info","ts":1712269408.664448,"caller":"helpers/bulk.go:161","msg":"assembling CertifyBad: 0"}
{"level":"info","ts":1712269408.664466,"caller":"helpers/bulk.go:167","msg":"assembling CertifyGood: 0"}
{"level":"info","ts":1712269408.6644711,"caller":"helpers/bulk.go:173","msg":"assembling PointOfContact: 0"}
{"level":"info","ts":1712269408.664474,"caller":"helpers/bulk.go:178","msg":"assembling HasMetadata: 462"}
{"level":"info","ts":1712269408.703561,"caller":"helpers/bulk.go:183","msg":"assembling HasSBOM: 1"}
{"level":"error","ts":1712269408.8020039,"caller":"helpers/bulk.go:190","msg":"ingestHasSBOMs failed with error: hasSBOMPkgs failed with error: input: ingestHasSBOMs IngestHasSBOMs failed with err: input: IngestHasSbom :: input: generateSBOMCreate :: updateHasSBOMWithIncludeArtifacts: update for IncludedSoftwareArtifactIDs hasSBOM node failed with error: ent: constraint failed: add m2m edge for table bill_of_materials_included_software_artifacts: pq: insert or update on table \"bill_of_materials_included_software_artifacts\" violates foreign key constraint \"bill_of_materials_included_software_artifacts_artifact_id\"\n","stacktrace":"github.com/guacsec/guac/pkg/assembler/clients/helpers.GetBulkAssembler.func1\n\t/Users/parth/Documents/pxp928/artifact-ff/pkg/assembler/clients/helpers/bulk.go:190\ngithub.com/guacsec/guac/pkg/ingestor.Ingest\n\t/Users/parth/Documents/pxp928/artifact-ff/pkg/ingestor/ingestor.go:62\ngithub.com/guacsec/guac/cmd/guacone/cmd.init.func6.1\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/cmd/files.go:125\ngithub.com/guacsec/guac/pkg/handler/collector.Collect\n\t/Users/parth/Documents/pxp928/artifact-ff/pkg/handler/collector/collector.go:97\ngithub.com/guacsec/guac/cmd/guacone/cmd.init.func6\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/cmd/files.go:145\ngithub.com/spf13/cobra.(*Command).execute\n\t/Users/parth/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:987\ngithub.com/spf13/cobra.(*Command).ExecuteC\n\t/Users/parth/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1115\ngithub.com/spf13/cobra.(*Command).Execute\n\t/Users/parth/go/pkg/mod/github.com/spf13/cobra@v1.8.0/command.go:1039\ngithub.com/guacsec/guac/cmd/guacone/cmd.Execute\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/cmd/root.go:56\nmain.main\n\t/Users/parth/Documents/pxp928/artifact-ff/cmd/guacone/main.go:23\nruntime.main\n\t/opt/homebrew/Cellar/go/1.22.1/libexec/src/runtime/proc.go:271"}
{"level":"info","ts":1712269408.8021052,"caller":"helpers/bulk.go:193","msg":"assembling VEX : 0"}
{"level":"info","ts":1712269408.80211,"caller":"helpers/bulk.go:198","msg":"assembling HashEqual : 0"}
{"level":"info","ts":1712269408.802146,"caller":"helpers/bulk.go:203","msg":"assembling PkgEqual : 0"}
{"level":"info","ts":1712269408.802154,"caller":"helpers/bulk.go:208","msg":"assembling CertifyLegal : 125"}
{"level":"info","ts":1712269408.819148,"caller":"ingestor/ingestor.go:68","msg":"[1.277286833s] completed doc {Collector:FileCollector Source:file:///../guac-data/docs/spdx/spdx_vuln.json}"}
{"level":"info","ts":1712269408.8191738,"caller":"cmd/files.go:138","msg":"collector ended gracefully"}
{"level":"info","ts":1712269408.81918,"caller":"cmd/files.go:152","msg":"completed ingesting 1 documents of 1"}
go run ./cmd/guacone collect files ../guac-data/docs/spdx/spdx_vuln.json  2.68s user 2.52s system 133% cpu 3.884 total

Expected behavior Ingestor should only succeed if the document is fully parsed and ingested. Otherwise the document in question should error and be placed back on the queue for re-ingestion.

GUAC version main

jeffmendoza commented 7 months ago

1275

jeffmendoza commented 7 months ago

For the cli: Update this msg:

{"level":"info","ts":1712269408.81918,"caller":"cmd/files.go:152","msg":"completed ingesting 1 documents of 1"}

To either be that on success (no errors), or something like:

{"level":"error","ts":1712269408.81918,"caller":"cmd/files.go:152","msg":"2 failures and 8 successful documents ingested of 10"}
jeffmendoza commented 7 months ago

For the ingestor service: Also update the ingestor msg to fail as well:

{"level":"info","ts":1712269408.819148,"caller":"ingestor/ingestor.go:68","msg":"[1.277286833s] completed doc {Collector:FileCollector Source:file:///../guac-data/docs/spdx/spdx_vuln.json}"}

Look into enabling prometheus on ingestor and emitting errors.