guacsec / guac

GUAC aggregates software security metadata into a high fidelity graph database.
https://guac.sh
Apache License 2.0

Feedback on improving documentation #1530

Open mdeicas opened 11 months ago

mdeicas commented 11 months ago

Opening this issue to discuss and collect feedback on which areas of GUAC need improved documentation. Feel free to comment with any shortcomings you've encountered in the documentation.

Some possible areas for new / better documentation are:

pxp928 commented 11 months ago

Related issue: https://github.com/guacsec/guac/issues/1368

ridhoq commented 11 months ago

One topic I would like to see better documentation around is the difference between the concept of a backend versus an assembler.

mihaimaruseac commented 11 months ago

Fair point. Basically, these sit at opposite ends of the pipeline. Assemblers extract information from supply chain documents (SLSA, etc.) and create data structures in a common format, which are then passed into GraphQL.

On the other side, the database part of the pipeline needs to implement writing to the database (or reading, for a user query) for each GraphQL mutation (or query). Each database we support is implemented as a backend, though a single backend may support multiple databases.

The ingestion pipeline looks something like:

flowchart LR
  SLSA_doc1 --> SLSA_assembler;
  SLSA_doc2 --> SLSA_assembler;
  SBOM_doc1 --> SBOM_assembler;
  SBOM_doc2 --> SBOM_assembler;
  SBOM_doc3 --> SBOM_assembler;
  SLSA_assembler --> gqlm[[GraphQL server]];
  SBOM_assembler --> gqlm;
  gqlm --> backend1([in memory backend]);
  gqlm --> backend2([Ent backend]);
  gqlm --> backend3([Neo4J backend]);
  gqlm --> backend4([ArangoDB backend]);
  backend2 --> postgres[(PostgreSQL)];
  backend2 --> sqlite[(SQLite)];
  backend3 --> neo4j[(Neo4J)];
  backend4 --> arango[(ArangoDB)];

User queries hit the GraphQL server and receive their results from it. Everything to the right of the server in the diagram is a backend.
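To make the split concrete, here is a toy Python sketch of the same shape. It is not GUAC code (GUAC itself is written in Go), and every name in it (`Predicate`, `slsa_assembler`, `sbom_assembler`, `Backend`, `InMemoryBackend`, `Server`) is hypothetical, as are the document fields. It only illustrates the idea that assemblers produce records in a common format, while backends implement storage behind a single server entry point.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Predicate:
    """Common-format record produced by every assembler (hypothetical shape)."""
    kind: str     # e.g. "IsDependency" or "HasSLSA"
    subject: str
    obj: str


def slsa_assembler(doc: dict) -> list[Predicate]:
    # Hypothetical: turn an already-parsed SLSA-like document into predicates.
    return [Predicate("HasSLSA", doc["artifact"], doc["builder"])]


def sbom_assembler(doc: dict) -> list[Predicate]:
    # Hypothetical: one dependency predicate per dependency listed in the SBOM.
    return [Predicate("IsDependency", doc["package"], dep) for dep in doc["deps"]]


class Backend(Protocol):
    """What the server needs from any backend: a write path and a read path."""
    def ingest(self, p: Predicate) -> None: ...
    def query(self, kind: str) -> list[Predicate]: ...


class InMemoryBackend:
    """Simplest possible backend: keeps everything in a Python list."""
    def __init__(self) -> None:
        self._store: list[Predicate] = []

    def ingest(self, p: Predicate) -> None:
        self._store.append(p)

    def query(self, kind: str) -> list[Predicate]:
        return [p for p in self._store if p.kind == kind]


class Server:
    """Stands in for the GraphQL server: one entry point, any backend behind it."""
    def __init__(self, backend: Backend) -> None:
        self.backend = backend

    def mutate(self, predicates: list[Predicate]) -> None:
        for p in predicates:
            self.backend.ingest(p)

    def query(self, kind: str) -> list[Predicate]:
        return self.backend.query(kind)


# Left side of the diagram: documents go through assemblers into the server.
server = Server(InMemoryBackend())
server.mutate(slsa_assembler({"artifact": "app-1.0", "builder": "ci-builder"}))
server.mutate(sbom_assembler({"package": "app", "deps": ["libfoo", "libbar"]}))

# Right side: the same query would work against any class satisfying Backend.
deps = [p.obj for p in server.query("IsDependency")]
```

Swapping `InMemoryBackend` for another class with the same two methods would not require touching the assemblers, which is the point of keeping the two concepts separate.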

ridhoq commented 11 months ago

This is helpful! FWIW, I did end up coming to this understanding, but only after reading the code. It would be great to include this diagram in the docs. It probably merits some discussion on the GUAC components page as well.

ridhoq commented 10 months ago

Another request: It would be great to have a more in-depth document on how the topological queries work and some examples of inputs to the queries and sample outputs. Specifically these two queries:

neighbors(node: ID!, usingOnly: [Edge!]!): [Node!]!
path(subject: ID!, target: ID!, maxPathLength: Int!, usingOnly: [Edge!]!): [Node!]!

From my understanding, many GUAC query use cases involve neighbors and path queries, so it would be beneficial to the project to cover these in more depth. The only document I could find on this topic was the topological definitions section of the GraphQL doc, but please feel free to point me towards another doc that I might have missed. Thanks!
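To illustrate the kind of input and output examples I mean, here is a toy Python sketch over a small in-memory graph. It is not GUAC's implementation, and the semantics are my assumptions read off the signatures: `neighbors` returns nodes one hop away through the allowed edge kinds, `path` returns a shortest route of at most `maxPathLength` hops, and an empty `usingOnly` list allows every edge kind. The node IDs and edge kind names are made up for the example.

```python
from collections import deque

# Hypothetical toy graph: (node, edge_kind, node) triples, treated as undirected.
EDGES = [
    ("pkg:app", "IS_DEPENDENCY", "pkg:libfoo"),
    ("pkg:libfoo", "IS_DEPENDENCY", "pkg:libbar"),
    ("pkg:libbar", "CERTIFY_VULN", "vuln:EXAMPLE-1"),
    ("pkg:app", "HAS_SBOM", "sbom:app"),
]


def neighbors(node, using_only):
    """Nodes one hop from `node`, following only edge kinds in `using_only`.

    Assumption for this sketch: an empty `using_only` allows every edge kind.
    """
    allowed = set(using_only)
    found = []
    for a, kind, b in EDGES:
        if allowed and kind not in allowed:
            continue
        if a == node:
            found.append(b)
        elif b == node:
            found.append(a)
    return found


def path(subject, target, max_path_length, using_only):
    """Breadth-first search for a shortest path of at most `max_path_length` hops.

    Returns the nodes from subject to target, or [] if no such path exists.
    """
    queue = deque([[subject]])
    seen = {subject}
    while queue:
        route = queue.popleft()
        if route[-1] == target:
            return route
        if len(route) - 1 >= max_path_length:
            continue  # already at the hop limit, do not extend this route
        for nxt in neighbors(route[-1], using_only):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return []


only_deps = neighbors("pkg:app", ["IS_DEPENDENCY"])          # ['pkg:libfoo']
any_edge = neighbors("pkg:app", [])                          # ['pkg:libfoo', 'sbom:app']
to_vuln = path("pkg:app", "vuln:EXAMPLE-1", 3, [])           # 4 nodes, 3 hops
too_short = path("pkg:app", "vuln:EXAMPLE-1", 2, [])         # [] (hop limit hit)
deps_only = path("pkg:app", "vuln:EXAMPLE-1", 3, ["IS_DEPENDENCY"])  # [] (edge filtered)
```

Docs that walk through a handful of cases like these against the real schema, showing how `usingOnly` and `maxPathLength` change the result, would answer most of my questions.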

pxp928 commented 10 months ago

We also need to add documentation for filtering via the GraphQL directive: https://github.com/guacsec/guac/issues/1615