Feedback on improving documentation

mdeicas commented 11 months ago

Opening this issue to discuss and collect feedback on what areas of Guac need improved documentation. Feel free to comment with any shortcomings you've encountered in the documentation.

Some possible areas for new / better documentation are:

- Better status indicators
  - A high-level status page of what development work is currently being done in Guac
  - Better documentation on the status of the supported backends, collectors, and document types
  - A list of known bugs and limitations
- How to use the APIs
- A more user-friendly summary of what features are supported or what new features have been added in each Guac release
- Better documentation on Guac's overall architecture, like https://docs.guac.sh/guac-components/ but in more detail
  - This could also explain how to run Guac via individual binaries instead of with Docker Compose or the Helm charts
  - A better explanation of the logical role of each binary and what it does
- Update the README in https://github.com/guacsec/guac/tree/main/pkg/assembler/backends

ridhoq commented 11 months ago

One topic I would like to see better documentation around is the difference between the concept of a backend versus an assembler.

mihaimaruseac commented 11 months ago

> One topic I would like to see better documentation around is the difference between the concept of a backend versus an assembler.

Fair point. Basically, these are at different ends of the pipeline. We are using assemblers to extract information from supply chain documents (SLSA, etc.) and create data structs to be passed into GraphQL, using a common format.

On the other side, the database part of the pipeline needs to implement writing to the database for each GraphQL mutation, and reading from it for each GraphQL query issued by a user. Each database we support is implemented as a backend, though a single backend may support multiple databases.

The ingestion pipeline looks something like:

flowchart LR
  SLSA_doc1 --> SLSA_assembler;
  SLSA_doc2 --> SLSA_assembler;
  SBOM_doc1 --> SBOM_assembler;
  SBOM_doc2 --> SBOM_assembler;
  SBOM_doc3 --> SBOM_assembler;
  SLSA_assembler --> gqlm[[GraphQL server]];
  SBOM_assembler --> gqlm;
  gqlm --> backend1([in memory backend]);
  gqlm --> backend2([Ent backend]);
  gqlm --> backend3([Neo4J backend]);
  gqlm --> backend4([ArangoDB backend]);
  backend2 --> postgres[(PostgreSQL)];
  backend2 --> sqlite[(SQLite)];
  backend3 --> neo4j[(Neo4J)];
  backend4 --> arango[(ArangoDB)];

Queries from users hit the GraphQL server and receive their results from there. Everything to the right of the server is the backend side.
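To make the split concrete, here is a minimal Go sketch of the same shape. It is not GUAC's actual code: `Predicate`, `Assembler`, `Backend`, `Server`, and every method name are hypothetical stand-ins chosen for illustration. The point is only that assemblers produce a common format, and that the server is the single component that talks to whichever backend is plugged in.

```go
package main

import "fmt"

// Predicate is a hypothetical stand-in for the common format that
// assemblers hand to the GraphQL layer. Real GUAC types are richer.
type Predicate struct {
	Kind    string
	Subject string
	Object  string
}

// Assembler turns one kind of supply chain document into predicates.
type Assembler interface {
	Assemble(doc string) []Predicate
}

// Backend writes predicates (mutations) and reads them back (queries).
type Backend interface {
	Ingest(p Predicate) error
	Query(kind string) ([]Predicate, error)
}

type slsaAssembler struct{}

func (slsaAssembler) Assemble(doc string) []Predicate {
	return []Predicate{{Kind: "HasSLSA", Subject: doc, Object: "some-builder"}}
}

type sbomAssembler struct{}

func (sbomAssembler) Assemble(doc string) []Predicate {
	return []Predicate{{Kind: "IsDependency", Subject: doc, Object: "some-dependency"}}
}

// memoryBackend is a toy in-memory backend; a database-backed one would
// satisfy the same interface.
type memoryBackend struct {
	stored []Predicate
}

func (m *memoryBackend) Ingest(p Predicate) error {
	m.stored = append(m.stored, p)
	return nil
}

func (m *memoryBackend) Query(kind string) ([]Predicate, error) {
	var out []Predicate
	for _, p := range m.stored {
		if p.Kind == kind {
			out = append(out, p)
		}
	}
	return out, nil
}

// Server plays the role of the GraphQL server in the diagram.
type Server struct {
	backend Backend
}

// Mutate forwards each assembled predicate to the backend.
func (s *Server) Mutate(ps []Predicate) error {
	for _, p := range ps {
		if err := s.backend.Ingest(p); err != nil {
			return err
		}
	}
	return nil
}

// Query forwards a user query to the backend.
func (s *Server) Query(kind string) ([]Predicate, error) {
	return s.backend.Query(kind)
}

func main() {
	srv := &Server{backend: &memoryBackend{}}
	inputs := []struct {
		asm Assembler
		doc string
	}{
		{slsaAssembler{}, "SLSA_doc1"},
		{sbomAssembler{}, "SBOM_doc1"},
		{sbomAssembler{}, "SBOM_doc2"},
	}
	for _, in := range inputs {
		if err := srv.Mutate(in.asm.Assemble(in.doc)); err != nil {
			fmt.Println("ingest failed:", err)
			return
		}
	}
	deps, err := srv.Query("IsDependency")
	if err != nil {
		fmt.Println("query failed:", err)
		return
	}
	for _, p := range deps {
		fmt.Println(p.Subject, "->", p.Object)
	}
}
```

Swapping `memoryBackend` for another type that satisfies `Backend` would leave the assemblers and the server untouched, which is the separation described above.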

ridhoq commented 11 months ago

This is helpful! FWIW, I did end up coming to this understanding, but only after reading the code. It would be great to include this diagram in the docs. It probably merits some discussion on the GUAC components page as well.

ridhoq commented 10 months ago

Another request: It would be great to have a more in-depth document on how the topological queries work and some examples of inputs to the queries and sample outputs. Specifically these two queries:

neighbors(node: ID!, usingOnly: [Edge!]!): [Node!]!
path(subject: ID!, target: ID!, maxPathLength: Int!, usingOnly: [Edge!]!): [Node!]!

From my understanding, many GUAC query use cases involve neighbors and path queries, so it would be beneficial to the project to cover these in more depth. The only document I could find on this topic was the topological definitions section of the GraphQL doc, but please feel free to point me towards another doc that I might have missed. Thanks!
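As a starting point for that kind of doc, here is a rough Go sketch of what I would expect queries with these two signatures to do on a toy graph. This is only my guess at the semantics, not GUAC's implementation: the `Graph` and `Edge` types, the edge labels, and the node IDs are all made up, edges are treated as directed, an empty `usingOnly` is assumed to mean "no filter", and `maxPathLength` is assumed to count edges (hops).

```go
package main

import "fmt"

// Edge is a toy adjacency entry: a labelled link to another node ID.
// The labels used below are hypothetical, not GUAC's Edge enum values.
type Edge struct {
	Label string
	To    string
}

// Graph maps a node ID to its outgoing edges.
type Graph map[string][]Edge

// allowed reports whether an edge label passes the usingOnly filter.
// Assumption: an empty filter allows every edge.
func allowed(label string, usingOnly []string) bool {
	if len(usingOnly) == 0 {
		return true
	}
	for _, l := range usingOnly {
		if l == label {
			return true
		}
	}
	return false
}

// Neighbors returns the nodes one allowed edge away from node.
func Neighbors(g Graph, node string, usingOnly []string) []string {
	var out []string
	for _, e := range g[node] {
		if allowed(e.Label, usingOnly) {
			out = append(out, e.To)
		}
	}
	return out
}

// Path does a breadth-first search from subject to target, following only
// allowed edges and at most maxPathLength hops. It returns the node IDs on
// a shortest such path, or nil if there is none.
func Path(g Graph, subject, target string, maxPathLength int, usingOnly []string) []string {
	type item struct {
		id    string
		depth int
	}
	prev := map[string]string{}
	visited := map[string]bool{subject: true}
	queue := []item{{subject, 0}}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if cur.id == target {
			// Walk the predecessor links back to the subject.
			path := []string{target}
			for n := target; n != subject; {
				n = prev[n]
				path = append([]string{n}, path...)
			}
			return path
		}
		if cur.depth == maxPathLength {
			continue // hop budget used up on this branch
		}
		for _, next := range Neighbors(g, cur.id, usingOnly) {
			if !visited[next] {
				visited[next] = true
				prev[next] = cur.id
				queue = append(queue, item{next, cur.depth + 1})
			}
		}
	}
	return nil
}

func main() {
	g := Graph{
		"pkg:a": {{Label: "DEPENDS_ON", To: "pkg:b"}, {Label: "HAS_SBOM", To: "sbom:a"}},
		"pkg:b": {{Label: "DEPENDS_ON", To: "pkg:c"}},
		"pkg:c": {{Label: "HAS_VULN", To: "vuln:x"}},
	}
	fmt.Println(Neighbors(g, "pkg:a", nil))                    // [pkg:b sbom:a]
	fmt.Println(Neighbors(g, "pkg:a", []string{"DEPENDS_ON"})) // [pkg:b]
	fmt.Println(Path(g, "pkg:a", "vuln:x", 3, nil))            // [pkg:a pkg:b pkg:c vuln:x]
	fmt.Println(Path(g, "pkg:a", "vuln:x", 2, nil))            // []
}
```

Worked inputs and outputs like the four calls in `main`, but against real GUAC node IDs and edge types, are the sort of examples I think the docs should include.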

pxp928 commented 10 months ago

We also need to add documentation for filtering via the GraphQL directive: https://github.com/guacsec/guac/issues/1615

Feedback on improving documentation #1530