minvws / nl-kat-coordination

European Union Public License 1.2

Advanced Octopoes queries #1292

Open Donnype opened 1 year ago

Donnype commented 1 year ago

Update From Discussions

Benny will think about what will and won't be feasible with, among other things, the move to xtdb2 and speeding up octopoesv2. Waiting until Benny is back.

TODO:

Advanced Octopoes Queries

Currently, external services query XTDB through Octopoes API, while Octopoes queries XTDB directly. There are, however, several limitations to the current implementation. This issue aims to capture the current limitations and create a plan to improve query flexibility to:

User stories

The user stories could be:

  1. As a KAT user, I want to filter Objects on type-specific fields, so I can easily find what I'm interested in.
  2. As a KAT user, I want to do aggregation queries on my object graph, so I can easily report on totals, averages and counts for objects and findings.
  3. As a KAT user, I want to know both when facts were valid as well as when facts were recorded, so I can create a detailed audit trail/log for reports to clients and auditors.
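To make the third story concrete, here is a minimal sketch of the difference between "when a fact was valid" (valid time) and "when a fact was recorded" (transaction time). The `FactVersion` class and the `as_of` helper are hypothetical names used only for illustration; they are not part of Octopoes or XTDB, and the lookup rule is a simplification of how a bitemporal store resolves versions.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import List, Optional


@dataclass(frozen=True)
class FactVersion:
    """One recorded version of a fact (hypothetical, for illustration only)."""

    value: str
    valid_from: datetime   # valid time: when the fact became true in the world
    recorded_at: datetime  # transaction time: when the database learned about it


def as_of(versions: List[FactVersion], valid_time: datetime, tx_time: datetime) -> Optional[str]:
    """Return what was believed at tx_time about the state at valid_time."""
    known = [
        v for v in versions
        if v.recorded_at <= tx_time and v.valid_from <= valid_time
    ]
    if not known:
        return None
    # The version with the latest valid_from applies; a later recording wins ties.
    return max(known, key=lambda v: (v.valid_from, v.recorded_at)).value


def jan(day: int) -> datetime:
    return datetime(2024, 1, day, tzinfo=timezone.utc)


# A port is seen open on 1 Jan. On 10 Jan we learn it was closed since 5 Jan.
versions = [
    FactVersion("open", valid_from=jan(1), recorded_at=jan(1)),
    FactVersion("closed", valid_from=jan(5), recorded_at=jan(10)),
]

belief_on_jan_8 = as_of(versions, valid_time=jan(7), tx_time=jan(8))
belief_on_jan_12 = as_of(versions, valid_time=jan(7), tx_time=jan(12))
```

Here `belief_on_jan_8` is `"open"` and `belief_on_jan_12` is `"closed"`: the state on 7 January did not change, but what the system had recorded about it did. An audit trail needs both axes to explain why an older report showed the port as open.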

Query Limitations

API Limitations

The current Octopoes API implements several HTTP endpoints:

| Endpoint | Methods |
|---|---|
| "/health" | ["get"] |
| "/{client}/health" | ["get"] |
| "/{client}/objects" | ["get"] |
| "/{client}/objects/load_bulk" | ["post"] |
| "/{client}/object" | ["get"] |
| "/{client}/objects/random" | ["get"] |
| "/{client}/" | ["delete"] |
| "/{client}/objects/delete_many" | ["post"] |
| "/{client}/tree" | ["get"] |
| "/{client}/origins" | ["get"] |
| "/{client}/origin_parameters" | ["get"] |
| "/{client}/observations" | ["post"] |
| "/{client}/declarations" | ["post"] |
| "/{client}/scan_profiles" | ["get", "put"] |
| "/{client}/scan_profiles/save_many" | ["post"] |
| "/{client}/scan_profiles/recalculate" | ["get"] |
| "/{client}/scan_profiles/inheritance" | ["get"] |
| "/{client}/findings" | ["get"] |
| "/{client}/findings/count_by_severity" | ["get"] |
| "/{client}/node" | ["post", "delete"] |
| "/{client}/bits/recalculate" | ["post"] |

In total, 10 of the 23 endpoints (counting each path and method combination separately) are dedicated to fetching one of the XTDB entities:

Every endpoint supports valid-time filtering, and some have specialized filters: GET objects can be filtered on, for example, type and scan level, and GET findings on severity.

This poses the following issues:

  1. There is no way to filter on OOIType-specific fields with the objects endpoint. For example, you currently cannot find all open IpPort|80 objects for an organization.
  2. A dedicated endpoint has to be created for every aggregation, such as count_by_severity for findings.
  3. There are 4 ways to fetch generic OOIs.
  4. There are no DELETE endpoints for several entity types.
  5. As a side note: there is no real object history exposed through the API, although even the current XTDB version already has some neat APIs that would be interesting to expose.
  6. There is no transaction-time query support yet.
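For issues 5 and 6, the following sketch shows how the history and transaction-time options of the XTDB 1.x HTTP API could be addressed. It only builds URLs and does not contact a server. The base URL and both helper names are assumptions for illustration. The parameter names (`eid`, `history`, `sort-order`, `with-docs`, `valid-time`, `tx-time`) follow the XTDB 1.x HTTP documentation as I understand it and should be checked against the deployed version.

```python
from datetime import datetime, timezone
from typing import Dict, Optional
from urllib.parse import urlencode

# Assumption: a single XTDB 1.x node; the real Octopoes setup may use another path.
XTDB_URL = "http://localhost:3000/_xtdb"


def entity_history_url(eid: str, ascending: bool = True, base_url: str = XTDB_URL) -> str:
    """Build a URL asking for all versions of one entity, including documents."""
    params = {
        "eid": eid,
        "history": "true",
        "sort-order": "asc" if ascending else "desc",
        "with-docs": "true",
    }
    return f"{base_url}/entity?{urlencode(params)}"


def query_url(
    valid_time: Optional[datetime] = None,
    tx_time: Optional[datetime] = None,
    base_url: str = XTDB_URL,
) -> str:
    """Build a query URL pinned to a valid time and/or a transaction time."""
    params: Dict[str, str] = {}
    if valid_time is not None:
        params["valid-time"] = valid_time.isoformat()
    if tx_time is not None:
        params["tx-time"] = tx_time.isoformat()
    suffix = f"?{urlencode(params)}" if params else ""
    return f"{base_url}/query{suffix}"


history_url = entity_history_url("IPPort|internet|1.1.1.1|tcp|80")
audit_url = query_url(
    valid_time=datetime(2024, 1, 7, tzinfo=timezone.utc),
    tx_time=datetime(2024, 1, 8, tzinfo=timezone.utc),
)
```

An Octopoes endpoint could accept an optional transaction-time parameter next to the existing valid-time one and pass both through in this way. Whether that is desirable depends on the xtdb2 decision mentioned above.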

Possible API Solutions

To resolve issue 1. we could consider a few options:

To resolve issue 2. we could consider:

To resolve issue 3. we could probably phase out the random endpoint at some point, and somehow the tree endpoint might be a special case of the GET objects endpoint. This would also be resolved with the xtdb proxy/connection.

Resolving issue 4. is a matter of completing the implementation of one of the proposed solutions properly.


ORM Limitations

Within Octopoes, OOIs are both saved to and queried directly from XTDB. Given the current requirements and developments, this setup has some issues:

  1. A significant number of queries are still built using string interpolation, which makes it hard to create a more generic interface around query building.
  2. There is no functionality to filter OOITypes on type-specific fields in the ORM either.
  3. generate_pull_query is quite complex and still exposes considerable query complexity.
  4. There is no aggregation functionality built into the ORM.
  5. It is quite hard to do joins via abstract types.
  6. There is also no functionality to fetch an object's history at the ORM level.
  7. There is no transaction-time query support yet.
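As an illustration of issues 1, 2 and 4, here is a minimal sketch of a structured query object that renders an XTDB 1.x Datalog query instead of interpolating strings by hand. `ObjectQuery` is a hypothetical class, not an existing Octopoes API. The attribute naming (`object_type`, `IPPort/port`) and the field names `port`, `state` and `protocol` are assumptions about how OOIs are stored.

```python
import json
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

Value = Union[str, int, float, bool]
_NAME = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _name(name: str) -> str:
    """Reject anything that is not a plain identifier, so names cannot inject clauses."""
    if not _NAME.match(name):
        raise ValueError(f"invalid name: {name!r}")
    return name


def _literal(value: Value) -> str:
    """Render a value as an EDN literal; only str, int, float and bool are supported."""
    if not isinstance(value, (str, int, float, bool)):
        raise TypeError(f"unsupported value type: {type(value).__name__}")
    return json.dumps(value)  # quoting and escaping handled here, not by callers


@dataclass
class ObjectQuery:
    """Hypothetical builder: type-specific filters plus an optional grouped count."""

    object_type: str
    filters: List[Tuple[str, Value]] = field(default_factory=list)
    count_by: Optional[str] = None

    def where(self, field_name: str, value: Value) -> "ObjectQuery":
        self.filters.append((_name(field_name), value))
        return self

    def to_datalog(self) -> str:
        ooi_type = _name(self.object_type)
        clauses = [f"[?e :object_type {_literal(ooi_type)}]"]
        for name, value in self.filters:
            clauses.append(f"[?e :{ooi_type}/{name} {_literal(value)}]")
        if self.count_by is None:
            find = "(pull ?e [*])"
        else:
            # Non-aggregated variables in :find act as the grouping key.
            clauses.append(f"[?e :{ooi_type}/{_name(self.count_by)} ?group]")
            find = "?group (count ?e)"
        return "{:query {:find [" + find + "] :where [" + " ".join(clauses) + "]}}"


open_port_80 = ObjectQuery("IPPort").where("port", 80).where("state", "open").to_datalog()
ports_per_protocol = ObjectQuery("IPPort", count_by="protocol").to_datalog()
```

`open_port_80` renders as `{:query {:find [(pull ?e [*])] :where [[?e :object_type "IPPort"] [?e :IPPort/port 80] [?e :IPPort/state "open"]]}}`. Because names are validated and values are escaped in one place, the same object could back both an API filter and an ORM method. Joins via abstract types (issue 5) are not covered by this sketch.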

Possible ORM Solutions

Approach

Assuming we will not consider direct connections to XTDB from Rocky, we can break the user stories down into the following issues.

As a KAT user, I want to filter Objects on type-specific fields, so I can easily find what I'm interested in.

As a KAT user, I want to do aggregation queries on my object graph, so I can easily report on totals, averages and counts for objects and findings.

As a KAT user, I want to know both when facts were valid as well as when facts were recorded, so I can create a detailed audit trail/log for reports to clients and auditors.

originalsouth commented 4 months ago

Blocked. Pending #2918.