minvws / nl-kat-coordination

OpenKAT scans networks, finds vulnerabilities and creates accessible reports. It integrates the most widely used network tools and scanning software into a modular framework, accesses external databases such as shodan, and combines the information from all these sources into clear reports. It also includes lots of cat hair.
https://openkat.nl
European Union Public License 1.2

We should use a ReportOOI entity per Report, or update a Report in-place and traverse the history API #3729

Open Donnype opened 1 month ago

Donnype commented 1 month ago

Per Report, we now create a ReportOOI:

class Report(OOI):
    object_type: Literal["Report"] = "Report"

    name: str
    report_type: str
    template: str | None = None
    date_generated: datetime

    input_oois: list[str]

    report_id: UUID

    organization_code: str
    organization_name: str
    organization_tags: list[str]
    data_raw_id: str

    observed_at: datetime
    parent_report: Reference | None = ReferenceField("Report", default=None)
    report_recipe: Reference | None = ReferenceField("ReportRecipe", default=None)
    has_parent: bool

# as a reference:
class ReportRecipe(OOI):
    object_type: Literal["ReportRecipe"] = "ReportRecipe"

    recipe_id: UUID

    report_name_format: str
    subreport_name_format: str | None = None

    input_recipe: dict[str, Any]  # can contain a query which maintains a live set of OOIs or manually picked OOIs.
    parent_report_type: str | None = None
    report_types: list[str]

    cron_expression: str

    _natural_key_attrs = ["recipe_id"]

This means that on every run of a recipe, we add a new object to XTDB (and Bytes) that has its own history and shows up as a separate row in the current table overview.

Another option is to update a ReportOOI in-place, which shortens the list but means we have to query the history API more often. The question is if we should change the implementation to perform

Pros and Cons

Pros

Cons

(@Donnype: When I try to model for XTDB, I first ask myself if it makes sense to save an entity over time in a regular relational database, which in the case of a Report I have to answer with "Yes". Older versions of reports are too important to hide in a historic version.)

Conclusion after discussion on 25-10-2024

After a vote we concluded that we will re-use the Report OOI. We expect XTDB 2.0 to be able to resolve any use-cases that pop up, and we will use the history API to handle any other queries/use-cases. Both @dekkers and @underdarknl think using the history for new versions of a Report is a more adequate representation of the actual situation. (@Donnype thinks creating new objects is a more adequate representation that would also save us from potentially not being able to handle more intricate queries across reports.)

originalsouth commented 1 month ago

"Give me the reports generated between 10-10-2024 and 21-10-2024" implies fetching the history of every report.

If I understand correctly, only if the result changes between the two valid_times which can be queried.
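
To make the trade-off concrete, here is a minimal sketch of that date-range query under the in-place model. It does not use the XTDB or Octopoes APIs: the per-report history is a plain in-memory list, and `ReportVersion`, `versions_between` and `reports_generated_between` are hypothetical names made up for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class ReportVersion:
    report_id: str        # hypothetical stable ID of the report entity
    valid_time: datetime  # when this version became valid
    data_raw_id: str      # pointer to the raw data stored for this version


def versions_between(history, start, end):
    """Return the versions whose valid_time lies in [start, end], oldest first."""
    hits = [v for v in history if start <= v.valid_time <= end]
    return sorted(hits, key=lambda v: v.valid_time)


def reports_generated_between(histories, start, end):
    """Scan the history of every report and keep those with at least one hit."""
    result = {}
    for report_id, history in histories.items():
        hits = versions_between(history, start, end)
        if hits:
            result[report_id] = hits
    return result
```

The outer loop touches the history of every report, which is the cost the quoted statement points at. If, as the reply suggests, a new version is only recorded when the result changes, a report that stayed the same during the window has no version inside it and would not be returned by this sketch.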

Donnype commented 2 days ago

Questions:

underdarknl commented 2 days ago

I'm looking at this in a different light, I think:

An aggregate report is the result of the underlying reports being combined. I think we currently have the relation the wrong way round. A recipe holds a list of input OOIs, or a query (resulting in a live list of input OOIs that may change over time). The reporting job derived from the recipe's schedule produces, at a given moment in time, a set of asset reports (input OOI * report type). Those reports should get a reference to the recipe that was used to create them, and a reference to the job (no news there, I suppose?).

The asset reports should get a (I think) deterministic OOI ID, which encodes the input OOI, the report type used, and any settings that change the actual data being stored. Settings that only apply to the rendered version, but not the JSON, don't need to be part of the hash/OOI ID. [1]

The aggregate report should then hold, as its input OOI list, the OOI IDs of the underlying report OOIs and their valid time at the moment of combination. It could also hold a reference to the recipe or job that triggered the aggregation to run. If a given OOI is no longer available for asset reporting, we could include the most recent report or decide not to include the asset report anymore. The underlying report might have been generated yesterday, or might have been manually uploaded by a user; in either case we decide whether or not to include it in the input OOIs for the given aggregate report.
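
A minimal sketch of what such a deterministic ID could look like, assuming the ID is a SHA-256 hash over a canonical JSON payload. The function name `asset_report_id` and the `data_settings` parameter are hypothetical and not part of OpenKAT; the point is only that render-only settings stay out of the hash.

```python
import hashlib
import json
from typing import Any, Optional


def asset_report_id(
    input_ooi: str,
    report_type: str,
    data_settings: Optional[dict[str, Any]] = None,
) -> str:
    """Derive a stable ID from everything that changes the stored report data."""
    payload = {
        "input_ooi": input_ooi,
        "report_type": report_type,
        # Only settings that alter the stored JSON belong here; render-only
        # settings are deliberately left out so they do not change the ID.
        "data_settings": data_settings or {},
    }
    # sort_keys makes the serialization independent of dict insertion order.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Two jobs that report on the same input OOI with the same report type and data settings would then compute the same ID and write into the same report OOI, which is the situation footnote 1 describes.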

1: This means multiple recipes and reporting jobs might write into the same report OOI at different intervals. This would mean we are making more 'snapshots' of the data at those points.