nautobot / nautobot-app-ssot

Single Source of Truth for Nautobot
https://docs.nautobot.com/projects/ssot/en/latest/

Reject and log records #97

Open lampwins opened 1 year ago

lampwins commented 1 year ago

Environment

Proposed Functionality

As a data engineer, I want a straightforward way to explicitly reject records based on defined validation logic, so that I can use a defined screening layer capable of aggregating rejected records per sync execution.

Use Case

In complex ETL use cases, many data validation rules need to be embedded into the data pipeline. SSoT does not currently provide a straightforward way to do this, especially considering the need to easily view all rejected records with their associated rejection reasons. The best we can do today is custom job logging.

Another similar case is the need to properly log transformation logic and actions.

Likely related to #96

Kircheneer commented 1 year ago

In practice, this might be another model, or a field on an existing SSoT-specific model, that holds this information and could then be displayed in a table. The question to solve here is how this differs from the current logging implementation. Could this be solved by a `DataSource.log_rejection`? This might also come into play here: https://github.com/nautobot/nautobot/issues/2331
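To make the idea concrete, here is a minimal, purely hypothetical sketch of what such a rejection-aggregating helper could look like. Neither `RejectedRecord` nor `log_rejection` exist in nautobot-app-ssot today; the class below stands in for a DataSource job solely to illustrate the pattern of collecting rejections that could later back a per-sync table.

```python
# Hypothetical sketch only: these names are illustrative, not part of
# nautobot-app-ssot or DiffSync.
from dataclasses import dataclass, field


@dataclass
class RejectedRecord:
    model: str      # e.g. "device"
    unique_id: str  # natural key of the rejected record
    reason: str     # why validation rejected it


@dataclass
class DataSourceSketch:
    """Stand-in for an SSoT DataSource job with rejection aggregation."""

    rejections: list = field(default_factory=list)

    def log_rejection(self, model, unique_id, reason):
        """Record a rejection instead of silently skipping the record."""
        self.rejections.append(RejectedRecord(model, unique_id, reason))

    def rejection_summary(self):
        """Aggregate rejection counts per reason for the sync report."""
        summary = {}
        for rec in self.rejections:
            summary[rec.reason] = summary.get(rec.reason, 0) + 1
        return summary


source = DataSourceSketch()
source.log_rejection("device", "rtr-01", "missing site")
source.log_rejection("device", "rtr-02", "missing site")
print(source.rejection_summary())  # {'missing site': 2}
```

Stored as rows rather than log lines, the same data could drive both a table view and a per-execution summary, which is what distinguishes this from the current logging implementation.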

netopsengineer commented 1 year ago

Large enterprise user here, working with multiple SSoTs where this feature would be very helpful: both the predefined way of rejecting devices and the visibility into the data and the reason for rejection.

We have 50,000+ devices and growing, spread across several SoRs and SSoTs, and realistically there is going to be some inconsistency in the data at this scale. A standard integration pattern for handling those situations consistently across each plugin would help avoid future tech debt.

In some cases there is an owner per SoR, and as the plugin dev you aren't an SME on their data source; but we also can't quickly tell the SME what to fix if we reject their devices during create/update. That could potentially be thousands of devices, which can get overlooked when you have 50k+ and creates a blind spot for them.

This leads to creating custom edge-case conditionals just to get through an SSoT run and onboard everything, so I suspect @lampwins has hit this same spot.

lampwins commented 1 year ago

@Kircheneer described the need well. Today our only option is to hard-code such rules into the adapter/model and log messages out of the execution at that layer. Two main problems arise from this at scale:

  1. The rule logic is hidden from stakeholders who own the upstream data and are responsible for fixing data quality issues. They need to understand what logic is in place.
  2. Aggregating the failure reasons for a record is not straightforward. DiffSync provides failure and error sync states, but it is not obvious where and how to integrate these into the flow.

The main argument I make is that 2 should be a consequence of properly implementing 1. With some of the recent work in DVE, it is worth investigating how we could integrate it, as with #96.
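As a thought experiment on how 2 falls out of 1: if the validation rules were declared in one visible place instead of being hard-coded inside the adapter, aggregating rejection reasons becomes a by-product of applying them. The sketch below is purely illustrative; `RULES`, `screen`, and the record shapes are invented for this example and are not part of DiffSync or the SSoT app.

```python
# Hypothetical declarative screening layer. Stakeholders can read RULES
# to see exactly what logic is in place (problem 1), and screen() yields
# rejected records with their reasons ready for aggregation (problem 2).
RULES = [
    ("has_site", lambda rec: bool(rec.get("site"))),
    ("has_serial", lambda rec: bool(rec.get("serial"))),
]


def screen(records, rules=RULES):
    """Split records into accepted and rejected-with-reasons."""
    accepted, rejected = [], []
    for rec in records:
        # Collect the names of every rule the record fails.
        reasons = [name for name, check in rules if not check(rec)]
        (rejected if reasons else accepted).append((rec, reasons))
    return accepted, rejected


devices = [
    {"name": "rtr-01", "site": "NYC", "serial": "A1"},
    {"name": "rtr-02", "site": "", "serial": "A2"},
]
accepted, rejected = screen(devices)
# accepted holds rtr-01; rejected holds rtr-02 with reason "has_site"
```

Because the screening happens before the adapter loads anything, the rejected list could be handed straight to whatever per-sync table or summary the app grows, and the upstream SME can be pointed at a named rule rather than a log line.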