Omit models/configurations from upstream manifest

d-roman-halliday commented 1 week ago

I have a parent project A, which is inherited to child project B using dbt-loom, most aspects work as expected (awesome work team!).

Both projects use, and have configured dbt-project-evaluator. See https://github.com/dbt-labs/dbt-project-evaluator Project A has for a model and test enabled fct_source_fanout, project B has the model fct_source_fanout (and their fore the test) disabled.

When the CI process executes for Project B, there is an error that the test failed because it doesn't have access to the model (referencing the location) built for project A (which is disabled in project B). Where configuration exists in project B, that takes priority.

The work around is adding configuration lines to the dbt_project.yml explicitly disabling the tests for which the models are already disabled.

Describe the solution you'd like

Ideally, we should be able to set for models & configurations in the parent project dbt_project.yml an option telling dbt-loom ignore all of this, pretend it's not there.

If that's not possible, an optional configuration in the dbt_loom.config.yml so that the identify_node_subgraph() or somewhere similar can drop nodes/configuration information.

This way we can prevent cross project interference where there may be common packages, or other aspects which can cause a clash.

Describe alternatives you've considered

I've posted a work around, plus two suggestions for a solution.

Another option could be blocking content of packages used by the parent project... But that could block genuine requirements from showing up in the full project dag.

Additional context

Relevant configuration from project A:

models:
  dbt_project_evaluator:
    marts:
      dag:
        fct_source_fanout:
          +enabled: true

Relevant configuration from project B:

models:
  dbt_project_evaluator:
    marts:
      dag:
        fct_source_fanout:
          +enabled: false

Work around configuration change for project B:

tests:
  dbt_project_evaluator:
    marts:
      dag:
        is_empty_fct_duplicate_sources_: # Disabling due to loom/Project Evaluator inheriting from core
          +enabled: false

nicholasyager commented 1 day ago

Thanks for creating this issue @d-roman-halliday! My gut reaction is "Huh. That shouldn't be happening", but that's likely an artifact of some assumptions about the access levels for the models generated by dbt_project_evaluator.

I'll spend some time this week attempting to replicate the issue.

nicholasyager commented 1 day ago

Well, that was easy to replicate 😂 My intuition was partially right. These dbt_project_evaluator models are marked with a protected access level which should, at face value, prevent cross-project injection. However, since the downstream project's instance of dbt_project_evaluator has the same name (because it's the same package), the protected check passes. This seems like a small flaw in the current access control model w/in dbt-core 🤣

In any case, we can definitely augment dbt-loom to handle these situations more appropriately.

From your post above, you expressed an interest in being able to configure Project A such that Project B will be unable to import package models. Unfortunately, this is more-or-less the reverse of dbt-core's cross-project permissioning model. There's an inherrent assumption that anything marked public in a package is fair game, and the security defect above means shared packages treat protected as equally useable.

With this in mind, the simplest approach might be to extend our ManifestReference definition to allow downstream packages to configure an exclusion list of packages to omit. This seems to be in line with your second suggestion.

I'm thinking of something like:

manifests:
  - name: revenue
    type: file
    config:
      path: ../revenue/target/manifest.json
    exclude_packages:
      - dbt_project_evaluator

This would be parsed into the ManifestReference, and used by dbtLoom to filter out resources from an excluded package.

@d-roman-halliday Would this meet your need, or is there a better, more ergonomic approach we should be considering?

Other considerations:

dbt_project_evaluator writes to your database. So, if both projects have the same database configuration (DB and schema), there will be cross-contamination in the test results between runs/projects.
The dbt_project_evalutor package checks all models from all packages by default. As a result, Project B will automatically report on Project A model issues, too. I assume you've configured this already, but for other folks reading the thread y'all may need to set exclude_packages as described here.

d-roman-halliday commented 1 day ago

Hi @nicholasyager,

Yes that approach of adding the exclusion to the manifests list should work and is in line with my second suggestion.

Thank you for the other considerations, they will be very useful to anyone else who ends up here trying to combine DBT-loom with DBT-project-evaluator.

In my case, each CI job runs on its own database so there is no clash for the outputs of each project.

nicholasyager / dbt-loom

Omit models/configurations from upstream manifest #93