nicholasyager / dbt-loom

A dbt-core plugin to weave together multi-project dbt-core deployments
The Unlicense
104 stars, 19 forks

Problem with versioned models?! #48

Closed: smilingthax closed this 4 months ago

smilingthax commented 4 months ago

Describe the bug

dbt-loom/test_projects/customer_success# dbt build
09:32:20  Running with dbt=1.7.14
09:32:20  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
09:32:20  dbt-loom: Loading manifest for `revenue` from `file`
09:32:20  Registered adapter: duckdb=1.7.4
09:32:20  dbt-loom: Injecting nodes
09:32:20  [WARNING]: Model orders has passed its deprecation date of 2024-01-01T00:00:00+00:00. This model should be disabled or removed.            ## (removing deprecation_date does not change anything)
09:32:20  Encountered an error:
Compilation Error
  'model.revenue.not_null_orders_v1_order_id' depends on 'model.revenue.orders.v1' which is not in the graph!

To Reproduce

  1. Install/Setup dbt-core, dbt-duckdb, dbt-loom, ...
  2. git clone https://github.com/nicholasyager/dbt-loom (to retrieve test_projects/)
  3. In dbt-loom/test_projects/revenue run dbt deps, dbt build, dbt run
  4. In dbt-loom/test_projects/customer_success run dbt deps, try dbt build or dbt run
  5. See error, above.

Expected behavior

The test project from the dbt-loom repository should compile without errors, just as other projects that use versioned models do.

Additional context

This first happened in my own project, but just using the test_projects from the dbt-loom repository exhibits the same behaviour.

AFAICT, the corresponding node name/id in revenue/target/manifest.json is "model.revenue.orders.v1", whereas in customer_success/target/manifest.json the injected(?) node seems to be called "model.revenue.orders.v1.0". However, some or all(?) references to it (depends_on, ...) still use the original "model.revenue.orders.v1" name/id, which then cannot be found, as the error message says (... depends on 'model.revenue.orders.v1' which is not in the graph!).
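
The "v1" vs "v1.0" mismatch is what you would expect if the version ends up held as a float on the consumer side. A minimal illustration in plain Python; `versioned_unique_id` is a hypothetical helper that only mimics the `model.<package>.<name>.v<version>` shape of the ids quoted above, not dbt-core's or dbt-loom's actual code:

```python
from typing import Union


def versioned_unique_id(package: str, name: str, version: Union[str, float]) -> str:
    # Hypothetical helper: formats an id like the ones quoted in this issue.
    # Not dbt-core's or dbt-loom's implementation.
    return f"model.{package}.{name}.v{version}"


# The revenue manifest refers to the model as ...orders.v1
producer_id = versioned_unique_id("revenue", "orders", "1")
# If the consumer side holds the same version as a float, the id changes
injected_id = versioned_unique_id("revenue", "orders", 1.0)

injected_nodes = {injected_id: {"name": "orders"}}

print(producer_id)                    # model.revenue.orders.v1
print(injected_id)                    # model.revenue.orders.v1.0
print(producer_id in injected_nodes)  # False, i.e. "... which is not in the graph!"
```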

Non-versioned models seem to be unaffected / work fine.

nicholasyager commented 4 months ago

Thank you for taking the time to put this together, @smilingthax! I've tried to replicate this result, but I haven't been able to get dbt-loom/dbt-core to produce the same compilation error.

I have a suspicion that this exposes an incompatibility in how different dbt-core versions represent node unique_id values for versioned models. If you're still receiving the error, could you please follow the same steps, but run dbt clean before dbt deps in both projects? I suspect this will clear out any lingering incompatible unique_ids generated by a different version.
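
One quick way to test the version-mismatch hypothesis is to compare the dbt-core version recorded in each project's artifacts. A minimal sketch, assuming the standard manifest.json layout in which dbt stores the producing version under `metadata.dbt_version`; the helper names are illustrative:

```python
import json
from pathlib import Path
from typing import Any, Dict


def load_manifest(path: Path) -> Dict[str, Any]:
    """Read a dbt manifest.json into a dict."""
    with path.open(encoding="utf-8") as fh:
        return json.load(fh)


def recorded_dbt_version(manifest: Dict[str, Any]) -> str:
    # dbt writes the version that produced the artifact under metadata.dbt_version
    return manifest["metadata"]["dbt_version"]


def same_dbt_version(producer: Dict[str, Any], consumer: Dict[str, Any]) -> bool:
    """True if both manifests were written by the same dbt-core version."""
    return recorded_dbt_version(producer) == recorded_dbt_version(consumer)
```

For example, run from test_projects/, `recorded_dbt_version(load_manifest(Path("revenue/target/manifest.json")))` reports the version that built the producer project; a mismatch with the consumer's manifest would point at stale artifacts.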

theodotdot commented 4 months ago

I am not sure my issue is related, but it seems to be.

I am getting similar errors, such as:

Compilation Error
  'model.my_dbt_project.stg_my_seed_file' depends on 'seed.my_dbt_project.seed_my_seed_file' which is not in the graph!
(I replaced the names with generic ones, this is on a private dbt repo that I cannot share)

This error happens with seeds, however. It even happens when trying to run models unrelated to the seed file and the model named in the error. In the first project, there are no errors when running or compiling.

I tried running with --no-partial-parse, as well as running dbt clean before the commands, to no avail. Deleting the seed and its related model only produced the same error for another seed and model pair. I have tried with both dbt=1.7.14/bigquery=1.7.7 and dbt=1.6.9/bigquery=1.6.9.

I have looked at the manifest.json from project A, and I can see the seed node there. However, in the manifest generated from the run in project B, the seed node only appears as a dependency of the staging model; it doesn't seem to be injected as a node itself.
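
The manual manifest comparison can be partly automated. A minimal sketch that lists `depends_on.nodes` entries pointing at ids missing from the same manifest; it assumes the standard manifest.json layout (top-level `nodes` and `sources` maps keyed by unique_id) and treats only those two maps as known ids, so dependencies on other resource kinds would also be reported:

```python
from typing import Any, Dict, List, Tuple


def dangling_dependencies(manifest: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (node_id, missing_dependency_id) pairs for unresolved dependencies."""
    nodes = manifest.get("nodes", {})
    # Seeds, models, tests, snapshots live under "nodes"; sources are separate.
    known = set(nodes) | set(manifest.get("sources", {}))
    missing = []
    for node_id, node in nodes.items():
        for dep in node.get("depends_on", {}).get("nodes", []):
            if dep not in known:
                missing.append((node_id, dep))
    return missing
```

Run against project B's target/manifest.json (loaded with `json.load`), a pair such as `('model.my_dbt_project.stg_my_seed_file', 'seed.my_dbt_project.seed_my_seed_file')` would confirm that the seed was referenced but never injected.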

smilingthax commented 4 months ago

> I have a suspicion that there may be an incompatibility being exposed here between different versions of dbt-core and how they represent node unique_id values for versioned models between dbt-core versions. If you're still receiving the error, can you please follow your same steps, but run a dbt clean prior to dbt deps in both projects? I suspect that this will clear out any lingering incompatible unique_ids generated between versions.

It is not a transient error; I can reproduce it with a "clean install":

Dockerfile.bug:

FROM python:3.12-bookworm AS base

RUN apt-get update \
 && apt-get dist-upgrade -y \
 && apt-get install -y --no-install-recommends \
    git \
    less vim

ENV PYTHONIOENCODING=utf-8
ENV LANG=C.UTF-8

RUN python -m pip install --no-cache-dir "dbt-core"
RUN python -m pip install --no-cache-dir "dbt-duckdb"
RUN python -m pip install --no-cache-dir "dbt-loom"

Then:

$ docker build - < Dockerfile.bug
[...]
Successfully built 7fa7942df6c1

$ docker run --rm -ti 7fa7942df6c1 bash

root@bc9d87680530:/# cd /tmp

root@bc9d87680530:/tmp# git clone https://github.com/nicholasyager/dbt-loom
[...]

root@bc9d87680530:/tmp# cd dbt-loom/test_projects/revenue

root@bc9d87680530:/tmp/dbt-loom/test_projects/revenue# dbt clean
15:17:47  Running with dbt=1.7.14
15:17:48  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:17:48  Checking /tmp/dbt-loom/test_projects/revenue/target/*
15:17:48  Cleaned /tmp/dbt-loom/test_projects/revenue/target/*
15:17:48  Checking /tmp/dbt-loom/test_projects/revenue/dbt_packages/*
15:17:48  Cleaned /tmp/dbt-loom/test_projects/revenue/dbt_packages/*
15:17:48  Finished cleaning all paths.

root@bc9d87680530:/tmp/dbt-loom/test_projects/revenue# dbt deps
15:17:52  Running with dbt=1.7.14
15:17:52  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:17:52  Installing dbt-labs/dbt_utils
15:17:53  Installed from version 1.0.0
15:17:53  Updated version available: 1.1.1
15:17:53
15:17:53  Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps

root@bc9d87680530:/tmp/dbt-loom/test_projects/revenue# dbt build
15:17:56  Running with dbt=1.7.14
15:17:56  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:17:56  Registered adapter: duckdb=1.7.4
15:17:56  Unable to do partial parsing because saved manifest not found. Starting full parse.
15:17:57  dbt-loom: Injecting nodes
15:17:57  [WARNING]: Model orders.v1 has passed its deprecation date of 2024-01-01T00:00:00+00:00. This model should be disabled or removed.
15:17:57  Found 7 models, 1 seed, 18 tests, 5 sources, 0 exposures, 0 metrics, 507 macros, 0 groups, 0 semantic models
[...]
15:17:58  Finished running 5 view models, 18 tests, 1 seed, 2 incremental models in 0 hours 0 minutes and 1.35 seconds (1.35s).
15:17:58
15:17:58  Completed successfully
15:17:58
15:17:58  Done. PASS=26 WARN=0 ERROR=0 SKIP=0 TOTAL=26

root@bc9d87680530:/tmp/dbt-loom/test_projects/revenue# dbt run
15:18:02  Running with dbt=1.7.14
15:18:02  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:18:02  Registered adapter: duckdb=1.7.4
15:18:02  dbt-loom: Injecting nodes
15:18:02  [WARNING]: Model orders.v1 has passed its deprecation date of 2024-01-01T00:00:00+00:00. This model should be disabled or removed.
15:18:02  Found 7 models, 1 seed, 18 tests, 5 sources, 0 exposures, 0 metrics, 507 macros, 0 groups, 0 semantic models
15:18:02
15:18:03  Concurrency: 4 threads (target='dev')
15:18:03
15:18:03  1 of 7 START sql view model main.stg_locations ................................. [RUN]
15:18:03  2 of 7 START sql view model main.stg_order_items ............................... [RUN]
15:18:03  3 of 7 START sql view model main.stg_orders .................................... [RUN]
15:18:03  4 of 7 START sql view model main.stg_products .................................. [RUN]
15:18:03  1 of 7 OK created sql view model main.stg_locations ............................ [OK in 0.09s]
15:18:03  5 of 7 START sql view model main.stg_supplies .................................. [RUN]
15:18:03  4 of 7 OK created sql view model main.stg_products ............................. [OK in 0.11s]
15:18:03  2 of 7 OK created sql view model main.stg_order_items .......................... [OK in 0.12s]
15:18:03  3 of 7 OK created sql view model main.stg_orders ............................... [OK in 0.14s]
15:18:03  5 of 7 OK created sql view model main.stg_supplies ............................. [OK in 0.08s]
15:18:03  6 of 7 START sql incremental model main.orders_v1 .............................. [RUN]
15:18:03  7 of 7 START sql incremental model main.orders_v2 .............................. [RUN]
15:18:03  7 of 7 OK created sql incremental model main.orders_v2 ......................... [OK in 0.27s]
15:18:03  6 of 7 OK created sql incremental model main.orders_v1 ......................... [OK in 0.56s]
15:18:03
15:18:03  Finished running 5 view models, 2 incremental models in 0 hours 0 minutes and 0.81 seconds (0.81s).
15:18:03
15:18:03  Completed successfully
15:18:03
15:18:03  Done. PASS=7 WARN=0 ERROR=0 SKIP=0 TOTAL=7

root@bc9d87680530:/tmp/dbt-loom/test_projects/revenue# cd ../customer_success/

root@bc9d87680530:/tmp/dbt-loom/test_projects/customer_success# dbt clean
15:18:16  Running with dbt=1.7.14
15:18:16  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:18:16  dbt-loom: Loading manifest for `revenue` from `file`
15:18:16  Checking /tmp/dbt-loom/test_projects/customer_success/dbt_packages/*
15:18:16  Cleaned /tmp/dbt-loom/test_projects/customer_success/dbt_packages/*
15:18:16  Checking /tmp/dbt-loom/test_projects/customer_success/target/*
15:18:16  Cleaned /tmp/dbt-loom/test_projects/customer_success/target/*
15:18:16  Finished cleaning all paths.

root@bc9d87680530:/tmp/dbt-loom/test_projects/customer_success# dbt deps
15:18:22  Running with dbt=1.7.14
15:18:22  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:18:22  dbt-loom: Loading manifest for `revenue` from `file`
15:18:22  Installing dbt-labs/dbt_utils
15:18:22  Installed from version 1.0.0
15:18:22  Updated version available: 1.1.1
15:18:22
15:18:22  Updates available for packages: ['dbt-labs/dbt_utils']
Update your versions in packages.yml, then run dbt deps

root@bc9d87680530:/tmp/dbt-loom/test_projects/customer_success# dbt build
15:18:27  Running with dbt=1.7.14
15:18:27  dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
15:18:27  dbt-loom: Loading manifest for `revenue` from `file`
15:18:27  Registered adapter: duckdb=1.7.4
15:18:27  Unable to do partial parsing because saved manifest not found. Starting full parse.
15:18:28  [WARNING]: Did not find matching node for patch with name 'orders' in the 'models' section of file 'models/marts/__models.yml'
15:18:28  dbt-loom: Injecting nodes
15:18:28  [WARNING]: Model orders has passed its deprecation date of 2024-01-01T00:00:00+00:00. This model should be disabled or removed.
15:18:28  Encountered an error:
Compilation Error
  'model.revenue.not_null_orders_v1_order_id' depends on 'model.revenue.orders.v1' which is not in the graph!

root@bc9d87680530:/tmp/dbt-loom/test_projects/customer_success#

This could probably be minimized even further. The Dockerfile is simply based on https://github.com/dbt-labs/dbt-core/blob/main/docker/Dockerfile. I previously pip-installed the more specific "git+https://github.com/dbt-labs/dbt-core@v1.7.14#egg=dbt-core&subdirectory=core" instead (but not for dbt-duckdb / dbt-loom), with no difference.

I also suspect some incompatibility, but I have no idea what to try next...

@theodotdot: A somewhat similar problem with seeds was already fixed in 0.5.1 (are you using the newest version of dbt-loom?): https://github.com/nicholasyager/dbt-loom/pull/47. AFAICT, however, it is unrelated to this issue.

nicholasyager commented 4 months ago

@smilingthax Thanks for the Dockerfile! This should make it much easier to replicate the issue. Also, kudos for confirming that this isn't transient. I'll dig into replicating the issue this afternoon 👍🏻 In the meantime, please let me know if you find any more leads.

nicholasyager commented 4 months ago

@smilingthax Quick update on this: I've found that the difference seems to be due to how ModelNodeArgs in dbt-core handles version values, since a version can be either a str or a float.

The resolution, then, is twofold:

  1. Update ManifestNode to perform a type conversion on the model's version. This will prevent incorrect deserialization from occurring.
  2. Update identify_node_subgraph to use dbt-core's NodeType enums to prevent the injection of non-node resources.

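A rough sketch of the two fixes described above, assuming plain-dict manifest nodes. `ResourceType` is a hypothetical stand-in for dbt-core's NodeType (only a few members are mirrored), `normalize_version` and `injectable_nodes` are illustrative names, and collapsing `1.0` to `"1"` is one possible normalization, not necessarily what #50 does. Note that the error above mentions a test (`not_null_orders_v1_order_id`) carrying a `model.` prefix, which suggests tests were being injected as models; a resource-type filter like this would skip them.

```python
from enum import Enum
from typing import Any, Dict, Optional, Union


class ResourceType(str, Enum):
    # Hypothetical stand-in for dbt-core's NodeType; only a few members mirrored.
    MODEL = "model"
    SEED = "seed"
    SNAPSHOT = "snapshot"
    TEST = "test"
    SOURCE = "source"


# Illustrative policy: which resource types are treated as injectable nodes.
INJECTABLE = frozenset({ResourceType.MODEL, ResourceType.SEED, ResourceType.SNAPSHOT})


def normalize_version(version: Optional[Union[str, int, float]]) -> Optional[str]:
    """Coerce a model version to str so that 1, "1" and 1.0 all become "1"."""
    if version is None:
        return None
    if isinstance(version, float) and version.is_integer():
        return str(int(version))  # 1.0 -> "1", avoiding a ".v1.0" suffix
    return str(version)


def injectable_nodes(nodes: Dict[str, Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Keep only manifest nodes whose resource_type is in INJECTABLE."""
    allowed = {member.value for member in INJECTABLE}
    return {uid: node for uid, node in nodes.items() if node.get("resource_type") in allowed}
```

Applied to the producer's `nodes` map, `injectable_nodes` would keep `model.revenue.orders.v1` but drop `test.revenue.not_null_orders_v1_order_id`.
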
nicholasyager commented 4 months ago

@smilingthax I have a PR (#50) that I believe resolves this defect. I'll leave this up for a day or so if you want to give that branch a test with your dbt implementation. Let me know if you have any feedback!

smilingthax commented 4 months ago

The string conversion from #50 in dbt_loom/__init__.py fixes the problem in my own project, and the PR branch also installs and works fine. I did not test the other changes.