Closed theodotdot closed 5 months ago
I knew I forgot something from yesterday's issue thread 😅 Thanks for creating this issue, @theodotdot. As you mentioned, I did resolve a previous defect w/ cross-project seed references in 0.5.1, so there is a precedent here for edge cases with seeds.
I'm keen to see your toy example if you're able to put one together. Otherwise, I may benefit from seeing the relevant portions of your projects' manifest files.
No worries, I will try to take some time to tinker with that and see if I can replicate in an env that you can also use on your machine and I will update here! I will update you either way later this week
Good news! (or bad depending on how you see it)
@nicholasyager I was able to replicate it using the test projects in the dbt-loom repo.
To Reproduce Steps to reproduce the behavior: (running dbt-core 1.7.14)
revenue
projectrevenue
builds without errorscustomer_success
gives the following error12:17:16 Running with dbt=1.7.14
12:17:16 dbt-loom: Patching ref protection methods to support dbt-loom dependencies.
12:17:16 dbt-loom: Loading manifest for `revenue` from `file`
12:17:16 Registered adapter: duckdb=1.7.4
12:17:16 dbt-loom: Injecting nodes
12:17:16 [WARNING]: Model orders.v1 has passed its deprecation date of 2024-01-01T00:00:00+01:00. This model should be disabled or removed.
12:17:16 Encountered an error:
Compilation Error
'model.revenue.stg_accounts' depends on 'seed.revenue.accounts' which is not in the graph!
I tested additional scenarios:
So it seems the issue comes from having the staging model reference the seed, regardless of the access defined. Now this is where my knowledge ends, I tried to have a look at how the node injection works but I have to say I don't really understand and my python is super rusty.
Let me know if you'd like me to test other things, I hope that helps!
@theodotdot Thanks for the information! I've been able to replicate the defect, so I should be able to track down the issue shortly. My current hunch is that it is another instance of unique_ids (the IDs for nodes internal to dbt-core) not matching correctly when dbt-core maps between Nodes
native to a project and ModelArgNodes
injected into the project.
I figured it out! dbt-core's implementation of ModelArgNodes
hard-codes model.
as the prefix for all nodes injected into a project, which creates a mismatch in unique_ids
.
flowchart LR
subgraph project_a
seed.project_a.accounts --> model.project_a.stg_accounts
end
model.project_a.stg_accounts2
subgraph project_b
subgraph injected_project_a
seed.project_a.accounts2[model.project_a.accounts]
model.project_a.stg_accounts2[model.project_a.stg_accounts]
end
model.project_a.stg_accounts2 --> accounts
end
seed.project_a.accounts -.-> seed.project_a.accounts2
model.project_a.stg_accounts -.-> model.project_a.stg_accounts2
This isn't a problem when a model in a downstream project references a seed directly in an upstream project, since the ref
macro on-to-fly resolves the seed to the injected model.
unique_id.
flowchart LR
subgraph project_a
seed.project_a.accounts
end
model.project_a.stg_accounts2
subgraph project_b
subgraph injected_project_a
seed.project_a.accounts2[model.project_a.accounts]
end
seed.project_a.accounts2 --> model.project_a.stg_accounts2[model.project_a.stg_accounts]
model.project_a.stg_accounts2 --> accounts
end
seed.project_a.accounts -.-> seed.project_a.accounts2
I see four options forward.
depends_on
list. This will NOT break the DAG, since the downstream project does not need to know about the upstream project's implementation, but viewing the docs site would fail to render the dependency.resource_type
field in the ModelArgsNode
class, allowing for true replication of complete upstream DAGs into a downstream project.resource_type
field in the ModelArgsNode
class, allowing for true replication of complete upstream DAGs into a downstream project.At this point in time, I'm leaning towards option one -- strategically, downstream projects really shouldn't know or care about nodes upstream of the nodes being used. More so if the nodes are protected or private to a given project. This throws a wrench in for people using dbt docs for cross-project documentation, which isn't great. On the other hand, it keeps dbt-loom compatible with dbt Cloud's cross-project lineage functionality, which is ideal.
My fallback approach would be option two -- We're already benefiting from some unique_id rewriting (as seen in the example), so perhaps this is an opportunity to have a better experience than that offered by dbt Cloud 🤷🏻
@theodotdot do you have any preferences for the path forward?
Wow that is great, thanks for finding the actual issue here!
I agree with your point regarding option 1: downstream projects shouldn't need to know and while it would be great to have the docs, I can say that it wouldn't be a deal breaker.
From a usage/strategy point of view, I think that keeping dbt-loom and the dbt cloud offering compatible is good for the plugin as it enables users to switch easily but I also feel like there is usually a good reason to not use dbt cloud so I can mostly see the switch being from cloud to core (and loom) vs the opposite.
I wouldn't mind dbt-loom being even better than the dbt cloud alternative considering I know that my team and I won't be moving to Cloud (probably) ever.
Option 3 would be nice as it's probably a net positive to dbt to have that addition distinction and would be the "cleaner" and more comprehensive way of going about solving the issue.
--> I honestly feel like option 1 is the simplest to implement and for my use cases slightly broken docs doesn't matter but if you're a perfectionist I can also understand the other options, you know better and this is your project, I am just happy I will get to use dbt-loom and multi-project :smile:
Thanks for the insight! I'm going to go ahead with Option 1, removing non-model dependencies from each Node
's depends_on
list. This will resolve the immediate issue.
Thanks again for the fix on this, I will test this on my production project once 0.5.3 is out!
EDIT: Tested it and it works flawlessly now, thanks!!!
Describe the bug Running any dbt command that requires compiling from
Project B
using the manifest fromProject A
throws the following error:This error refers to a seed and model that are not upstream of the model I was trying to run. In
Project A
, there are no errors when running or compiling.I tried running with
--no-partial-parse
as well as runningdbt clean
before the commands to no avail. Deleting the seed and related model only gave the same error with another seed and model pair.I have looked at the
manifest.json
fromProject A
and I can see the seed node, however, when looking at the manifest generated from the run inProject B
, I can only see the seed node as a dependency from the staging model but doesn't seem to be injected as a node itself.To Reproduce Steps to reproduce the behavior:
Project A
has protected seeds that are upstream of some other protected modelsProject A
has a public model that is downstream of one or more seeds.Project A
compiles with no errorsProject B
references a public model from the upstream modeldbt run
)Expected behavior
Project B
should also compile without any errors.Additional context Thanks to smilingthax (don't want to tag to spam emails) for pointing out that this error with seeds that I mentioned in another issue was talked already in another issue (that I missed when researching).
I haven't had time to rebuild this with a toy project at the moment, if that is needed I will take some time later in the week as I have been pushing at work to give dbt-loom work so I am invested hah. Thanks for all the efforts and for creating and maintaining this amazing project!!