sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html
https://docs.soda.io/
Apache License 2.0

Source tests cannot find a model node in dbt's manifest.json #174

Closed by bastienboutonnet 2 years ago

bastienboutonnet commented 2 years ago

Describe the bug

@bjornvandijkman-ingka raised this issue on the community Slack: https://soda-community.slack.com/archives/C01HYL8V64C/p1640076836205300

When a dbt source has a test on it, that test is part of the run results, but during ingestion no matching node can be found, because dbt keeps sources under a separate `sources` key in manifest.json rather than alongside models and seeds under `nodes`.

Traceback

$ soda ingest dbt --warehouse-yml-file warehouse.yml --dbt-manifest docs/manifest.json --dbt-run-results docs/run_results.json

  | 2.1.1
Traceback (most recent call last):
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/bin/soda", line 8, in <module>
    sys.exit(main())
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/sodasql/cli/cli.py", line 525, in ingest
    _ingest(*args, **kwargs)
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/sodasql/cli/ingest.py", line 219, in ingest
    flush_test_results(
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/sodasql/cli/ingest.py", line 152, in flush_test_results
    for table, test_results in test_results_iterator:
  File "/Users/bjornvandijkman/Documents/GitHub/customer360-foundation/dbt/.venv/lib/python3.9/site-packages/sodasql/cli/ingest.py", line 120, in map_dbt_test_results_iterator
    model_and_seed_nodes[unique_id].alias,
KeyError: 'source.dbt_customer_360.dbt_customer_360_prod.orders'

To Reproduce

Add a test to a source node. This produces a run result for that test, and ingestion then fails when it tries to find the source node in manifest.json.

bastienboutonnet commented 2 years ago

How should we fix it?

As @JCZuurmond pointed out, we don't seem to be capturing "source" nodes from the manifest.json, which results in a KeyError later.

We should look into parsing those as well and all should work fine again.
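A minimal sketch of what parsing sources alongside models and seeds might look like, assuming the manifest has been loaded into a dict with dbt's top-level `nodes` and `sources` keys. The helper name `collect_lookup_nodes` and the sample project names are hypothetical and not soda-sql's actual code. It also assumes a source carries an `identifier` (falling back to `name`) where models and seeds carry an `alias`.

```python
from typing import Any, Dict


def collect_lookup_nodes(manifest: Dict[str, Any]) -> Dict[str, str]:
    """Map unique_id -> table name for models, seeds, and sources.

    Hypothetical helper for illustration; not part of soda-sql.
    """
    table_names: Dict[str, str] = {}

    # Models and seeds live under the top-level "nodes" key.
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") in ("model", "seed"):
            table_names[unique_id] = node.get("alias") or node["name"]

    # Sources live under their own top-level "sources" key,
    # so they have to be collected in a separate pass.
    for unique_id, source in manifest.get("sources", {}).items():
        table_names[unique_id] = source.get("identifier") or source["name"]

    return table_names


# Tiny made-up manifest with one model, one test, and one source.
manifest = {
    "nodes": {
        "model.my_project.customers": {
            "resource_type": "model", "name": "customers", "alias": "customers",
        },
        "test.my_project.not_null_orders_id": {
            "resource_type": "test", "name": "not_null_orders_id",
        },
    },
    "sources": {
        "source.my_project.prod.orders": {
            "resource_type": "source", "name": "orders", "identifier": "orders",
        },
    },
}

table_names = collect_lookup_nodes(manifest)
```

With a lookup built this way, a `source.…` unique_id like the one in the traceback would resolve instead of raising.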

We might also want to think about a different way to handle the error.

While the KeyError allowed us to spot the issue, it also prevents any of the results that were found from being pushed to the cloud platform. I think we may therefore want a try/except that collects the nodes for which an error was raised and then outputs that list to the user at the end of the run.
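As a rough illustration of that idea, the sketch below keeps going when a lookup fails and reports the unresolved nodes at the end. The function name `map_results_to_tables`, the `(unique_id, status)` input shape, and the message text are assumptions made for the example, not soda-sql's implementation.

```python
from typing import Dict, Iterable, List, Tuple


def map_results_to_tables(
    results: Iterable[Tuple[str, str]],
    table_names: Dict[str, str],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group test statuses by table and collect unique_ids with no known table."""
    by_table: Dict[str, List[str]] = {}
    missing: List[str] = []
    for unique_id, status in results:
        try:
            table = table_names[unique_id]
        except KeyError:
            # Remember the node and continue, so the remaining results survive.
            missing.append(unique_id)
            continue
        by_table.setdefault(table, []).append(status)
    return by_table, missing


# Hypothetical inputs: one resolvable model and one unresolvable source.
by_table, missing = map_results_to_tables(
    [
        ("model.my_project.customers", "pass"),
        ("source.my_project.prod.orders", "fail"),
    ],
    {"model.my_project.customers": "customers"},
)

if missing:
    print("Could not find these nodes in manifest.json: " + ", ".join(missing))
```

The trade-off is that results for unresolved nodes are skipped rather than failing the whole run, which is the extra complexity that would need to be weighed.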

JCZuurmond commented 2 years ago

I would like to resolve the error handling in a different issue. I am not a fan of complicating the code too much with such error handling, so let's discuss what Soda's approach is for this.