slidoapp / dbt-coverage

One-stop-shop for docs and test coverage of dbt projects.
MIT License
191 stars, 26 forks

Bug when table.original_file_path is None #52

Closed: moreaupascal56 closed this issue 10 months ago

moreaupascal56 commented 1 year ago

Hello, just a small bug I found :)

In __init__.py, Table.original_file_path can be None, but the if clause on line 129:

 if table.original_file_path.startswith(path):

fails if table.original_file_path is None.
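A minimal sketch of a None-safe version of that check is below. It assumes a simplified, hypothetical Table dataclass and a filter_catalog function that takes a list of tables and a list of path prefixes. The real dbt-coverage classes and signatures may differ, so treat this only as an illustration of the guard.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Table:
    # Hypothetical stand-in for dbt-coverage's Table; only the fields used here.
    unique_id: str
    original_file_path: Optional[str]


def filter_catalog(tables: List[Table], model_path_filter: List[str]) -> List[Table]:
    """Keep tables whose original_file_path starts with any of the given prefixes.

    A table without an original_file_path (None) cannot match a path filter,
    so it is skipped instead of raising AttributeError.
    """
    filtered = []
    for table in tables:
        if table.original_file_path is None:
            continue
        if any(table.original_file_path.startswith(path) for path in model_path_filter):
            filtered.append(table)
    return filtered
```

For example, filter_catalog([Table("model.p.a", "models/mart/a.sql"), Table("source.p.s", None)], ["models/mart"]) keeps only the model. Skipping silently is one design choice; logging a warning for each skipped table would make the gap more visible.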

Error log:

  File "/Users/user/Library/Caches/pypoetry/virtualenvs/venv/lib/python3.8/site-packages/dbt_coverage/__init__.py", line 132, in filter_catalog
    if table.original_file_path.startswith(path):
AttributeError: 'NoneType' object has no attribute 'startswith'
mrshu commented 1 year ago

Thanks for reporting this @moreaupascal56!

I guess this shows that there are cases in which manifest.json doesn't contain original_file_path for some of its attributes. That's not something dbt-coverage expects, so fixing it would be very helpful.

Just out of curiosity, would you mind sharing the log that dbt-coverage produced? I believe it should contain quite a few "original_file_path value not found in manifest for" messages:

https://github.com/slidoapp/dbt-coverage/blob/7efb5d9e5526c2e334ba6f0438ba1ab36326d8e2/dbt_coverage/__init__.py#L94-L100

Thanks!

mrshu commented 1 year ago

@moreaupascal56 a friendly ping :)

moreaupascal56 commented 1 year ago

Hey @mrshu, thanks for the ping, it had slipped my mind :)

Here is what I got when running dbt-coverage --verbose compute test --cov-report coverage-test.json --model-path-filter PATH:

INFO:root:Loading catalog and manifest files from project dir: .
INFO:root:Successfully loaded XXX tables from catalog
INFO:root:original_file_path value not found in manifest for source.XXXXX
INFO:root:original_file_path value not found in manifest for model.XXXXX
Traceback (most recent call last):
  File "/Users/XXX/.pyenv/versions/3.8.13/bin/dbt-coverage", line 8, in <module>
    sys.exit(app())
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/typer/main.py", line 532, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/dbt_coverage/__init__.py", line 949, in compute
    return do_compute(
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/dbt_coverage/__init__.py", line 883, in do_compute
    catalog = catalog.filter_catalog(model_path_filter)
  File "/Users/XXX/.pyenv/versions/3.8.13/lib/python3.8/site-packages/dbt_coverage/__init__.py", line 129, in filter_catalog
    if table.original_file_path.startswith(path):
AttributeError: 'NoneType' object has no attribute 'startswith'
mrshu commented 1 year ago

Thanks @moreaupascal56, that's very interesting. Can you please share what version of dbt you are using?

moreaupascal56 commented 1 year ago

Hey, 1.0.1. Kinda old 🤔

mrshu commented 1 year ago

@moreaupascal56 interesting. Would you mind trying to generate the manifest with a somewhat newer version (like 1.3)? I really believe original_file_path should exist in the manifest.
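To check that assumption against a specific project, a small diagnostic along these lines could list manifest entries that lack original_file_path. It is a sketch that assumes target/manifest.json has top-level "nodes" and "sources" dicts keyed by unique_id; the function names are hypothetical and not part of dbt-coverage.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def missing_original_file_path(manifest: Dict[str, Any]) -> List[str]:
    """Return sorted unique_ids of manifest nodes/sources with no original_file_path."""
    missing = []
    for section in ("nodes", "sources"):
        for unique_id, node in manifest.get(section, {}).items():
            # Treat an absent key, null, and an empty string all as "missing".
            if not node.get("original_file_path"):
                missing.append(unique_id)
    return sorted(missing)


def check_manifest_file(path: str = "target/manifest.json") -> List[str]:
    """Load a manifest.json from disk and report entries missing original_file_path."""
    manifest = json.loads(Path(path).read_text(encoding="utf-8"))
    return missing_original_file_path(manifest)
```

Running print(check_manifest_file()) from the project directory after dbt docs generate would show whether the missing paths come from sources, models, or something else.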

wbaker23 commented 1 year ago

I am using dbt version 1.3.2 and running into this issue too. It seems to happen only for dbt sources.

devdazed commented 1 year ago

I am also running into this issue:

WARNING:root:original_file_path value not found in manifest for test.my_warehouse.not_null_dim_channel_sku_nk.5bc6f987a4
Traceback (most recent call last):
  File "/usr/local/bin/dbt-coverage", line 8, in <module>
    sys.exit(app())
  File "/usr/local/lib/python3.10/site-packages/typer/main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/typer/main.py", line 532, in wrapper
    return callback(**use_params)  # type: ignore
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 884, in compute
    return do_compute(
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 816, in do_compute
    catalog = load_files(project_dir, run_artifacts_dir)
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 719, in load_files
    catalog = load_catalog(project_dir, run_artifacts_dir, manifest)
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 681, in load_catalog
    catalog = Catalog.from_nodes(catalog_nodes.values(), manifest)
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 122, in from_nodes
    tables = [Table.from_node(table, manifest) for table in nodes]
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 122, in <listcomp>
    tables = [Table.from_node(table, manifest) for table in nodes]
  File "/usr/local/lib/python3.10/site-packages/dbt_coverage/__init__.py", line 79, in from_node
    manifest_table["name"].lower(),
TypeError: 'NoneType' object is not subscriptable

I tried regenerating the target directory, and it only changed which test failed:

WARNING:root:original_file_path value not found in manifest for test.my_warehouse.unique_dim_date_nk.2e0b9b40a1

These are all common tests included with dbt. Maybe it's related to the string of random characters at the end?
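For context on those ids: dbt unique_ids start with <resource_type>.<package>., and the test ids in these logs also end in a hash-like suffix. A small sketch that splits an id on its first two dots (a hypothetical helper, not dbt-coverage code) shows that the leading resource type, rather than the suffix, is what marks these entries as tests.

```python
from typing import NamedTuple


class UniqueId(NamedTuple):
    resource_type: str  # e.g. "model", "source", "test"
    package: str        # project or package name
    rest: str           # node name, plus any further dotted parts (e.g. a test's hash)


def parse_unique_id(unique_id: str) -> UniqueId:
    """Split a dbt unique_id on its first two dots.

    Raises ValueError if the id has fewer than three dot-separated parts.
    """
    resource_type, package, rest = unique_id.split(".", 2)
    return UniqueId(resource_type, package, rest)
```

For test.my_warehouse.not_null_dim_channel_sku_nk.5bc6f987a4 this yields resource_type "test", package "my_warehouse", and rest "not_null_dim_channel_sku_nk.5bc6f987a4".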

BI-MarcB commented 1 year ago

Good morning, I am facing the same issue: WARNING:root:original_file_path value not found in manifest for < test >

which results in TypeError: 'NoneType' object is not subscriptable

I tried this on dbt versions 1.4.6 and 1.5.2. My test case:

dbt clean & dbt docs generate & dbt-coverage --verbose compute test --model-path-filter models/mart/shared/mrt_rebase

07:02:44  Running with dbt=1.4.6
07:02:44  Checking target/*
07:02:45  Cleaned target/*
07:02:45  Checking logs/*
07:02:45  Cleaned logs/*
07:02:45  Finished cleaning all paths.
07:02:49  Running with dbt=1.4.6
07:02:49  Unable to do partial parsing because saved manifest not found. Starting full parse.
07:03:01  Found 362 models, 1242 tests, 0 snapshots, 0 analyses, 549 macros, 1 operation, 3 seed files, 187 sources, 0 exposures, 0 metrics
07:03:01
07:03:05  Concurrency: 128 threads (target='local_dev')
07:03:05
07:03:35  Done.
07:03:35  Building catalog
07:04:38  Catalog written to C:\Users\<username>\work\eds\target\catalog.json
INFO:root:Loading catalog and manifest files from project dir: .
WARNING:root:original_file_path value not found in manifest for test.enterprise_data_store.assert_cr_table_rowcount_matches_meta_data_cnf_cr_bace_dest_classification_ce_DELTA.a19767fb83
Traceback (most recent call last):
  File "c:\users\<username>\appdata\local\programs\python\python38\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "c:\users\<username>\appdata\local\programs\python\python38\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\<username>\.virtualenvs\eds-Esv60rg9\Scripts\dbt-coverage.exe\__main__.py", line 7, in <module>
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\typer\main.py", line 214, in __call__
    return get_command(self)(*args, **kwargs)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\click\core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\click\core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\click\core.py", line 1657, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\click\core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\click\core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\typer\main.py", line 532, in wrapper
    return callback(**use_params)  # type: ignore
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 885, in compute
    return do_compute(
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 817, in do_compute
    catalog = load_files(project_dir, run_artifacts_dir)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 720, in load_files
    catalog = load_catalog(project_dir, run_artifacts_dir, manifest)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 682, in load_catalog
    catalog = Catalog.from_nodes(catalog_nodes.values(), manifest)
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 123, in from_nodes
    tables = [Table.from_node(table, manifest) for table in nodes]
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 123, in <listcomp>
    tables = [Table.from_node(table, manifest) for table in nodes]
  File "c:\users\<username>\.virtualenvs\eds-esv60rg9\lib\site-packages\dbt_coverage\__init__.py", line 80, in from_node
    manifest_table["name"].lower(),
TypeError: 'NoneType' object is not subscriptable

The weird thing is that I can see "original_file_path" in manifest.json under nodes: [screenshot]

fabientra commented 1 year ago

Hello,

I encountered the same issue mentioned in the two messages above.

When loading the Catalog, dbt-coverage first fetches all nodes regardless of their type, including tests. However, when the catalog is created with the from_nodes method, get_table only accepts candidates from sources, models, seeds and snapshots, not from tests. As a result, get_table returns None, so manifest_table is None and manifest_table["name"].lower() raises the error.

There does not seem to be a type field in the catalog.json nodes, but maybe filtering out unique_ids starting with "test." would be enough? And/or not trying to build a Table when manifest_table is None.
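Both suggestions could be combined roughly as follows. This is a hedged sketch, not the dbt-coverage implementation: it assumes catalog nodes are plain dicts with a "unique_id" key, that the manifest stores sources under "sources" and everything else under "nodes", and the names lookup_manifest_node and build_tables are hypothetical.

```python
from typing import Any, Dict, Iterable, List, Optional

# Resource types accepted by the lookup (an assumption mirroring the
# "sources, models, seeds and snapshots" restriction described above).
ALLOWED_PREFIXES = ("source.", "model.", "seed.", "snapshot.")


def lookup_manifest_node(unique_id: str, manifest: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the manifest entry for unique_id, or None for unsupported types such as tests."""
    if not unique_id.startswith(ALLOWED_PREFIXES):
        return None
    section = "sources" if unique_id.startswith("source.") else "nodes"
    return manifest.get(section, {}).get(unique_id)


def build_tables(catalog_nodes: Iterable[Dict[str, Any]], manifest: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Build simplified table records, skipping test nodes and entries missing from the manifest."""
    tables = []
    for node in catalog_nodes:
        unique_id = node["unique_id"]
        # Suggestion 1: drop catalog entries that belong to tests.
        if unique_id.startswith("test."):
            continue
        manifest_table = lookup_manifest_node(unique_id, manifest)
        # Suggestion 2: never index into a missing manifest entry.
        if manifest_table is None:
            continue
        tables.append({"unique_id": unique_id, "name": manifest_table["name"].lower()})
    return tables
```

Either check alone would avoid the TypeError here. The None guard is the more general of the two, since it also covers any other node type the lookup does not support.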

I hope this is helpful, and I am looking forward to being able to try it :)

BI-MarcB commented 1 year ago

I have found a solution to get dbt-coverage working again locally. It is not elegant at all, so I am open to any refactoring ideas.

mrshu commented 1 year ago

It seems this might also help with https://github.com/slidoapp/dbt-coverage/issues/62.

BI-MarcB commented 10 months ago

@sweco Could you have a look at the PR for this bug, please?

sweco commented 10 months ago

Hey @BI-MarcB, sorry for the very late reply, and thanks for pinging me.

I've checked the PR and tried to understand the issue, but I can't seem to reproduce it. I can't even get tests to appear in catalog.json. Can you please provide more details on how the problematic test is written in the dbt source files, or anything else that would help reproduce the issue?

sweco commented 10 months ago

Alright, I managed to reproduce this by setting store_failures at the project level in dbt_project.yml (see the dbt docs; also described in #62). @BI-MarcB, @fabientra, are you using this option by any chance?
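To check whether a project is affected, one could look for catalog entries whose unique_id starts with "test.". A non-empty result would suggest that test tables (such as those created with store_failures) ended up in the catalog. This minimal sketch assumes catalog.json has a top-level "nodes" dict keyed by unique_id; the helper names are hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def find_test_nodes(catalog: Dict[str, Any]) -> List[str]:
    """Return sorted unique_ids of catalog nodes that belong to tests."""
    return sorted(uid for uid in catalog.get("nodes", {}) if uid.startswith("test."))


def load_catalog_json(path: str = "target/catalog.json") -> Dict[str, Any]:
    """Read catalog.json produced by `dbt docs generate`."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Usage: print(find_test_nodes(load_catalog_json())) from the project directory after dbt docs generate.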

sweco commented 10 months ago

I just released version 0.3.5, which should fix this. Can you please test it?