z3z1ma / dbt-osmosis

Provides automated YAML management, a dbt server, streamlit workbench, and git-integrated dbt model output diff tools
https://z3z1ma.github.io/dbt-osmosis/
Apache License 2.0

Fix race condition in draft phase #131

Closed sp-tkerlavage closed 3 months ago

sp-tkerlavage commented 4 months ago

I have a somewhat large project that I'm trying to refactor. When I was running dbt-osmosis, I was getting this error randomly:

ERROR    Failed to draft project structure update plan for model.my_project.some_model: 'models'         osmosis.py:610

The error message isn't very helpful. This same error would be reported for many different models, while other models would succeed during this step.

Running dbt-osmosis for a single model at a time produced no such error -- even on models that were erroring when dbt-osmosis was invoked across many models.

Looking at the traceback gives a better picture:

ERROR    Failed to draft project structure update plan for model.my_project.some_model: 'models'         osmosis.py:610
         ╭───────────────────────────────────────────── Traceback (most recent call last) ─────────────────────────────────────────────╮               
         │ /Users/me/Projects/dbt-osmosis/src/dbt_osmosis/core/osmosis.py:559 in _draft                                   │               
         │                                                                                                                             │               
         │    556 │   │   │   │   │   # Augment Documented Model                                                                       │               
         │    557 │   │   │   │   │   augmented_model = self.augment_existing_model(documented_model,                                  │               
         │        node)                                                                                                                │               
         │    558 │   │   │   │   │   with self.mutex:                                                                                 │               
         │ ❱  559 │   │   │   │   │   │                                                                                                │               
         │        blueprint[schema_file.target].output["models"].append(augmented_model)                                               │               
         │    560 │   │   │   │   │   │   # Target to supersede current                                                                │               
         │    561 │   │   │   │   │   │   blueprint[schema_file.target].supersede.setdefault(                                          │               
         │    562 │   │   │   │   │   │   │   schema_file.current, []                                                                  │               
         ╰─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯               
         KeyError: 'models'

I think this error is the result of the cleanup that happens in the _draft method, specifically this code:

            for k in blueprint:
                # Remove if sources or models are empty
                if blueprint[k].output.get("sources", None) == []:
                    del blueprint[k].output["sources"]
                if blueprint[k].output.get("models", None) == []:
                    del blueprint[k].output["models"]

This seems like some sort of race condition: if the cleanup runs first and another thread then tries to append, it can't, because the key no longer exists.
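To make the suspected interleaving concrete, here is a minimal sketch that replays it in a single thread. It is not the dbt-osmosis code: `output`, `cleanup`, and `append_model` are hypothetical stand-ins for one blueprint entry, the cleanup loop, and the append shown in the traceback.

```python
from threading import Lock

# Hypothetical stand-in for blueprint[k].output, shared by all workers.
output = {"models": [], "sources": []}
mutex = Lock()


def cleanup(out):
    # Mirrors the cleanup loop quoted above: drop keys whose list is empty.
    for key in ("sources", "models"):
        if out.get(key, None) == []:
            del out[key]


def append_model(out, model):
    # Mirrors the append from the traceback. Holding the lock does not help
    # once another worker has already deleted the key.
    with mutex:
        out["models"].append(model)


# Replay the bad ordering: one worker cleans up before another appends.
cleanup(output)
try:
    append_model(output, {"name": "some_model"})
    error = None
except KeyError as exc:
    error = exc

print(repr(error))  # KeyError('models')
```

The lock only serializes the append itself, so it cannot protect against a key that was removed earlier by a different call.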

Moving this cleanup out of the _draft method into its own method and calling it after draft_project_structure_update_plan seems to fix the issue.
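The shape of that fix can be sketched as follows, assuming a simplified blueprint made of plain dicts. The names `draft_one`, `prune_empty_sections`, and `draft_plan` are hypothetical and only illustrate the ordering: workers append under the lock, and the cleanup runs once after every worker has finished.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


def draft_one(blueprint, mutex, target, model):
    # Hypothetical per-model worker: it only appends and never deletes keys.
    with mutex:
        entry = blueprint.setdefault(target, {"models": [], "sources": []})
        entry["models"].append(model)


def prune_empty_sections(blueprint):
    # The cleanup from the issue, now run a single time after drafting.
    for output in blueprint.values():
        for key in ("sources", "models"):
            if output.get(key, None) == []:
                del output[key]


def draft_plan(models):
    blueprint = {}
    mutex = Lock()
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [
            pool.submit(draft_one, blueprint, mutex, target, model)
            for target, model in models
        ]
        for future in futures:
            future.result()  # re-raise any worker exception
    # Every worker is done here, so no append can race with the cleanup.
    prune_empty_sections(blueprint)
    return blueprint


plan = draft_plan([("schema.yml", f"model_{i}") for i in range(50)])
print(len(plan["schema.yml"]["models"]))  # 50
```

Because the pruning step runs only after the thread pool has drained, the "models" key is guaranteed to exist for the whole drafting phase.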