microsoft / dbt-fabric

MIT License
79 stars 28 forks source link

Schemas not being created in other warehouses, only the one set specified in the profiles.yml. #166

Closed Analyticminder closed 6 months ago

Analyticminder commented 6 months ago

Hello, it appears that schemas are not being created when dbt allows the deployment to different warehouses. For example, my profiles.yml is set to schema dbo and the database is WH_Stage_Bronze

dbt_fabric_testing:
  target: dev
  outputs:
    dev:
      type: fabric
      driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system)
      server: '<your_server>'
      port: 1433
      database: WH_Stage_Bronze
      schema: dbo
      authentication: ActiveDirectoryInteractive
      user: '<your_username>'
      schema_authorization: dbo

I have an override for the generate_database_name.sql that works successfully in deploying objects to other warehouses in the same workspace (provided the schemas are setup beforehand hence the issue I'm having)

{% macro generate_database_name(custom_database_name, node) %}
    {% set default_database = target.database %}

    {% if (target.name == 'dev') and node.name[:3] == 'STG' %}
        {{"WH_Stage_Bronze"}}
    {% elif (target.name == 'dev') and node.name[:3] == 'CFM' %}
        {{"WH_Conformed_Silver"}}
    {% endif %}
{% endmacro %}

I also have an override for the schema that is as follows:

{% macro generate_schema_name(custom_schema_name, node) %}
    {% if not node.unique_id.startswith('model.dbt_artifacts') %}
        {% set default_schema = target.schema %}
        {% set nodelist = node.name.split('_') %}
        {% set custom_schema_name = nodelist[1] %}
        {{ custom_schema_name | trim }}
    {% else %}
        {{ default_schema }}
    {% endif %}
{% endmacro %}

It basically parses the file or model name by double underscores. It takes the middle section of the name and uses that as the schema. When I "dbt run" the project, I get the error:

"('42000', '[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]
The specified schema name "common" either does not exist or you do not have permission to use it. (2760) (SQLMoreResults)')"

What should happen is the files that start with "CFM__" should go to the warehouse WH_Conformed_Silver and files that start with "STG__" go to WH_Stage_Bronze warehouse. The "CFM__common" files should go to the WH_Conformed_Silver.common Warehouse/schema and the "CFM__sales" should go to the WH_Conformed_Silver.sales Warehouse/schema. The only schemas that exist on WH_Conformed_Silver at this point is dbo. image However, if I manually create the schemas for WH_Conformed_Silver:

USE WH_Conformed_Silver;
  CREATE SCHEMA [common];
  CREATE SCHEMA [sales];

And then do a 'dbt run' then everything runs successfully as the schemas now exist: image Here is the dbt versions I'm using:

(.venv) PS C:\repos\dbt_fabric_testing> dbt --version
Core:
  - installed: 1.8.0-b2
  - latest: 1.7.13 - Ahead of latest version!
Plugins:
  - fabric: 1.8.3 - Up to date!

I am able to successfully perform schema deployment/creation to different databases in dbt-snowflake, I figured we should be able to the same for Fabric as Fabric allows for cross-database querying. For a number of reasons, it would be beneficial for dbt-fabric to create these schemas for us. Could someone please help?

prdpsvs commented 6 months ago

@dataders , Do you think I should extract database and schema information for each model using generate database_name and generate_schema_name to create schema if it does not exist and use it during table/view materialization?

I looked at snowflake adapter but could not find this. Is there a different way to achieve this using pre-built dbt implementation?

dataders commented 6 months ago

here's my understanding so far.

  1. dbt has a convention that if you run (create) a model in a schema that does yet exist. dbt will create that schema for you
  2. This functionality works even if the model is also in another database than the one specified in profiles.yml
  3. @Analyticminder, you're noticing that (1) works, but (2) isn't.

@prdpsvs I think the custom generate_database_name and generate_schema_name macro custom overrides are red herrings.

theoretical reproduction

Here's a more simple reproduction:

  1. create two "warehouses" within the same Fabric workspace: silver and gold
  2. in your profiles.yml, specify database: silver
  3. create a test model with a config that overrides database and schema

    '-- my_model.sql
    {{
      config(
        database = 'gold',
        schema = 'new_schema'
        )
    }}
    
    SELECT 1 as my_column'
  4. dbt run -s my_model will understand that the new_schema schema needs to be made, but will create it in the silver database.

next steps

@prdpsvs I think we're seeing something similar to #161. @Analyticminder can you share the relevant SQL statements from your logs/dbt.log file?

prdpsvs commented 6 months ago

However, if I manually create the schemas for WH_Conformed_Silver:

USE WH_Conformed_Silver;
  CREATE SCHEMA [common];
  CREATE SCHEMA [sales];

And then do a 'dbt run' then everything runs successfully as the schemas now exist: image Here is the dbt versions I'm using:

@dataders , It looks like 1st is not working as per above by @Analyticminder. @Analyticminder , please confirm.

Analyticminder commented 6 months ago

@dataders and @prdpsvs Thanks for responding! I appreciate both of your time!

I'm saying the functionality should work for dbt to create a schema for a model if it does not exist. However, it is not working in my example with overrides to schema and database. It appears that the use of generate_database_name/generate_schema_name may be altering the code that gets run against Fabric... See below.

Below is the full log from the original post. original_problem_dbt.log It seems to find schemas:

19:39:29.226772 [debug] [ThreadPool]: Using fabric connection "list_WH_Conformed_Silver"
19:39:29.227772 [debug] [ThreadPool]: On list_WH_Conformed_Silver: /* {"app": "dbt", "dbt_version": "1.7.14", "profile_name": "dbt_fabric_testing", "target_name": "dev", "connection_name": "list_WH_Conformed_Silver"} */
select  name as [schema]
    from sys.schemas 

But later in the log it doesn't add the block of T-SQL code to create the schema... It does add another schema ("test") for some reason for the same database (WH_Conformed_Silver) but not for the schemas "sales" or "common"

I also was inspired and created a very simple solution without using the generate_database_name.sql or generate_schema_name.sql and was successful and only using the config settings in each model to set the database and schema to other databases and schemas than what the profile.yml was set to. And the behavior in the logs is much more consistent with what I would expect. new_successful_run.log However, I did notice that whatever I did, it added "dbo_" to the deployed schema. So, the "sales" schema turned into "dbo_sales". I'm not concerned about this problem, just bringing it up in case it is relevant in some degree. Here is some example model code:

{{
  config(
    database = 'Gold',
    schema = 'sales',
    materialized='table'
    )
}}

SELECT * FROM {{ref('my_model_1')}}

Thanks, and please let me know if you need any other information

prdpsvs commented 6 months ago

@Analyticminder ,

Note that adapter will not change this behavior as the interpretation of schema can be different for different projects. And another reason is that dbt core/dbt adapters create schemas, not the adapters.

You need to define schema at the model level to create model if it does not exist. This is the right approach. As you eluded above, please see the default behavior - https://github.com/dbt-labs/dbt-adapters/blob/fd33aafe8276051b313a3c89557b50d224bc50ed/dbt/include/global_project/macros/get_custom_name/get_custom_schema.sql#L21

`{
  config(
    database = 'Gold',
    schema = 'sales',
    materialized='table'
    )
}}

SELECT * FROM {{ref('my_model_1')}}

I am closing this issue because there is nothing much, we can do at the adapter level to change this behavior at this time. I am following with my dbt team to figure out alternatives. If anything can be done, I will re-open this issue.