microsoft / dbt-fabric

MIT License
79 stars 28 forks source link

Adapter attempting to use NVARCHAR(4000) despite not being supported #154

Closed glsdown closed 5 months ago

glsdown commented 7 months ago

I am trying to build a model using dbt-fabric. I am allowing dbt to pick the data types (no use of cast etc), but I am running into this error:

('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server][SQL Server]The data type 'nvarchar(4000)' is not supported in this edition of SQL Server. (24574) (SQLMoreResults)")

As noted in the docs, NVARCHAR is not supported by Fabric, so shouldn't be something used within dbt. It seems to be introduced by this function:

replace(my_field, nchar(160), ' ')

If replaced with the following, the error doesn't occur:

cast(replace(my_field, nchar(160), ' ') as varchar(2048))

In case it's relevant, I initially tried to use just char within this function instead of nchar but char(160) kept returning NULL.

prdpsvs commented 7 months ago

@glsdown , Please share the model definition or a sample model to repro the error.

glsdown commented 7 months ago

@prdpsvs here is a very simple example of it failing. I have included the initial cast to mimic it loading 'valid' varchar fields.

{{
    config(
        materialized='table'
    )
}}

with source as (

    select
        1 as id
        , cast('a' as varchar(2048)) as my_field
    union all
    select
        2 as id
        , cast('b' as varchar(2048)) as my_field
    union all
    select
        3 as id
        , cast('c' as varchar(2048)) as my_field

)

select
    id
    , replace(my_field, nchar(160), ' ') as my_field # causes the error
from source
prdpsvs commented 7 months ago

@glsdown , yes, replace(my_field, nchar(160), ' ') as my_field will result in failure.

{{
    config(
        materialized='table'
    )
}}
with source as (
    select
        1 as id
        , 'cast('a' as varchar(2048))' as my_field
    union all
    select
        2 as id
        , cast('b' as varchar(2048)) as my_field
    union all
    select
        3 as id
        , cast('c' as varchar(2048)) as my_field

)
select
    id
    , my_field
from source

Above code worked for me. Did it not work for you? Fabric does not support nchar as well.

glsdown commented 7 months ago

@prdpsvs yes that code works, but is not helpful to the situation.

My source data has char(160) values in it which I need to remove - hence my use of replace. I have not included it for no reason. The example is just to show that it is trying to use incorrect data types. I cannot provide the source data for the demonstration purposes.

Using char(160) does not work - in Fabric this always returns null. The only option to replace the values is to use nchar(160) but that causes the adapter to select the wrong data type for the field.

In every other dbt adapter I have used, it gracefully handles data types and never attempts to use unsupported ones.

prdpsvs commented 6 months ago

Adapter creates a temporary view on the model and then CTAS view to a table.

{{
    config(
        materialized='table'
    )
}}
with source as (
    select
        1 as id
        , 'cast('a' as varchar(2048))' as my_field
    union all
    select
        2 as id
        , cast('b' as varchar(2048)) as my_field
    union all
    select
        3 as id
        , cast('c' as varchar(2048)) as my_field

)
select
    id
    , my_field
from source

First the adapter creates a view on the model.

EXEC('create view "dbo"."my_model_temp_view" as with source as (

    select
        1 as id
        , cast(''a'' as varchar(2048)) as my_field
    union all
    select
        2 as id
        , cast(''b'' as varchar(2048)) as my_field
    union all
    select
        3 as id
        , cast(''c'' as varchar(2048)) as my_field

)

select
    id
    , replace(my_field, nchar(160), '' '') as my_field
from source;');

Then it creates a table from a view using CTAS.

EXEC(
'CREATE TABLE [dbt-1.8test].[dbo].[my_model] AS (SELECT * FROM [dbt-1.8test].[dbo].[my_model_temp_view]);'
);

The issue in Fabric DW is that when static data is wrapped in a select and created as view for above model, the view internally does not store data on the file system in parquet format and can handle nchar and nvarchar data types. If you observe, the data type of my_field is nvarchar(4000).

image

As soon as CTAS executes, the data has to be stored on a file system in parquet format. Parquet currently does not support nvarchar and nchar data types (known limitation). For this reason, you are seeing the error.

('42000', "[42000] [Microsoft][ODBC Driver 18 for SQL Server] [SQL Server] The data type 'nvarchar(4000)' is not supported in this edition of SQL Server. (24574) (SQLMoreResults)")

Hope this helps to understand why you percieve this is a dbt issue. This is underlying platform issue which cannot support above model.

Though above model did not work for you, the real scenario to replace nonbreaking spaces with regular spaces can be achieved in Fabric using following example.

CREATE TABLE YourTable (YourColumn VARCHAR(100))

INSERT INTO YourTable (YourColumn)
VALUES (' ')
INSERT INTO YourTable (YourColumn)
VALUES (nchar(160))

Select OriginalASCIIValue, ASCII(AfterASCIIValue) AS New_AfterASCIIValue FROM 
(
    SELECT ASCII(YourColumn)  AS OriginalASCIIValue , REPLACE(YourColumn, nchar(160), ' ') AS AfterASCIIValue FROM YourTable
) AS T

The above select will produce following result. image

You can apply REPLACE(YourColumn, nchar(160), ' ') on your column data on an existing table or table with constraints but adapter internally creates a view before inserting or CTAS data to a table. image

We want the Fabric DW to handle these types of issues, rather than creating a work around. This change would impact table materialization, which is a big change. Let me give some thought and consult a few team members before we decide to make this change happen.

@dataders , Any thoughts on handling this issue?