microsoft / dbt-fabric

MIT License
79 stars 28 forks source link

Seed batch size calculation is faulty and setting the max_batch_size for seeds does not work #206

Closed tkiehn closed 2 months ago

tkiehn commented 4 months ago

Hello,

In my dbt-project I want to load a seed-file which is 19 columns wide and contains 200 rows of data.

dbt then returns a runtime error: ('07002', '[07002] [Microsoft][ODBC Driver 18 for SQL Server]COUNT field incorrect or syntax error (0) (SQLExecDirectW)'). This seems to be related to the product of rows and columns exceeding the 2100 parameter limit.

The error seems to be that in calc_batch_size the | int filter is only applied to the -1 and not the calculation result. After adding another set of brackets around the whole calculation the seed command works. image

Another thing I found is, that the var max_batch_size doesn't seem to be used. I have tried setting the variable when debugging the error and it did not change anything. In the fabric__get_batch_size macro 400 is returned by default and the variable is not used.

Let me know if i missed something there.

Kind Regards Theo

prdpsvs commented 2 months ago

@tkiehn , Thanks for catching the issue. I see your pull request to resolve this issue. Microsoft org policies does not let me run integration tests on forked repos outside microsoft. I am going to pull your change into v.1.8.8 branch and run tests. Hope that's okay. I am going to add you as a contributor to this release in the release log though.

max_batch_size is called in calc_batch_size macro - https://github.com/microsoft/dbt-fabric/blob/d4f99e79c3083d8584c228ce3136285c18da77de/dbt/include/fabric/macros/materializations/seeds/helpers.sql#L15

And get_batch_size macro should invoke get_batch_size in dbt-adapters and should dispatch fabric__get_batch_size - https://github.com/dbt-labs/dbt-adapters/blob/f1987d4313cc94bac9906963dff1337ee0bffbc6/dbt/include/global_project/macros/materializations/seeds/helpers.sql#L67

Overriding fabric__get_batch_size macro should update max_batch_size variable. Could you try and let me know?

tkiehn commented 2 months ago

Hi @prdpsvs and thanks for getting back at me.

By overriding fabric__get_batch_size like so:

{% macro fabric__get_batch_size() %}
    {{ return(var('max_batch_size', 400)) }}
{% endmacro %}

the variable is now considered.

The current just returns 400: https://github.com/microsoft/dbt-fabric/blob/d4f99e79c3083d8584c228ce3136285c18da77de/dbt/include/fabric/macros/materializations/seeds/helpers.sql#L6

prdpsvs commented 2 months ago

Addressed this issue in #201