meltano / sdk

Write 70% less code by using the SDK to build custom extractors and loaders that adhere to the Singer standard: https://sdk.meltano.com
https://sdk.meltano.com
Apache License 2.0
99 stars 70 forks source link

bug: Default target schema vs schema #2766

Open mjsqu opened 4 days ago

mjsqu commented 4 days ago

Singer SDK Version

0.42.1

Is this a regression?

Python Version

3.11

Bug scope

Targets (data type handling, batching, SQL object generation, etc.)

Operating System

No response

Description

While using the MeltanoLabs variant of target-snowflake, I had records coming from SQL Server with a schema as part of the stream ID:

{
  "type": "RECORD",
  "stream": "TAP_SCHEMA-users",
  "time_extracted": "2017-11-20T16:45:33.000Z",
  "record": {
    "id": 0,
    "name": "Chris"
  }
}

i.e. the schema for this record is TAP_SCHEMA

I setup the target snowflake configuration with a different schema:

{
    "schema": "TARGET_SCHEMA"
}

But crucially, I did not set default_target_schema. The following code executes during sink processing and if default_target_schema is not found, then the schema name is derived from the incoming stream IDs.

https://github.com/meltano/sdk/blob/344a2184b1bdcb28eb3b9481363e96940e1d61e6/singer_sdk/sinks/sql.py#L88-L105

In my scenario the account connecting to Snowflake did not have access to the schema named TAP_SCHEMA (from the original tap metadata messages), so when it went to create a File Format in that location it was not permitted to. I can't recall if it would then try to create a table in that same schema.

Possible fix:

The ambiguity here is a bit difficult, if both schema and default_target_schema are set, then which should be prioritised (probably the latter as it's the existing functionality, leading to less breaking change).

Code

Link to Slack/Linen

https://meltano.slack.com/archives/C069RH0F95F/p1731616264361179

mjsqu commented 4 days ago

https://github.com/MeltanoLabs/target-snowflake/discussions/92#discussioncomment-11259580