storebrand / target-mssql

Singer.io compatible Target for SQL Server. Created with the Meltano SDK.
Apache License 2.0
3 stars 17 forks source link

Conforming Names issue #19

Open visch opened 1 year ago

visch commented 1 year ago

I have a key property with two fields of of Unique Record ID , and _smart_source_bucket in my source data. The initial table get's created with the "correct" name of unique_record_id but the merge statement gets messed up and looks like this

MERGE INTO WIDA.wida AS target
            USING WIDA.#wida AS temp
            ON temp.unique _record id = target.unique _record id and temp._smart_source_bucket = target._smart_source_bucket
            WHEN MATCHED THEN

Notice temp.unique _record id isn't correct.

I honestly would much rather not have the names be conformed at all other than swapping any -'s for _'s , because I can imagine we're going to hit N number of these kind of issues I'd rather not deal with!

hholgersen commented 1 year ago

Yeah... that was definetely not supposed to be that way, basically I misread a regex and didn't have tests to cover it. I have made a new branch, https://github.com/storebrand/target-mssql/tree/update_conform_name, that I believe fixes it. If you have time to check it out that would be great.

I see what you mean about minimal editing of column names, I guess I am colored by my own use cases where "standard" names are required. I might add a couple of config options for this though, like minimal, snake_lowercase and snake_uppercase, with snake_lowercase as the default.

visch commented 1 year ago

Yeah... that was definetely not supposed to be that way, basically I misread a regex and didn't have tests to cover it. I have made a new branch, https://github.com/storebrand/target-mssql/tree/update_conform_name, that I believe fixes it. If you have time to check it out that would be great.

I see what you mean about minimal editing of column names, I guess I am colored by my own use cases where "standard" names are required. I might add a couple of config options for this though, like minimal, snake_lowercase and snake_uppercase, with snake_lowercase as the default.

Minimal would be great, some other use cases I have that I currently use my own target-mssql target for have requirements on keeping the table and column names as close as possible to the source system so this would help me later as well if I ever migrate it!

Note that if you go with Minimal you'll need to wrap all the columns with quotes as spaces and things will break the current sql.

hholgersen commented 1 year ago

After a few attempts at creating table and column names, I have had very little success at generating errors. The only character I don't think I have been able to generate an error on characters other than square brackets[]. I am amazed this works, even in SQL Server 2017.

CREATE TABLE [dbo].[Test - Table !%!] (
    [%] INT,
    [¤] INT,
    [""""] INT
);

For better or worse, this actually makes allowlisting a pain.

If you have some code for minimal naming conversion I could borrow @visch, that would be great.

visch commented 1 year ago

@hholgersen I was thinking really simple like

    def conform_name(self, name: str, object_type: Optional[str] = None) -> str:
        """Conform a stream property name to one suitable for the target system.
        Transforms names to snake case by default, applicable to most common DBMSs'.
        Developers may override this method to apply custom transformations
        to database/schema/table/column names.
        Args:
            name: Property name.
            object_type: One of ``database``, ``schema``, ``table`` or ``column``.
        Returns:
            The name transformed to snake case.
        """
        return name.replace("-","_")

Is that what you were asking for @hholgersen ?

radbrt commented 1 year ago

@visch I think at least two of the 9 forks have implemented this column name preservation now, so I made an attempt at a general one. It passes the unit tests, do you care to give it a go? https://github.com/radbrt/target-mssql/tree/orig-colnames

visch commented 1 year ago

@visch I think at least two of the 9 forks have implemented this column name preservation now, so I made an attempt at a general one. It passes the unit tests, do you care to give it a go? https://github.com/radbrt/target-mssql/tree/orig-colnames

I think you still have to replace dashes' with _'s, and with that idea maybe a "name_conforming_strategy" makes more sense as a config? I'm sure that repo would work, I moved on and used something else for this project but I'll be back to this at some point with another thing