microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0

How to pass column mappings? #226

Open scotty-tarvan opened 1 year ago

scotty-tarvan commented 1 year ago

Hello,

I'm using a Python Synapse notebook and can't figure out how to pass column mappings to the bulk copy. We have to ensure that the DataFrame schema matches the target table exactly, or else the write fails.

Is this possible?

Sample code below.

(
    df.write
    .format("com.microsoft.sqlserver.jdbc.spark")
    .option("url", f'jdbc:sqlserver://{configs["database"]["host"]}:{configs["database"]["port"]};database={configs["database"]["name"]}')
    .option("dbtable", table)
    .option("user", configs["database"]["login"])
    .option("password", configs["database"]["password"])
    .option("reliabilityLevel", "BEST_EFFORT")
    .option("keepIdentity", False)
    .option("tableLock", tableLock)
    .option("batchSize", batchSize)
    # schemaCheckEnabled=False relaxes the connector's strict schema check
    .option("schemaCheckEnabled", False)
    .mode("append")
    .save()
)
tonio-m commented 1 year ago

For that, you would have to transform the DataFrame before writing it, using plain Spark, like this:

# Map of current DataFrame column names to the target table's column names
column_mapping = {"col1_name": "Col1Name", "col2_name": "Col2Name"}
for old_name, new_name in column_mapping.items():
    df = df.withColumnRenamed(old_name, new_name)
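Renaming covers mismatched names, but the write can also trip up if the DataFrame has extra columns or a different column order than the target table. A minimal sketch of aligning those too, assuming you know the target table's column list (the target_columns list below is hypothetical):

# Hypothetical list of the target table's columns, in table order
target_columns = ["Col1Name", "Col2Name"]

# Keep only the target columns, in the order the table expects;
# any extra DataFrame columns are dropped by the select
df = df.select(*target_columns)

After renaming and selecting, the write shown above should line up with the target table without needing explicit column mappings in the connector options.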