microsoft / sql-spark-connector

Apache Spark Connector for SQL Server and Azure SQL
Apache License 2.0

Saving a column type of tinyint results in an error #228

Open Malcatrazz opened 1 year ago

Malcatrazz commented 1 year ago

Hi there,

We are writing a DataFrame to SQL Server using:

# Write the DataFrame to SQL Server through the Spark connector.
(source.write
    .format('com.microsoft.sqlserver.jdbc.spark')
    .mode('overwrite')
    .option('url', f'jdbc:sqlserver://{server};databaseName={database};encrypt=true;trustServerCertificate=true')
    .option('dbtable', tablename)
    .option('sslProtocol', 'TLSv1.2')
    .option('batchsize', 10000)   # rows per bulk-insert batch
    .option('tableLock', 'true')  # take a table lock for faster bulk loads
    .option('user', user)
    .option('password', pw)
    .save()
)

We get the error "Cannot find data type BYTE." This is because the DataFrame has a tinyint (ByteType) column, which is apparently mapped to the type BYTE in the generated table definition. SQL Server has a tinyint column type but no byte type, so the write fails.
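For context, here is a minimal sketch of a DataFrame that hits this; the session setup and column name are illustrative assumptions, not from the original report:

from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, ByteType

spark = SparkSession.builder.getOrCreate()

# A single tinyint (ByteType) column; writing this through the connector
# produces a table definition with the unsupported type BYTE.
schema = StructType([StructField('small_value', ByteType(), True)])
df = spark.createDataFrame([(1,), (2,)], schema)
df.printSchema()  # small_value: byte (nullable = true)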

It's easy to circumvent this by casting the affected columns to another type, e.g. int, before saving:

# Cast every tinyint column to int so it maps to a type SQL Server supports.
for col_name, col_type in source.dtypes:
    if col_type == 'tinyint':
        source = source.withColumn(col_name, source[col_name].cast('int'))
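Another sketch worth trying, if you want to keep tinyint on the SQL Server side, is Spark's standard JDBC createTableColumnTypes option, which overrides the column type in the generated table. Whether this connector honors that option is an assumption here, and 'my_tinyint_col' is a placeholder column name:

(source.write
    .format('com.microsoft.sqlserver.jdbc.spark')
    .mode('overwrite')
    .option('url', f'jdbc:sqlserver://{server};databaseName={database};encrypt=true;trustServerCertificate=true')
    .option('dbtable', tablename)
    # Hypothetical: force TINYINT for the byte column instead of the default BYTE mapping.
    .option('createTableColumnTypes', 'my_tinyint_col TINYINT')
    .option('user', user)
    .option('password', pw)
    .save()
)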

Cheers!