spark-redshift-community / spark-redshift

Performant Redshift data source for Apache Spark
Apache License 2.0

Support reading from SUPER columns #105

Closed. matthewrj closed this 2 years ago

matthewrj commented 2 years ago

Why

Reading from tables with columns of SUPER type fails with the error "Unsupported type -1". The SQL type reported for SUPER columns is LONGVARCHAR (code -1 in java.sql.Types), and there is no mapping from LONGVARCHAR to a Catalyst type.

Using JSON_SERIALIZE in the query to convert the SUPER column to VARCHAR does not always work: the size limit for SUPER is much larger than the size limit for VARCHAR, so JSON_SERIALIZE returns an error for large SUPER values.

How

Add a Catalyst type mapping for LONGVARCHAR.
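As a rough illustration of the idea (not the actual diff in this PR), the sketch below shows a JDBC-type-to-Catalyst lookup with a LONGVARCHAR case added. It is written in plain Java so it runs without Spark on the classpath. The class name `SuperTypeMappingSketch` and the helper `catalystTypeName` are hypothetical, and the choice of `StringType` as the target is an assumption: it matches what Spark's built-in JDBC source does for LONGVARCHAR, but the PR text does not name the target type.

```java
import java.sql.Types;

public class SuperTypeMappingSketch {

    /**
     * Hypothetical helper: returns the name of the Catalyst type that a
     * JDBC type code would map to. Only a few codes are covered here.
     */
    public static String catalystTypeName(int sqlType) {
        switch (sqlType) {
            case Types.CHAR:
            case Types.VARCHAR:
            case Types.LONGVARCHAR: // SUPER columns are reported with this code (-1)
                return "StringType"; // assumption: SUPER is read as its string form
            case Types.INTEGER:
                return "IntegerType";
            case Types.BIGINT:
                return "LongType";
            case Types.BOOLEAN:
                return "BooleanType";
            default:
                // Without the LONGVARCHAR case above, SUPER columns would land here
                throw new IllegalArgumentException("Unsupported type " + sqlType);
        }
    }

    public static void main(String[] args) {
        System.out.println(Types.LONGVARCHAR);                   // prints -1
        System.out.println(catalystTypeName(Types.LONGVARCHAR)); // prints StringType
    }
}
```

With a mapping like this, the SUPER value would reach Spark as a string, so any JSON parsing would still have to happen on the Spark side.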

Testing

I've manually tested this for my use case and I can successfully read SUPER data.

jsleight commented 2 years ago

Thanks for the PR! We have another PR in progress, so we will probably wait to release a new patch version until that one is merged as well.