Open tombaeyens opened 4 months ago
SAS-3465
Potential fix:
In PySpark one can do this: `partition_columns = [col.name for col in spark.catalog.listColumns("schema_name.table_name") if col.isPartition]` and `non_partition_columns = [col.name for col in spark.catalog.listColumns("schema_name.table_name") if not col.isPartition]`
(source: https://stackoverflow.com/questions/51540906/how-to-get-the-hive-partition-column-name-using-spark )
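A minimal sketch of that split, using a stand-in for the `Column` objects that `spark.catalog.listColumns` returns (in a real session you would call `spark.catalog.listColumns("schema_name.table_name")` on an active `SparkSession` instead of building the list by hand; the column names here are made up):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.catalog.Column; with a live session this would be:
#   columns = spark.catalog.listColumns("schema_name.table_name")
Column = namedtuple("Column", ["name", "isPartition"])

columns = [
    Column("id", False),
    Column("amount", False),
    Column("event_date", True),  # hypothetical partition column
]

# Split columns on the isPartition flag, as suggested above
partition_columns = [col.name for col in columns if col.isPartition]
non_partition_columns = [col.name for col in columns if not col.isPartition]

print(partition_columns)      # ['event_date']
print(non_partition_columns)  # ['id', 'amount']
```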
Potentially it suffices to apply the fix:
and
We were testing schema validation with the Databricks connection and found a problem with partitioned tables. SODA uses the columns `# Partition Information` and `# col_name` for the validation (see the first image). We think this happens because of the output of the table's `DESCRIBE` statement (second image). Is there anything we can change on our side, like a setting? Or is it a bug on the SODA side that needs to be fixed?
The partition information section is redundant: each partition column already appears in the main column list and then appears again under `# Partition Information`.
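To illustrate the duplication, the rows of a `DESCRIBE TABLE` result can be filtered so that everything from the `# Partition Information` marker onward is dropped, leaving each column name exactly once. This is only a sketch with made-up rows, not SODA's actual parsing code:

```python
# Hypothetical DESCRIBE TABLE output rows: (col_name, data_type, comment)
describe_rows = [
    ("id", "bigint", None),
    ("amount", "double", None),
    ("event_date", "date", None),
    ("# Partition Information", "", ""),
    ("# col_name", "data_type", "comment"),
    ("event_date", "date", None),  # partition column repeated here
]

def real_columns(rows):
    """Keep only the rows before the '# Partition Information' marker,
    so each column name appears once."""
    result = []
    for name, _, _ in rows:
        if name.startswith("#"):
            break  # marker rows and the repeated partition columns follow
        result.append(name)
    return result

print(real_columns(describe_rows))  # ['id', 'amount', 'event_date']
```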