[spark-sql] Schema measurement seems to result in duplicate columns

sodadata / soda-sql

Soda SQL and Soda Spark have been deprecated and replaced by Soda Core. docs.soda.io/soda-core/overview.html

https://docs.soda.io/

Apache License 2.0


[spark-sql] Schema measurement seems to result in duplicate columns #124

Open vijaykiran opened 3 years ago

vijaykiran commented 3 years ago

The schema measurement seems to produce duplicate columns when a Hive table is partitioned by one of its own columns. For example, if a Hive table contains column_a and is also partitioned by column_a, the schema measurement treats them as two separate columns and ends up with duplicate column names.

JCZuurmond commented 3 years ago

PR sodadata/soda-core#503 solves this issue for the Spark dialect. Maybe we should copy that implementation for the Hive dialect too.
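
To illustrate the duplication described in this issue, here is a minimal sketch of one possible way to deduplicate column names. It assumes the schema is read from a DESCRIBE-style result in which partition columns are listed a second time after a `# Partition Information` header row. The function name `unique_columns` and the row layout are hypothetical, and this is not necessarily what the PR referenced above does.

```python
from typing import Iterable, List, Optional, Tuple

# Assumed row layout: (col_name, data_type, comment), as in a DESCRIBE-style result.
Row = Tuple[Optional[str], Optional[str], Optional[str]]


def unique_columns(describe_rows: Iterable[Row]) -> List[Tuple[str, str]]:
    """Return (name, type) pairs, keeping only the first occurrence of each name."""
    seen = set()
    columns = []
    for name, data_type, _comment in describe_rows:
        name = (name or "").strip()
        # Skip blank separator rows and header rows such as "# Partition Information".
        if not name or name.startswith("#"):
            continue
        # A partition column repeated after the header is dropped here.
        if name in seen:
            continue
        seen.add(name)
        columns.append((name, (data_type or "").strip()))
    return columns


# Hypothetical rows for a table that contains column_a and is partitioned by column_a.
rows = [
    ("id", "int", None),
    ("column_a", "string", None),
    ("# Partition Information", "", ""),
    ("# col_name", "data_type", "comment"),
    ("column_a", "string", None),
]
print(unique_columns(rows))  # [('id', 'int'), ('column_a', 'string')]
```

Keeping the first occurrence preserves the original column order, so the schema measurement would report column_a once, in its position among the regular columns.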