Open sjyk opened 9 years ago
This is actually hard to do, since the current code applies a distinct count first and then runs attrdedup.
Hm. Could we rewrite the initial count distinct query as a group by?
e.g. SELECT name, first(col1), first(col2), ... FROM t GROUP BY name
This requires Spark SQL to have a first() aggregate, or some other way of getting a value out of the group.
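A rough sketch of the GROUP BY idea, run against an in-memory SQLite database rather than Spark so it is self-contained. The table contents are made up for illustration, and MIN() is used here only as a stand-in for "some aggregate that pulls one value out of the group"; it is not the project's actual query, and MIN() may take col1 and col2 from different rows of the same group.

```python
import sqlite3

# Hypothetical sample data; the table and column names follow the query above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (name TEXT, col1 TEXT, col2 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [("alice", "a1", 1), ("alice", "a2", 2), ("bob", "b1", 3)],
)

# One row per distinct name, carrying one value of each other column.
# MIN() is an assumption standing in for a first()-style aggregate.
rows = conn.execute(
    "SELECT name, MIN(col1), MIN(col2) FROM t GROUP BY name ORDER BY name"
).fetchall()

# The distinct count falls out of the same query: one output row per group.
distinct_count = len(rows)
print(rows, distinct_count)
```

The point is that a single grouped query yields both the distinct names and the extra columns, so the other cols can be passed along to the task without a second lookup.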
Include other cols in the task.