hortonworks-spark / spark-llap

Apache License 2.0

Add count support for branch-2.3-3.0 #242

Closed: EricWohlstadter closed this 6 years ago

EricWohlstadter commented 6 years ago

What changes were proposed in this pull request?

Migrate the logic from the master branch that special-cases the .count() action.

Instead of using parallelize, when the count value returned by HS2 is X:

  1. Launch COUNT_TASKS tasks (the number is configurable).
  2. (COUNT_TASKS - 1) tasks each generate X / (COUNT_TASKS - 1) rows (integer division).
  3. The remaining task generates X % (COUNT_TASKS - 1) rows.
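
The arithmetic behind the three steps can be sketched as follows. This is only an illustration of the row split, not the code in this patch: the function name `split_count` and its parameters are hypothetical, and rejecting fewer than two tasks is an assumption, since the description does not say how that case is handled.

```python
def split_count(total_rows: int, count_tasks: int) -> list[int]:
    """Return how many rows each task generates so the sizes sum to total_rows.

    Hypothetical helper: mirrors the split described above, where
    (count_tasks - 1) tasks take an equal share and one task takes the remainder.
    """
    if total_rows < 0:
        raise ValueError("total_rows must be non-negative")
    if count_tasks < 2:
        # Assumption: with a single task the divisor would be zero.
        raise ValueError("count_tasks must be at least 2")

    workers = count_tasks - 1
    per_task = total_rows // workers   # step 2: integer division
    remainder = total_rows % workers   # step 3: leftover rows
    return [per_task] * workers + [remainder]


sizes = split_count(10, 4)
print(sizes)            # [3, 3, 3, 1]
assert sum(sizes) == 10
```

Because `(COUNT_TASKS - 1) * (X // (COUNT_TASKS - 1)) + X % (COUNT_TASKS - 1)` equals X, the generated rows always add up to the count reported by HS2, and the last task may generate zero rows when X divides evenly.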

How was this patch tested?

Unit tests (UT).

HyukjinKwon commented 6 years ago

@EricWohlstadter, please feel free to merge it as you want. I locally verified this anyway.