mooc4spark / Spark-BerkeleyX

CS105x - Lab1a - 6d - Unclear on GroupedData object #7

Open anishanagarajan opened 7 years ago

anishanagarajan commented 7 years ago

When the groupBy() transformation is performed on a DataFrame, it says a special GroupedData object is returned. Is the dataDF object actually altered (changed to a GroupedData object) or would you have to assign the result of dataDF.groupBy() to another variable?

Additionally, all of the code examples below show the groupBy() function immediately followed by an aggregation function. Do these aggregations work similarly to actions, in that a transformation (like groupBy()) does not occur until an aggregation is performed?

jduan1 commented 7 years ago

@anishanagarajan Grouped data is not physically grouped, only logically linked to the original DataFrame. You have to apply an aggregation function to the grouped data before you can use the result.