spirom / LearningSpark

Scala examples for learning to use Spark
MIT License

Multiple arguments in UDAF.scala #3

Closed geenie closed 8 years ago

geenie commented 8 years ago

In UDAF.scala, the aggregate function accepts only one column each for input, buffer and output. How do we proceed when there are multiple input, buffer or output columns?

spirom commented 8 years ago

I just pushed UDAF2.scala -- this only increases the number of input parameters. I'll add one for a multi-parameter buffer when I can think of a good example, but basically you need to (a) add fields to the definition of 'bufferSchema' and (b) change the definitions of initialize(), update(), merge() and evaluate() to match. It seems to me that whatever you do, evaluate() will always return just one column, but that column can be of any supported type, including struct, so that shouldn't be a big problem -- you need to change the definition of 'dataType' in addition to changing the logic.

If you can think of a good example that's fairly simple and you don't mind my publishing the solution to, just let me know right here and I'll try it.

spirom commented 8 years ago

See UDAF_Multi.scala for a more comprehensive example. I think that about covers it.
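
For readers without the repository at hand, the steps described above can be sketched roughly as follows. This is not the content of UDAF_Multi.scala; it is a minimal illustration that assumes the UserDefinedAggregateFunction API the thread is discussing, and the class name WeightedMeanStats and its field names are made up for the example. It takes two input columns, keeps two buffer fields, and returns a single struct column.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.expressions.{MutableAggregationBuffer, UserDefinedAggregateFunction}
import org.apache.spark.sql.types.{DataType, DoubleType, StructField, StructType}

// Hypothetical example (not from the repository): a weighted mean that
// also reports the two running totals it was computed from.
class WeightedMeanStats extends UserDefinedAggregateFunction {

  // Multiple inputs: one StructField per argument passed to the UDAF.
  def inputSchema: StructType = StructType(Seq(
    StructField("value", DoubleType),
    StructField("weight", DoubleType)))

  // (a) Multiple buffer fields: add one StructField per running value.
  def bufferSchema: StructType = StructType(Seq(
    StructField("weightedSum", DoubleType),
    StructField("weightSum", DoubleType)))

  // evaluate() yields a single column, but that column can be a struct.
  def dataType: DataType = StructType(Seq(
    StructField("weightedSum", DoubleType),
    StructField("weightSum", DoubleType),
    StructField("mean", DoubleType)))

  def deterministic: Boolean = true

  // (b) Every buffer field needs an initial value.
  def initialize(buffer: MutableAggregationBuffer): Unit = {
    buffer(0) = 0.0
    buffer(1) = 0.0
  }

  // Fold one input row into the buffer; rows with a null argument are skipped.
  def update(buffer: MutableAggregationBuffer, input: Row): Unit = {
    if (!input.isNullAt(0) && !input.isNullAt(1)) {
      val weight = input.getDouble(1)
      buffer(0) = buffer.getDouble(0) + input.getDouble(0) * weight
      buffer(1) = buffer.getDouble(1) + weight
    }
  }

  // Combine two partial buffers field by field.
  def merge(buffer1: MutableAggregationBuffer, buffer2: Row): Unit = {
    buffer1(0) = buffer1.getDouble(0) + buffer2.getDouble(0)
    buffer1(1) = buffer1.getDouble(1) + buffer2.getDouble(1)
  }

  // Build the struct result; the mean is null when the total weight is zero.
  def evaluate(buffer: Row): Any = {
    val weightedSum = buffer.getDouble(0)
    val weightSum = buffer.getDouble(1)
    val mean: Any = if (weightSum == 0.0) null else weightedSum / weightSum
    Row(weightedSum, weightSum, mean)
  }
}
```

Assuming a DataFrame df with a grouping column k and double columns v and w (all hypothetical names), it could be applied with something like df.groupBy("k").agg(new WeightedMeanStats()(col("v"), col("w")).as("stats")), after which the struct fields are reachable as stats.mean and so on. Note that UserDefinedAggregateFunction was deprecated in Spark 3.0 in favor of Aggregator, so this sketch fits the Spark versions current when this thread was written.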