sadikovi / spark-netflow

NetFlow data source for Spark SQL and DataFrames
Apache License 2.0

[WIP] Whole stage codegen (attempt 2) #35

Closed sadikovi closed 8 years ago

sadikovi commented 8 years ago

This PR adds support for code generation, similar to Spark's whole-stage codegen. This time we generate code for the library, covering both the full scan and the predicate scan.

It is still a work in progress.
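
To illustrate the general idea (not the code in this PR), here is a minimal sketch of how a generated scan differs from a generic one. It is written in Python rather than the JVM languages the project uses, and every name in it (`interpreted_scan`, `generate_scan`, the `(column, operator, constant)` predicate form) is hypothetical. The generic scan looks up each operator for every record, while the generated scan inlines the projection and predicates into source text that is compiled once.

```python
from typing import Callable, Iterable, Iterator, List, Tuple

Record = Tuple[int, ...]
# Hypothetical predicate form: (column index, operator, integer constant).
Predicate = Tuple[int, str, int]

_OPS = {
    "==": lambda a, b: a == b,
    ">": lambda a, b: a > b,
    "<": lambda a, b: a < b,
}


def interpreted_scan(records: Iterable[Record], project: List[int],
                     predicates: List[Predicate]) -> Iterator[Record]:
    """Generic scan: resolves every operator and column per record."""
    for rec in records:
        if all(_OPS[op](rec[col], value) for col, op, value in predicates):
            yield tuple(rec[i] for i in project)


def generate_scan(project: List[int],
                  predicates: List[Predicate]) -> Callable[[Iterable[Record]], Iterator[Record]]:
    """Build source for a scan specialised to one projection and predicate
    set, then compile it. An empty predicate list yields a full scan."""
    for _, op, _ in predicates:
        if op not in _OPS:
            raise ValueError(f"unsupported operator: {op}")
    # ':d' rejects non-integer columns and constants before they reach exec.
    cond = " and ".join(f"rec[{col:d}] {op} {value:d}"
                        for col, op, value in predicates) or "True"
    fields = "".join(f"rec[{i:d}], " for i in project)
    source = (
        "def scan(records):\n"
        "    for rec in records:\n"
        f"        if {cond}:\n"
        f"            yield ({fields})\n"
    )
    namespace: dict = {}
    exec(source, namespace)  # compile the generated function once
    return namespace["scan"]


records = [(1, 10, 100), (2, 20, 200), (3, 30, 300)]
predicate_scan = generate_scan([0, 2], [(1, ">", 10), (2, "<", 300)])
full_scan = generate_scan([0], [])
print(list(predicate_scan(records)))
print(list(full_scan(records)))
```

Both scans return the same rows; the generated one trades a one-time compile step and harder-to-maintain code for a tighter per-record loop.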

codecov-io commented 8 years ago

Current coverage is 92.39%

Merging #35 into master will decrease coverage by 3.65%

  1. 2 files (not in diff) in .../spark/netflow/index were modified.
  2. 1 file (not in diff) in ...dikovi/spark/netflow was modified.
    • Misses +2
    • Hits -2
  3. 1 file (not in diff) in ...ithub/sadikovi/spark was modified.
  4. File ...ark/util/Utils.scala was modified.
@@             master        #35   diff @@
==========================================
  Files            20         21     +1   
  Lines           858        893    +35   
  Methods         778        818    +40   
  Messages          0          0          
  Branches         80         75     -5   
==========================================
+ Hits            824        825     +1   
- Misses           34         68    +34   
  Partials          0          0          

Powered by Codecov. Last updated by b1b1d45...2e35d82

sadikovi commented 8 years ago

Here is a benchmark of the hand-written code compared to the current implementation:

Intel(R) Core(TM) i5-4258U CPU @ 2.40GHz
NetFlow codegen report:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)   Relative
-------------------------------------------------------------------------------------------
Project=7, predicate=2, codegen=F         604 /  702       1655.2       60415.5       1.0X
Project=7, predicate=2, codegen=T         515 /  539       1942.5       51480.2       1.2X

At most this gives a 20% improvement. On the other hand, it adds maintenance complexity, and we still have to build the project for Spark 1.5.x-1.6.x. I guess I am closing this for now.
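
As a quick check of the figures in the table, the snippet below recomputes the speedup from the reported times. It assumes the Relative column is the ratio of the baseline best time to the codegen best time; the helper name `relative_speedup` is made up for this example.

```python
def relative_speedup(baseline_ms: float, candidate_ms: float) -> float:
    """Ratio of baseline time to candidate time; above 1.0 means faster."""
    return baseline_ms / candidate_ms


# Best and average times (ms) taken from the benchmark table.
best = relative_speedup(604, 515)
avg = relative_speedup(702, 539)

print(f"best: {best:.1f}X")  # prints "best: 1.2X"
print(f"avg:  {avg:.1f}X")   # prints "avg:  1.3X"
```

On best times the unrounded ratio is about 1.17, which the table rounds to 1.2X.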