Closed sadikovi closed 7 years ago
Merging #61 into master will decrease coverage by 2.41%. The diff coverage is 99.22%.
@@            Coverage Diff             @@
##           master      #61      +/-   ##
==========================================
- Coverage   95.94%   93.54%   -2.41%
==========================================
  Files          21       12       -9
  Lines         913      418     -495
  Branches      140       32     -108
==========================================
- Hits          876      391     -485
+ Misses         37       27      -10
Last update ce2764e...6fa47b8.
Closes #56, #60, #47.
This PR adds support for Spark 2.x (specifically any 2.0.x and 2.1.x). It is implemented as a subclass of `FileFormat`, without write support. The build file is updated to test all target Spark 2.x versions. This work will close some of the issues, e.g. using `InternalRow` instead of `Row` and refactoring RDD methods.

Since the datasource API has changed significantly, some relevant files have been removed, such as `NetFlowRDD` or `NetFlowFileStatus`. We also do not have control over partitioning, so this feature is removed too. Statistics on columns are also removed (except header information about the time range).

Options that are left:
- `version`
- `buffer`
- `stringify`
- `predicate-pushdown`
- `spark.files.ignoreCorruptFiles` as a conf option for `SparkSession`
See updated README for more information.
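
As a rough illustration of how the remaining options might be passed from PySpark, here is a minimal sketch. The data source name in `NETFLOW_FORMAT`, the helper names `netflow_options` and `read_netflow`, and every option value are assumptions made for this example and are not taken from this PR; the README is the authority on the real names, defaults, and accepted values. The sketch only relies on the standard `spark.read.format(...).option(...).load(...)` chain and on `spark.conf.set(...)`.

```python
# Assumed data source name (not stated in this PR); check the README.
NETFLOW_FORMAT = "com.github.sadikovi.spark.netflow"


def netflow_options(version="5", buffer="1Mb", stringify=True,
                    predicate_pushdown=True):
    """Build the reader options listed above; all values are placeholders."""
    return {
        "version": str(version),
        "buffer": str(buffer),
        # Spark reader options are passed as strings, so booleans are lowered.
        "stringify": str(stringify).lower(),
        "predicate-pushdown": str(predicate_pushdown).lower(),
    }


def read_netflow(spark, path, ignore_corrupt_files=False, **overrides):
    """Load NetFlow files through an existing SparkSession (hypothetical helper)."""
    # This one is a session-level conf option, not a reader option.
    spark.conf.set("spark.files.ignoreCorruptFiles",
                   str(ignore_corrupt_files).lower())
    reader = spark.read.format(NETFLOW_FORMAT)
    for key, value in netflow_options(**overrides).items():
        reader = reader.option(key, value)
    return reader.load(path)
```

With a live session this would be called as `read_netflow(spark, "/path/to/files", version=5)`, where the path is a placeholder. Keeping the options in a plain dictionary keeps the session conf setting visibly separate from the per-read options.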