sadikovi / spark-netflow

NetFlow data source for Spark SQL and DataFrames
Apache License 2.0

Option to ignore corrupt files #58

Closed: sadikovi closed this 7 years ago

sadikovi commented 7 years ago

This PR updates NetFlowFileRDD to respect the Spark option spark.files.ignoreCorruptFiles. When this option is true, files that are corrupt or are not NetFlow files are ignored. If a file is partially corrupt, only the recoverable data is read (up to the corrupted block). If the reader fails to initialize, an empty iterator is returned for that file.
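A minimal sketch of the behaviour described above, assuming each file is read through an iterator of records. `readFile`, `StopOnError` and the `open` callback are hypothetical names for illustration, not the NetFlowFileRDD or flow library API, and treating any non-fatal exception as corruption is an assumption. In the real RDD the `ignoreCorrupt` flag would come from the Spark configuration (spark.files.ignoreCorruptFiles, which defaults to false).

```scala
import scala.util.control.NonFatal

// Hypothetical sketch of the ignoreCorruptFiles behaviour; names such as
// readFile and StopOnError are illustrative, not spark-netflow or Spark APIs.
object IgnoreCorruptSketch {

  /** Yields records until the underlying reader throws. When ignoreCorrupt is
    * true, the failure ends iteration (keeping records already read);
    * otherwise the exception propagates unchanged. */
  final class StopOnError[T](underlying: Iterator[T], ignoreCorrupt: Boolean)
      extends Iterator[T] {
    private var buffered: Option[T] = None
    private var done = false

    // Prefetch one record so a corrupt block is detected inside hasNext.
    private def advance(): Unit =
      if (!done && buffered.isEmpty) {
        try {
          if (underlying.hasNext) buffered = Some(underlying.next())
          else done = true
        } catch {
          // Assumption: any non-fatal error while reading marks a corrupt block.
          case NonFatal(_) if ignoreCorrupt => done = true
        }
      }

    override def hasNext: Boolean = { advance(); buffered.isDefined }

    override def next(): T = {
      advance()
      buffered match {
        case Some(record) =>
          buffered = None
          record
        case None =>
          throw new NoSuchElementException("no more readable records")
      }
    }
  }

  /** Opens one file via the hypothetical `open` callback. If the reader fails
    * to initialize and ignoreCorrupt is true, the file contributes no rows. */
  def readFile[T](open: () => Iterator[T], ignoreCorrupt: Boolean): Iterator[T] = {
    val records: Iterator[T] =
      try open()
      catch { case NonFatal(_) if ignoreCorrupt => Iterator.empty }
    new StopOnError(records, ignoreCorrupt)
  }

  def main(args: Array[String]): Unit = {
    // Simulated file whose third record sits in a corrupt block.
    def partiallyCorrupt(): Iterator[Int] =
      Iterator(1, 2, 3).map(i => if (i == 3) throw new RuntimeException("corrupt block") else i)
    // Simulated non-NetFlow file: the reader fails while parsing the header.
    def notNetFlow(): Iterator[Int] = throw new IllegalArgumentException("bad header")

    println(readFile(() => partiallyCorrupt(), ignoreCorrupt = true).toList) // List(1, 2)
    println(readFile(() => notNetFlow(), ignoreCorrupt = true).toList)       // List()
  }
}
```

Prefetching one record in `hasNext` means a corrupt block is detected before `next()` is called, so callers never receive a half-read record. With the flag off, the original exception propagates, which keeps the default fail-fast behaviour. Since spark.files.ignoreCorruptFiles is a core Spark setting, it would typically be supplied when the application starts, for example with `--conf spark.files.ignoreCorruptFiles=true`.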

sadikovi commented 7 years ago

Should add tests for enabling the option. I am also thinking about pushing this logic into the flow library instead.

codecov-io commented 7 years ago

Current coverage is 95.03% (diff: 73.52%)

Merging #58 into master will decrease coverage by 0.89%

@@             master        #58   diff @@
==========================================
  Files            21         21          
  Lines           908        926    +18   
  Methods         770        781    +11   
  Messages          0          0          
  Branches        138        145     +7   
==========================================
+ Hits            871        880     +9   
- Misses           37         46     +9   
  Partials          0          0          
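The headline figures follow from hits divided by tracked lines, where tracked lines are hits plus misses plus partials (871 + 37 = 908 and 880 + 46 = 926). The drop is in percentage points:

$$
\frac{871}{908} \approx 0.95925, \qquad
\frac{880}{926} \approx 0.95032, \qquad
0.95925 - 0.95032 = 0.00893 \approx 0.89\ \text{points}
$$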

Powered by Codecov. Last update 9d405a6...fe6b75f