When a glob path matches corrupt files, schema inference fails if the first selected file is not a NetFlow file.
Currently ignoreCorruptFiles is not applied when the version is inferred from files, so the load fails with the exception below if the selected file is not a NetFlow file.
17/02/21 16:20:51 INFO DAGScheduler: Job 9 finished: load at <console>:23, took 0.285259 s
java.io.IOException: Corrupt NetFlow file. Wrong magic number
at com.github.sadikovi.netflowlib.NetFlowReader.<init>(NetFlowReader.java:137)
at com.github.sadikovi.netflowlib.NetFlowReader.prepareReader(NetFlowReader.java:80)
Note that loading works correctly when all files are valid, when the file used to infer the version is a NetFlow file, or when the version is provided explicitly.
I think we should just throw a proper exception saying that the version cannot be inferred and should be specified manually, or that one should check whether the files are correct.
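
A rough sketch of what the proposed behaviour could look like, not the library's actual code. The names VersionInference, HeaderReader and inferVersion are hypothetical and only stand in for whatever reads the version from a file header. With ignoreCorruptFiles set, unreadable files are skipped. Otherwise, or when no file is readable, the low-level IOException is wrapped in a message that tells the user what to do.

```java
import java.io.IOException;
import java.util.List;

public class VersionInference {
    /** Hypothetical stand-in for the code that reads the version from a file header. */
    public interface HeaderReader {
        short readVersion(String path) throws IOException;
    }

    /**
     * Returns the version of the first file that can be read.
     * Assumption: a corrupt or non-NetFlow file surfaces as an IOException.
     */
    public static short inferVersion(
            List<String> paths, HeaderReader reader, boolean ignoreCorruptFiles) {
        IOException lastError = null;
        for (String path : paths) {
            try {
                return reader.readVersion(path);
            } catch (IOException err) {
                if (!ignoreCorruptFiles) {
                    // Fail fast, but with an actionable message instead of the raw error
                    throw new IllegalArgumentException(
                        "Cannot infer NetFlow version from " + path
                        + "; specify the version manually or check that the files are correct",
                        err);
                }
                // Remember the failure and try the next file
                lastError = err;
            }
        }
        // No file was readable (or the list was empty)
        throw new IllegalArgumentException(
            "Cannot infer NetFlow version from any of " + paths.size()
            + " file(s); specify the version manually or check that the files are correct",
            lastError);
    }
}
```

Keeping the original IOException as the cause means the "Wrong magic number" detail is still visible in the stack trace.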