Closed SathishKumarAna closed 3 years ago
You may want to read this discussion https://issues.apache.org/jira/browse/SPARK-18105. It seems to me that there is an issue in the way Spark shuffles and merges data.
Would you still need help for this?
Closing this. Please reopen it if you need help.
I am trying to join 4 tables in PySpark, but I am getting the error java.io.IOException: Stream is corrupted. I made some configuration changes, but the same error is still reported.
The LZ4 compression library I am currently using is lz4-java-1.5.0. What might cause the stream to become corrupted?
```
Py4JJavaError: An error occurred while calling o453.count.
: org.apache.spark.SparkException: Job aborted due to stage failure: Aborting TaskSet 27.0 because task 9 (partition 9)
  cannot run anywhere due to node and executor blacklist.
  Most recent failure: Lost task 9.1 in stage 27.0 (TID 267, si-159l.de.se.com, executor 17): java.io.IOException: Stream is corrupted
    at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202)
    at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228)
    at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
    at org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
```
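For reference, the mitigations people have reported on SPARK-18105 amount to changing shuffle I/O settings rather than fixing the join itself. A minimal sketch of configs worth trying, assuming Spark 2.x (the property names are real Spark settings, but whether they help in your case is an assumption; note the trace goes through `ReadAheadInputStream`, which the last setting disables):

```
# Switch shuffle compression away from LZ4 to rule the codec in or out
spark.io.compression.codec=snappy

# Workaround mentioned on SPARK-18105 for corruption during shuffle writes
spark.file.transferTo=false

# Larger shuffle write buffers (values here are illustrative, not tuned)
spark.shuffle.file.buffer=1m

# Disable the read-ahead path that appears in the stack trace
spark.unsafe.sorter.spill.read.ahead.enabled=false
```

These can go in `spark-defaults.conf` or be passed via `--conf` on `spark-submit`; if the error persists with snappy, the corruption is likely happening before decompression (e.g. on disk or over the network) rather than in lz4-java itself.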
Blacklisting behavior can be configured via spark.blacklist.*.
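As a sketch of those settings (the property names are the documented Spark 2.x blacklist configs; the values shown are just the defaults, relaxing them only hides the underlying corruption so the task retries elsewhere):

```
# Enable task/executor blacklisting (renamed spark.excludeOnFailure.* in Spark 3.1+)
spark.blacklist.enabled=true

# How many failed attempts of one task before an executor/node is blacklisted for it
spark.blacklist.task.maxTaskAttemptsPerExecutor=1
spark.blacklist.task.maxTaskAttemptsPerNode=2

# How many distinct failed tasks before an executor is blacklisted for the stage
spark.blacklist.stage.maxFailedTasksPerExecutor=2
```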