zrlio / crail-spark-io

Fast I/O plugins for Spark
Apache License 2.0

Hibench-pagerank: java.io.StreamCorruptedException: invalid type code: AC #3

Open IMCG opened 7 years ago

IMCG commented 7 years ago

Hello, I have a problem that is confusing me a lot, and I believe it is a bug in Crail. It occurs when running Hibench-pagerank with 3 iterations. Stage 0 completes fine, but as soon as stage 1 starts, the following exception appears: java.io.StreamCorruptedException: invalid type code: AC. The debug output is as follows:

WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 40, 172.18.0.17): java.io.StreamCorruptedException: invalid type code: AC
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1381)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.DeserializationStream.readKey(Serializer.scala:157)
    at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:189)
    at org.apache.spark.serializer.DeserializationStream$$anon$2.getNext(Serializer.scala:186)
    at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
    at org.apache.spark.shuffle.crail.CrailInputCloser.hasNext(CrailInputCloser.scala:33)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.Iterator$class.foreach(Iterator.scala:893)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
    at org.apache.spark.shuffle.crail.CrailShuffleWriter.write(CrailShuffleWriter.scala:65)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:79)
    at org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:47)
    at org.apache.spark.scheduler.Task.run(Task.scala:85)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:274)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

How can I fix this?

patrickstuedi commented 7 years ago

Hi, Thanks for pointing this out.

There is currently an issue with the non-blocking feature in CrailMultistream that could be causing this problem. We are working on a solution for that. One thing you could try is going back to commit dbe447fb7d221e4846d6d303cbaed6d94068f0b0 for Crail and checking whether that fixes the problem. If it does not, the cause is something else.
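As background on the error message itself: `AC` is the first byte of the Java serialization stream header (`0xACED`). An `ObjectInputStream` expects that header only once, at the very start, so meeting it again where an object type code should be yields exactly this exception. It usually means the reader has run into a second, independently written serialization stream, or is positioned at the wrong offset. The sketch below reproduces the message with plain JDK classes only. It does not use Crail or Spark, the names `StreamHeaderDemo`, `serialize`, `concat` and `tryRead` are made up for illustration, and whether this is what actually happens inside CrailMultistream is an assumption, not something confirmed in this thread.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, not part of Crail or Spark.
class StreamHeaderDemo {

    // Serialize the values with a fresh ObjectOutputStream. Each new
    // ObjectOutputStream writes its own header (magic 0xACED plus version).
    static byte[] serialize(String... values) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                for (String value : values) {
                    out.writeObject(value);
                }
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Join two byte segments back to back, as if they were one stream.
    static byte[] concat(byte[] first, byte[] second) {
        byte[] joined = new byte[first.length + second.length];
        System.arraycopy(first, 0, joined, 0, first.length);
        System.arraycopy(second, 0, joined, first.length, second.length);
        return joined;
    }

    // Read `count` objects through ONE ObjectInputStream. Returns the values
    // joined by commas, or the exception text if deserialization fails.
    static String tryRead(byte[] data, int count) {
        List<String> values = new ArrayList<>();
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < count; i++) {
                values.add((String) in.readObject());
            }
            return String.join(",", values);
        } catch (IOException | ClassNotFoundException e) {
            return e.toString();
        }
    }

    public static void main(String[] args) {
        byte[] first = serialize("a", "b");
        byte[] second = serialize("c");

        // One reader over two concatenated streams: the third readObject()
        // hits the second header and fails with "invalid type code: AC".
        System.out.println(tryRead(concat(first, second), 3));

        // One reader per segment: each reader consumes exactly one header.
        System.out.println(tryRead(first, 2) + "," + tryRead(second, 1));
    }
}
```

Reading each segment with its own `ObjectInputStream` succeeds, which is why this exception tends to point at stream boundaries or ordering in the underlying I/O layer rather than at the serialized objects themselves.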

Also, please post this on the Crail forum. The forum is the best place to discuss issues and problems related to Crail.

https://groups.google.com/forum/#!forum/zrlio-users or email: zrlio-users@googlegroups.com

Thanks!