spark-root / laurelin

Allows reading ROOT TTrees into Apache Spark as DataFrames
BSD 3-Clause "New" or "Revised" License

java.lang.IllegalArgumentException in slice function of RawArray.java #92

Open tomcornelis opened 4 years ago

tomcornelis commented 4 years ago

I am trying to use laurelin, but even this minimal example results in a crash I can't explain.

This is the example:

#!/usr/bin/env python3
from pyspark.sql import SparkSession

# Local jars providing the Laurelin data source and its log4j dependencies
local_jars = ','.join([
    './laurelin-1.1.1.jar',
    './log4j-api-2.13.0.jar',
    './log4j-core-2.13.0.jar',
])

# Make the jars available to both the driver and the executors
spark = SparkSession\
    .builder\
    .appName("TnP")\
    .config("spark.jars", local_jars)\
    .config("spark.driver.extraClassPath", local_jars)\
    .config("spark.executor.extraClassPath", local_jars)\
    .getOrCreate()

sc = spark.sparkContext
print(sc.getConf().toDebugString())
# Read the TTree 'tpTree/fitter_tree' from the test file as a DataFrame
rootfile = spark.read.format("root").option('tree', 'tpTree/fitter_tree').load('hdfs://analytix/user/tomc/tnpTuples_muons/MC_Moriond17_DY_tranch4Premix_part10_0.root')
rootfile.printSchema()
rootfile.show(1) # crash

spark.stop()

Running it results in the following crash:

java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at edu.vanderbilt.accre.laurelin.array.ArrayBuilder.getArray(ArrayBuilder.java:202)
    at edu.vanderbilt.accre.laurelin.spark_ttree.TTreeColumnVector.getFloats(TTreeColumnVector.java:188)
    at edu.vanderbilt.accre.laurelin.spark_ttree.TTreeColumnVector.getFloat(TTreeColumnVector.java:106)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$24.apply(RDD.scala:858)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:346)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:310)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:123)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.util.concurrent.ExecutionException: java.lang.IllegalArgumentException
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at edu.vanderbilt.accre.laurelin.array.ArrayBuilder.getArray(ArrayBuilder.java:198)
    ... 23 more
Caused by: java.lang.IllegalArgumentException
    at java.nio.Buffer.limit(Buffer.java:275)
    at edu.vanderbilt.accre.laurelin.array.RawArray.slice(RawArray.java:29)
    at edu.vanderbilt.accre.laurelin.interpretation.AsDtype.fromroot(AsDtype.java:169)
    at edu.vanderbilt.accre.laurelin.array.ArrayBuilder.processBasket(ArrayBuilder.java:90)
    at edu.vanderbilt.accre.laurelin.array.ArrayBuilder.lambda$new$0(ArrayBuilder.java:185)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more

The test file which is causing this issue can be found at http://tomc.web.cern.ch/tomc/MC_Moriond17_DY_tranch4Premix_part10_0.root

PerilousApricot commented 4 years ago

Hi Tom,

The offending issue is here:

    at java.nio.Buffer.limit(Buffer.java:275)
    at edu.vanderbilt.accre.laurelin.array.RawArray.slice(RawArray.java:29)

where I try to take a slice of a Java ByteBuffer. The limit() function only accepts values between 0 and the buffer's capacity (an int, so at most about 2 GB), so this symptom is usually due to me decoding the ROOT metadata improperly and ending up requesting a read of hundreds of exabytes. I'll take a look at your file and see where the explosion happened.
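
For illustration, a minimal sketch of that failure mode (hypothetical demo code, not Laurelin's actual RawArray implementation): limit() throws IllegalArgumentException the moment the requested limit is negative or exceeds the buffer's capacity, which is exactly what a garbage length decoded from the metadata produces.

import java.nio.ByteBuffer;

public class LimitDemo {
    public static void main(String[] args) {
        ByteBuffer buf = ByteBuffer.allocate(1024);

        // Fine: the new limit is within [0, capacity].
        buf.limit(512);

        // Throws java.lang.IllegalArgumentException: a bogus basket
        // length decoded from corrupted ROOT metadata lands here.
        buf.limit(Integer.MAX_VALUE);
    }
}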

PerilousApricot commented 4 years ago

@tomcornelis FWIW, is the UserData subdir in your file something useful to deserialize, or is it just metadata?

tomcornelis commented 4 years ago

@PerilousApricot The UserData is not needed for me; it's just metadata that comes from the tree I skimmed this subset from.

PerilousApricot commented 4 years ago

Ah, okay. The issue is that the ROOT file was somehow not closed properly, so the accounting for how many events are stored in the last basket is stored slightly differently. Let me reverse-engineer how ROOT handles it and run some tests.
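
For context, a hedged sketch of the general idea (field naming modeled on ROOT's TBranch fBasketEntry array; this is not Laurelin's actual code). For a cleanly closed file, each basket's entry count falls out of the per-basket starting entries recorded in the branch metadata; for the last basket of an improperly closed file, the upper boundary was never written, so it has to be derived from the branch's total entry count instead.

// Hedged sketch: how many entries does basket i hold? basketEntry is
// modeled on TBranch's fBasketEntry (the first entry of each basket);
// totalEntries is the branch's entry count. Not Laurelin's actual code.
static long entriesInBasket(long[] basketEntry, long totalEntries,
                            int i, int basketCount) {
    if (i < basketCount - 1) {
        // Interior basket: bounded by the next basket's first entry.
        return basketEntry[i + 1] - basketEntry[i];
    }
    // Last basket of a file that was not closed cleanly: the next
    // boundary was never recorded, so fall back to the total count.
    return totalEntries - basketEntry[i];
}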

PerilousApricot commented 4 years ago

Just to update this: I've nearly got this fixed in a robust way. It's taken a bit longer than I had hoped because the proper fix involved a bit of a refactor to get the right information loaded at the proper time. I hope to get this cut by tomorrow or Friday.