Looking at that tutorial, we should probably
import org.dianahep.histogrammar.tutorial.cmsdata._
and drop the cmsdata. qualifiers. There are unqualified uses of other classes from that package, such as Jet.
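For concreteness, here is roughly what the tutorial snippet looks like with the wildcard import in place (a sketch only, reusing the Muon example from the transcript below; the tutorial's exact wording may differ):

import org.dianahep.histogrammar._
import org.dianahep.histogrammar.bokeh._
// The wildcard import brings Muon, Jet, Event, etc. into scope unqualified.
import org.dianahep.histogrammar.tutorial.cmsdata._

val events = EventIterator()   // no cmsdata. prefix needed after the wildcard import
val muons_rdd = sc.parallelize(events.toSeq).flatMap(_.muons)
val p_histogram = Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)})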
@spmp Which version of Spark do you use? After inserting the missing import:
import org.dianahep.histogrammar.tutorial.cmsdata._
I am not able to reproduce the error with either Spark 2.1.0 (Scala 2.11) or Spark 1.6.1 (Scala 2.10).
That's right; he was just telling us we were missing that line.
@ASvyatkovskiy I have tried this with Spark 2.0.0 and 2.1.1 (Scala 2.11), both from a normal spark-shell and from a spark-notebook built and run in a chroot (to rule out other mvn/system/java issues) on Ubuntu 14.04 with Oracle Java 8. I get exactly the same stack trace. Happy to keep trying if you can give me some other things to test to make it work.
@spmp Are you installing histogrammar from source or via --packages from Maven Central? Here are the details of my test (on a RHEL 6 system, but that should not matter in this case):
$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
$ spark-shell --packages "org.diana-hep:histogrammar-bokeh_2.11:1.0.3"
Then in spark-shell:
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 2.1.0
/_/
Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
scala> import org.dianahep.histogrammar.tutorial.cmsdata
import org.dianahep.histogrammar.tutorial.cmsdata
scala> val events = cmsdata.EventIterator()
events: org.dianahep.histogrammar.tutorial.cmsdata.EventIterator = non-empty iterator
scala> val dataset_rdd = sc.parallelize(events.toSeq)
dataset_rdd: org.apache.spark.rdd.RDD[org.dianahep.histogrammar.tutorial.cmsdata.Event] = ParallelCollectionRDD[0] at parallelize at <console>:27
scala> import org.dianahep.histogrammar.tutorial.cmsdata._
import org.dianahep.histogrammar.tutorial.cmsdata._
scala> import org.dianahep.histogrammar._
import org.dianahep.histogrammar._
scala> import org.dianahep.histogrammar.bokeh._
import org.dianahep.histogrammar.bokeh._
scala> val muons_rdd = dataset_rdd.flatMap(_.muons).filter(_.pz > 2.0)
muons_rdd: org.apache.spark.rdd.RDD[org.dianahep.histogrammar.tutorial.cmsdata.Muon] = MapPartitionsRDD[2] at filter at <console>:38
scala> val p_histogram = Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)})
p_histogram: org.dianahep.histogrammar.Selecting[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Binning[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting]] = <Selecting cut=Bin>
scala> val final_histogram = muons_rdd.aggregate(p_histogram)(new Increment, new Combine)
final_histogram: org.dianahep.histogrammar.Selecting[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Binning[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting]] = <Selecting cut=Bin>
scala> val myfirstplot = final_histogram.bokeh().plot()
myfirstplot: io.continuum.bokeh.Plot = io.continuum.bokeh.Plot@34c31a34
scala> save(myfirstplot,"myfirstplot.html")
Wrote myfirstplot.html. Open file:///home/alexeys/Test/myfirstplot.html in a web browser.
Let us know if it helps.
Yes, I was installing via --packages.
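For reference, the same artifact can also be pulled in as a plain sbt dependency rather than via --packages (a sketch only; the coordinates are copied from the --packages line above, and whether this fits the spark-notebook setup is an assumption):

// build.sbt (sketch): same Maven Central coordinates as used with --packages above
libraryDependencies += "org.diana-hep" % "histogrammar-bokeh_2.11" % "1.0.3"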
OK, I did this with Spark 2.1.0 and Spark 2.1.1 and histogrammar versions 1.0.3 and 1.0.4 in my Ubuntu 14.04 chroot with Java 1.8.0_141, and it worked fine. It also works fine outside the chroot with Java 1.8.0_92. OK, just plain weird. It could have been a conflict with another imported jar.
No idea why it's not working from spark-notebook. Could it be the Play framework version? Which one do you use?
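One generic way to narrow down a possible jar conflict (a JVM-level check, not specific to spark-notebook or Play) is to ask which jar a class was actually loaded from in the failing session, for example:

// Print the jar that provided the bokeh Plot class in this session.
import io.continuum.bokeh.Plot
val src = classOf[Plot].getProtectionDomain.getCodeSource
println(if (src == null) "loaded from the bootstrap classpath" else src.getLocation.toString)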
Following the tutorial at http://histogrammar.org/docs/tutorials/scala-spark-bokeh/, specifically the "Plotting a Histogram" section, does not work in spark-shell. A required import is missing:
import org.dianahep.histogrammar.tutorial.cmsdata.Muon
and the save(myfirstplot,"myfirstplot.html") call results in an error. Following the advice, I am wading through the Bokeh API docs 8)