histogrammar / histogrammar-docs

Jekyll sources for https://histogrammar.github.io/histogrammar-docs/.
Apache License 2.0

Plotting a Histogram in spark-shell tutorial does not work #26

Closed spmp closed 3 years ago

spmp commented 7 years ago

Following the tutorial http://histogrammar.org/docs/tutorials/scala-spark-bokeh/, specifically "Plotting a Histogram in spark-shell", does not work. A required import is missing: import org.dianahep.histogrammar.tutorial.cmsdata.Muon. The call save(myfirstplot, "myfirstplot.html") results in an error:

/libs/functional/FunctionalCanBuild;                                                                                                                
  at io.continuum.bokeh.JSONSerializer.<init>(Serializer.scala:8)
  at io.continuum.bokeh.HTMLFragmentWriter.<init>(Document.scala:54)
  at io.continuum.bokeh.HTMLFileWriter.<init>(Document.scala:126)
  at io.continuum.bokeh.HTMLFileWriter$.apply(Document.scala:122)
  at io.continuum.bokeh.Document.save(Document.scala:22)
  at io.continuum.bokeh.Document.save(Document.scala:23)
  at io.continuum.bokeh.Document.save(Document.scala:26)
  at org.dianahep.histogrammar.bokeh.package$.save(bokeh.scala:120)
  ... 52 elided

Following the advice in the tutorial:

Users are strongly encouraged to learn the syntax of Bokeh package, especially about Glyph and Plot abstractions

I am wading through the Bokeh API docs 8)

jpivarski commented 7 years ago

Looking at that tutorial, we should probably

import org.dianahep.histogrammar.tutorial.cmsdata._

and drop cmsdata. qualifiers. There are unqualified uses of other classes from that package, such as Jet.

ASvyatkovskiy commented 7 years ago

@spmp Which version of Spark do you use? After inserting the missing import:

import org.dianahep.histogrammar.tutorial.cmsdata._ 

I am not able to reproduce the error with either Spark 2.1.0 (Scala 2.11) or Spark 1.6.1 (Scala 2.10).

jpivarski commented 7 years ago

That's right; he was just telling us we were missing that line.

spmp commented 7 years ago

@ASvyatkovskiy I have tried this in Spark 2.0.0 and 2.1.1 (Scala 2.11), both from a normal spark-shell and from a spark-notebook built and run in a chroot (to rule out Maven/system/Java issues) on Ubuntu 14.04 with Oracle Java 8. I get exactly the same stack trace. Happy to keep trying if you can give me some other things to test to make it work.

ASvyatkovskiy commented 7 years ago

@spmp Are you installing histogrammar from source or via --packages from Maven Central? Here are the details of my test (on a RHEL 6 system, but that should not matter in this case):

$ java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
$ spark-shell --packages "org.diana-hep:histogrammar-bokeh_2.11:1.0.3"

Then in spark-shell:

Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 2.1.0
      /_/

Using Scala version 2.11.8 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.

scala> import org.dianahep.histogrammar.tutorial.cmsdata
import org.dianahep.histogrammar.tutorial.cmsdata

scala> val events = cmsdata.EventIterator()
events: org.dianahep.histogrammar.tutorial.cmsdata.EventIterator = non-empty iterator

scala> val dataset_rdd = sc.parallelize(events.toSeq)
dataset_rdd: org.apache.spark.rdd.RDD[org.dianahep.histogrammar.tutorial.cmsdata.Event] = ParallelCollectionRDD[0] at parallelize at <console>:27

scala> import org.dianahep.histogrammar.tutorial.cmsdata._
import org.dianahep.histogrammar.tutorial.cmsdata._

scala> import org.dianahep.histogrammar._
import org.dianahep.histogrammar._

scala> import org.dianahep.histogrammar.bokeh._
import org.dianahep.histogrammar.bokeh._

scala> val muons_rdd = dataset_rdd.flatMap(_.muons).filter(_.pz > 2.0)
muons_rdd: org.apache.spark.rdd.RDD[org.dianahep.histogrammar.tutorial.cmsdata.Muon] = MapPartitionsRDD[2] at filter at <console>:38

scala> val p_histogram = Histogram(100, 0, 200, {mu: Muon => math.sqrt(mu.px*mu.px + mu.py*mu.py + mu.pz*mu.pz)})
p_histogram: org.dianahep.histogrammar.Selecting[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Binning[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting]] = <Selecting cut=Bin>

scala> val final_histogram = muons_rdd.aggregate(p_histogram)(new Increment, new Combine)
final_histogram: org.dianahep.histogrammar.Selecting[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Binning[org.dianahep.histogrammar.tutorial.cmsdata.Muon,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting,org.dianahep.histogrammar.Counting]] = <Selecting cut=Bin>

scala> val myfirstplot = final_histogram.bokeh().plot()
myfirstplot: io.continuum.bokeh.Plot = io.continuum.bokeh.Plot@34c31a34

scala> save(myfirstplot,"myfirstplot.html")
Wrote myfirstplot.html. Open file:///home/alexeys/Test/myfirstplot.html in a web browser.

Let us know if it helps.
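
One more thing that could be worth trying (this is our suggestion, not from the original reply): launching spark-shell with a fresh Ivy cache, so that previously cached artifacts cannot conflict with the jars that --packages resolves. The /tmp path below is only an example.

```shell
# Hypothetical sketch: point Spark at a clean Ivy directory so stale
# cached jars cannot shadow the freshly resolved histogrammar-bokeh
# dependencies. spark.jars.ivy is a standard Spark configuration key.
spark-shell \
  --conf spark.jars.ivy=/tmp/histogrammar-ivy \
  --packages "org.diana-hep:histogrammar-bokeh_2.11:1.0.3"
```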

spmp commented 7 years ago

Yes, I was installing via --packages. OK, I did this with Spark 2.1.0 and 2.1.1 and histogrammar versions 1.0.3 and 1.0.4 in my Ubuntu 14.04 chroot with Java 1.8.0_141, and it worked fine. It also works fine outside the chroot with Java 1.8.0_92. OK, just plain weird; it could have been a conflict with another imported jar. I have no idea why it's not working from spark-notebook. Could it be the Play framework version? Which one do you use?
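
To chase down a jar conflict like the one suspected above, one approach (a sketch on our part; the helper name jarOf is ours, not from this thread) is to ask the JVM which jar a given class was actually loaded from, and compare the answer in the failing spark-notebook session against a working spark-shell session:

```scala
// Sketch: report which jar (code source) a class was loaded from.
// Useful for spotting two environments resolving the same class name
// to different library versions.
def jarOf(className: String): Option[String] =
  try {
    val cls = Class.forName(className)
    // Bootstrap-classpath classes have a null code source, hence Option(...).
    Option(cls.getProtectionDomain.getCodeSource).map(_.getLocation.toString)
  } catch {
    case _: ClassNotFoundException => None
  }
```

For example, calling jarOf on the class named in the truncated stack trace above (which looks like a play-json functional builder) in both environments would show whether the notebook pulls that class from a different, incompatible jar.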