lintool / warcbase

Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/

Upgrade to Spark 1.6.1? #231

Closed: ianmilligan1 closed this issue 8 years ago

ianmilligan1 commented 8 years ago

Our instructions currently have people running everything on Spark 1.5.1. However, the new dynamic PageRank functionality requires Spark 1.6.1. We should double-check how much of our code is compatible (most of it, I assume, although some of the Spark Notebook integrations could get funky).

jrwiebe commented 8 years ago

I've been running 1.6.1 locally for a while, without apparent issue. I never really use Spark Notebook, so I can't speak to that, but we should update our docs to reflect the correct version numbers: https://github.com/lintool/warcbase-docs/blob/master/docs/Installing-and-Running-Spark-under-OS-X.md#running-with-spark-notebook

ianmilligan1 commented 8 years ago

Great to know, @jrwiebe! Aye – looks like Spark Notebook wants users to be on Spark 1.6 anyways. Will test and then update docs accordingly.

jrwiebe commented 8 years ago

I'd suggest this: http://spark-notebook.io/dl/zip/0.6.3/2.11/1.6.1/2.7.2/false/false . It's Notebook 0.6.3, Scala 2.11, Spark 1.6.1, Hadoop 2.7.2 (no hive, no parquet).

ianmilligan1 commented 8 years ago

Thoughts @lintool? If you think this sounds good, we can update the docs and main repo.

jrwiebe commented 8 years ago

While we're at it, there are a lot of dependencies in the POM file that could be updated as well.

[INFO] The following dependencies in Dependencies have newer versions:
[INFO]   com.chuusai:shapeless_2.10.4 ...................... 2.0.0 -> 2.2.0-RC1
[INFO]   com.fasterxml.jackson.core:jackson-core ............... 2.7.2 -> 2.7.4
[INFO]   com.fasterxml.jackson.core:jackson-databind ........... 2.7.2 -> 2.7.4
[INFO]   com.google.guava:guava ................................ 14.0.1 -> 19.0
[INFO]   com.typesafe:config ................................... 1.2.1 -> 1.3.0
[INFO]   commons-cli:commons-cli ................................. 1.2 -> 1.3.1
[INFO]   commons-codec:commons-codec .............................. 1.8 -> 1.10
[INFO]   commons-io:commons-io ..................................... 2.4 -> 2.5
[INFO]   edu.stanford.nlp:stanford-corenlp ..................... 3.4.1 -> 3.6.0
[INFO]   it.unimi.dsi:dsiutils ................................. 2.2.0 -> 2.3.3
[INFO]   it.unimi.dsi:fastutil ............................... 6.5.15 -> 7.0.12
[INFO]   org.apache.commons:commons-lang3 .......................... 3.0 -> 3.4
[INFO]   org.apache.hadoop:hadoop-client .............. 2.6.0-cdh5.4.1 -> 2.7.2
[INFO]   org.apache.hbase:hbase-client ................ 1.0.0-cdh5.4.1 -> 1.2.1
[INFO]   org.apache.hbase:hbase-server ................ 1.0.0-cdh5.4.1 -> 1.2.1
[INFO]   org.apache.lucene:lucene-core ......................... 4.7.2 -> 6.0.0
[INFO]   org.apache.solr:solr-core ............................. 4.7.2 -> 6.0.0
[INFO]   org.apache.spark:spark-core_2.10 ............ 1.3.0-cdh5.4.1 -> 99.9.9
[INFO]   org.apache.spark:spark-graphx_2.10 .......... 1.3.0-cdh5.4.1 -> 99.9.9
[INFO]   org.apache.tika:tika-core ................................ 1.9 -> 1.13
[INFO]   org.apache.tika:tika-parsers ............................. 1.9 -> 1.13
[INFO]   org.apache.zookeeper:zookeeper ......... 3.4.5-cdh5.4.1 -> 3.5.1-alpha
[INFO]   org.eclipse.jetty:jetty-server ... 8.1.12.v20130726 -> 9.3.9.v20160517
[INFO]   org.eclipse.jetty:jetty-webapp ... 8.1.12.v20130726 -> 9.3.9.v20160517
[INFO]   org.json4s:json4s-jackson_2.10 ....................... 3.2.10 -> 3.3.0
[INFO]   org.jsoup:jsoup ....................................... 1.7.3 -> 1.9.2
[INFO]   org.netpreserve.commons:webarchive-commons ............ 1.1.4 -> 1.1.6
[INFO]   org.netpreserve.openwayback:openwayback-core ... 2.0.0.BETA.2 -> 2.3.1
[INFO]   org.scala-lang:scala-library ..................... 2.10.4 -> 2.12.0-M4
[INFO]   org.scalatest:scalatest_2.10 ................... 2.2.4 -> 3.0.0-SNAP13
[INFO]   org.seleniumhq.selenium:selenium-java ............... 2.42.2 -> 2.53.0
[INFO]   org.slf4j:slf4j-log4j12 .............................. 1.6.4 -> 1.7.21
[INFO]   org.xerial.snappy:snappy-java ....................... 1.0.5 -> 1.1.2.4
[INFO]   uk.bl.wa.discovery:warc-hadoop-indexer ...
[INFO]                                             2.2.0-BETA-5 -> 2.2.0-BETA-6
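
The report above appears to be output from the Versions Maven Plugin (`mvn versions:display-dependency-updates`). To triage a list this long, one option is to parse it and flag entries whose leading version number increases, since those are the likeliest to involve breaking changes. The following is a minimal Python sketch under that assumption; `parse_updates` and `is_major_bump` are hypothetical helper names, not part of Warcbase or Maven, and the "leading integer" heuristic is only a rough proxy for a breaking change.

```python
import re

# One report line: "[INFO]   group:artifact ...... current -> latest".
# The version pair is optional because long entries wrap onto the next line.
LINE = re.compile(r"^\[INFO\]\s+(\S+:\S+)\s+\.+\s*(?:(\S+)\s+->\s+(\S+))?$")
# Continuation line for a wrapped entry: "[INFO]      current -> latest".
CONT = re.compile(r"^\[INFO\]\s+(\S+)\s+->\s+(\S+)$")


def parse_updates(report):
    """Return (artifact, current, latest) tuples found in the report text."""
    updates = []
    pending = None  # artifact whose versions are expected on the next line
    for raw in report.splitlines():
        line = raw.rstrip()
        m = LINE.match(line)
        if m:
            artifact, current, latest = m.groups()
            if current is None:
                pending = artifact
            else:
                updates.append((artifact, current, latest))
            continue
        m = CONT.match(line)
        if m and pending:
            updates.append((pending, m.group(1), m.group(2)))
            pending = None
    return updates


def major(version):
    """Leading integer of a version string, or None if there is none."""
    m = re.match(r"\d+", version)
    return int(m.group()) if m else None


def is_major_bump(current, latest):
    """True when the leading version number increases (heuristic only)."""
    a, b = major(current), major(latest)
    return a is not None and b is not None and b > a


# A few lines copied from the report above, including the wrapped entry.
sample = "\n".join([
    "[INFO] The following dependencies in Dependencies have newer versions:",
    "[INFO]   com.google.guava:guava ................................ 14.0.1 -> 19.0",
    "[INFO]   commons-io:commons-io ..................................... 2.4 -> 2.5",
    "[INFO]   uk.bl.wa.discovery:warc-hadoop-indexer ...",
    "[INFO]                                             2.2.0-BETA-5 -> 2.2.0-BETA-6",
])

if __name__ == "__main__":
    for artifact, current, latest in parse_updates(sample):
        flag = "MAJOR" if is_major_bump(current, latest) else "minor"
        print(f"{flag:5}  {artifact}: {current} -> {latest}")
```

Saving the full report to a file and passing its contents to `parse_updates` would give a short list of upgrades to review by hand before touching the POM.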

lintool commented 8 years ago

We'll be upgrading to Spark 1.6.0, which is part of CDH 5.7.1. See also issue #236.

lintool commented 8 years ago

Completed in commit 8ba16e842c299c2955bedb1062b3ed8f1aa95190.