plaa / mongo-spark

mongo-spark

Example application on how to use mongo-hadoop connector with Apache Spark.

Read more details at http://codeforhire.com/2014/02/18/using-spark-with-mongodb/

Prerequisites

Running

Import the data into the database, run either JavaWordCount or ScalaWordCount, then print the results:

mongoimport -d beowulf -c input beowulf.json
sbt 'run-main JavaWordCount'
sbt 'run-main ScalaWordCount'
mongo beowulf --eval 'printjson(db.output.find().toArray())' | less
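
For orientation, here is a minimal sketch of the kind of word count the two programs compute. It deliberately uses plain Java collections in place of Spark and MongoDB, so it is not the repository's code. The class name `WordCountSketch`, the `text` field on each input document, and the tokenization rule (lowercase, split on anything that is not a letter) are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical stand-in for the Spark job: in-memory maps play the role
// of the MongoDB documents, and a Java stream plays the role of the RDD.
class WordCountSketch {

    // Counts how often each word occurs across all documents.
    // Assumes each document keeps its text under a "text" key (an assumption,
    // not confirmed by this README); documents without it contribute nothing.
    static Map<String, Long> countWords(List<Map<String, Object>> docs) {
        return docs.stream()
            // Extract the text of each document.
            .map(doc -> String.valueOf(doc.getOrDefault("text", "")))
            // Split each text into lowercase words (assumed tokenization).
            .flatMap(text -> Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+")))
            // split() can yield an empty leading token; drop it.
            .filter(word -> !word.isEmpty())
            // Group identical words and count them, sorted by word.
            .collect(Collectors.groupingBy(word -> word, TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = List.of(
            Map.of("text", "Lo, praise of the prowess of people-kings"),
            Map.of("text", "of spear-armed Danes"));
        countWords(docs).forEach((word, count) -> System.out.println(word + ": " + count));
    }
}
```

In the real applications the input is read from the `input` collection and the counts are written to the `output` collection of the `beowulf` database, which is what the `mongo` command above prints.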

License

The code itself is released into the public domain under Creative Commons CC0.

The example files are based on Beowulf from Project Gutenberg and are under its corresponding license.