sekruse / rheem-examples

Simple example apps for Rheem including Wordcount, IND detection, and PageRank
Apache License 2.0

Rheem examples

This repository contains several simple example apps for Rheem. All of them are written in Scala and/or Java; each is either an IPython notebook (using the jupyter-scala kernel) or can be built with Maven. The individual apps are described below.

Wordcount

Wordcount counts how often each distinct word occurs in a text and is basically the "Hello, World!" of data processing. We provide two implementations, one using Rheem's Java API and one using its Scala API.

Usage. You can run the app via

$ java/scala ... com.github.sekruse.wordcount.java.WordCount/com.github.sekruse.wordcount.scala.WordCount <plugins> <input file> [<words per line>]
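
For readers unfamiliar with the task, the following is a minimal sketch of the word count logic in plain Java streams. It does not use Rheem's API and is not the code from this repository; the class name WordCountSketch, the method countWords, and the tokenization rule (split on non-word characters, lower-case everything) are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical illustration class; not part of the rheem-examples sources.
final class WordCountSketch {

    /** Splits every line into lower-cased words and counts each distinct word. */
    static Map<String, Long> countWords(Stream<String> lines) {
        return lines
                // Assumption: words are separated by runs of non-word characters.
                .flatMap(line -> Arrays.stream(line.split("\\W+")))
                // split() can yield an empty first token if a line starts with punctuation.
                .filter(word -> !word.isEmpty())
                .map(word -> word.toLowerCase(Locale.ROOT))
                // Group equal words and count them; TreeMap keeps the output sorted by word.
                .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        Map<String, Long> counts = countWords(Stream.of("Hello, World!", "hello again"));
        counts.forEach((word, count) -> System.out.println(word + "\t" + count));
    }
}
```

The same three steps (split lines into words, normalize, group and count) are what a dataflow version would typically express as flat-map, map, and reduce-by-key operators.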

SINDY

This app implements the SINDY algorithm, which finds inclusion dependencies in relational databases. It uses Rheem for platform independence and cross-platform query processing. In the current version, the data is expected to reside in a SQLite3 database.

Usage. You can run the app via

$ java/scala com.github.sekruse.sindy.Sindy <plugins> <JDBC URL> <schema file> [<tables ...>]
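
To illustrate what an inclusion dependency is, here is a small in-memory sketch of unary IND detection in plain Java. It follows the general idea of grouping cells by value and intersecting the resulting column sets, but it is a simplification under stated assumptions, not this app's Rheem-based implementation: the class UnaryIndSketch, the method findInds, and the representation of columns as lists of non-null strings are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration class; not part of the rheem-examples sources.
final class UnaryIndSketch {

    /**
     * Input: column name -> its values (assumed non-null).
     * Output: dependent column -> all other columns that contain every one of its values.
     * Columns without any values do not appear in the output.
     */
    static Map<String, Set<String>> findInds(Map<String, List<String>> columns) {
        // Step 1: group by value, recording which columns contain each value.
        Map<String, Set<String>> columnsByValue = new HashMap<>();
        for (Map.Entry<String, List<String>> column : columns.entrySet()) {
            for (String value : column.getValue()) {
                columnsByValue.computeIfAbsent(value, v -> new HashSet<>()).add(column.getKey());
            }
        }
        // Step 2: a column can only be included in columns that share ALL of its values,
        // so intersect the column sets of every value it contains.
        Map<String, Set<String>> inds = new TreeMap<>();
        for (Set<String> group : columnsByValue.values()) {
            for (String dependent : group) {
                Set<String> candidates = inds.get(dependent);
                if (candidates == null) {
                    inds.put(dependent, new TreeSet<>(group));
                } else {
                    candidates.retainAll(group);
                }
            }
        }
        // Step 3: drop the trivial dependency of each column on itself.
        inds.forEach((dependent, referenced) -> referenced.remove(dependent));
        return inds;
    }

    public static void main(String[] args) {
        Map<String, List<String>> columns = new LinkedHashMap<>();
        columns.put("orders.customer_id", Arrays.asList("1", "2", "2"));
        columns.put("customers.id", Arrays.asList("1", "2", "3"));
        findInds(columns).forEach((dependent, referenced) ->
                System.out.println(dependent + " is included in " + referenced));
    }
}
```

In this toy input, every value of orders.customer_id also appears in customers.id, so the former is reported as included in the latter, while the reverse does not hold because of the value 3.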

PageRank

This app consumes an RDF triple file, constructs a graph from it, and then runs PageRank on that graph. It uses Rheem to combine two different data processing tasks, namely preprocessing and graph analytics, and also makes use of Rheem's PageRankOperator.

Usage. You can run the app via

$ java/scala com.github.sekruse.pagerank.PageRank <plugins> <input file> <#iterations>
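
The two stages described above (turning triples into a graph, then ranking its vertices) can be sketched in plain Java as follows. This is not the app's code and does not use Rheem or its PageRankOperator. The class PageRankSketch, its method names, the damping factor passed as a parameter, and the parsing rule (whitespace-separated terms, subject first and object third, no spaces inside terms) are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration class; not part of the rheem-examples sources.
final class PageRankSketch {

    /** Preprocessing: turns "subject predicate object ." lines into adjacency lists. */
    static Map<String, List<String>> buildGraph(List<String> tripleLines) {
        Map<String, List<String>> outLinks = new TreeMap<>();
        for (String line : tripleLines) {
            String[] terms = line.trim().split("\\s+");
            if (terms.length < 3) continue; // skip blank or malformed lines
            outLinks.computeIfAbsent(terms[0], k -> new ArrayList<>()).add(terms[2]);
            outLinks.computeIfAbsent(terms[2], k -> new ArrayList<>()); // register the target vertex
        }
        return outLinks;
    }

    /** Graph analytics: power iteration with a fixed number of rounds. */
    static Map<String, Double> pageRank(Map<String, List<String>> outLinks, int iterations, double damping) {
        int n = outLinks.size();
        Map<String, Double> ranks = new TreeMap<>();
        for (String vertex : outLinks.keySet()) ranks.put(vertex, 1.0 / n);
        for (int i = 0; i < iterations; i++) {
            Map<String, Double> next = new TreeMap<>();
            for (String vertex : outLinks.keySet()) next.put(vertex, 0.0);
            double danglingMass = 0.0; // rank held by vertices without outgoing edges
            for (Map.Entry<String, List<String>> entry : outLinks.entrySet()) {
                double rank = ranks.get(entry.getKey());
                List<String> targets = entry.getValue();
                if (targets.isEmpty()) {
                    danglingMass += rank;
                    continue;
                }
                // Repeated triples count as parallel edges in this sketch.
                for (String target : targets) next.merge(target, rank / targets.size(), Double::sum);
            }
            // Random jump plus an even share of the dangling rank keeps the ranks summing to 1.
            double base = (1 - damping) / n + damping * danglingMass / n;
            for (Map.Entry<String, Double> entry : next.entrySet()) {
                entry.setValue(base + damping * entry.getValue());
            }
            ranks = next;
        }
        return ranks;
    }

    public static void main(String[] args) {
        List<String> triples = Arrays.asList("<a> <links> <b> .", "<b> <links> <c> .", "<c> <links> <a> .");
        pageRank(buildGraph(triples), 10, 0.85)
                .forEach((vertex, rank) -> System.out.println(vertex + "\t" + rank));
    }
}
```

The iteration count plays the same role as the <#iterations> argument in the usage line; how the app itself parses triples or handles dangling vertices is not specified here and may differ from this sketch.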