lintool / warcbase

Warcbase is an open-source platform for managing and analyzing web archives

Example counting prevalence of tweeted images #214

Closed: lintool closed 8 years ago

lintool commented 8 years ago

This example works with Warcbase (on rho):

import org.warcbase.spark.matchbox._
import org.warcbase.spark.matchbox.TweetUtils._
import org.warcbase.spark.rdd.RecordRDD._
import org.json4s._
import org.json4s.jackson.JsonMethods._

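// Load the deduplicated tweet collection as an RDD of parsed JSON values.
val tweets = RecordLoader.loadTweets("/mnt/vol1/data_sets/elxn42/ruest-white/elxn42-tweets-combined-deduplicated.json", sc)

// Extract every "media_url_https" string from each tweet, count how often each
// URL occurs (countItems sorts by descending count), and keep the top results.
// The take(10) cutoff is an assumption; the original post does not show it.
val counts = tweets.flatMap(tweet => tweet \\ "media_url_https" \ classOf[JString])
  .countItems()
  .take(10)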


counts: Array[(org.json4s.JString#Values, Int)] = Array((,11558), (,8876), (,7896), (,6258), (,6122), (,5776), (,5430), (
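
For readers without access to the cluster, the count-and-rank step behind the output above can be sketched in plain Scala collections. This is only an illustration, assuming the counting amounts to grouping identical URLs and sorting by descending frequency. The `MediaUrlCounts` object, its `countItems` helper, and the `example.org` URLs are hypothetical stand-ins, not part of Warcbase or the original data set.

```scala
// Plain-Scala sketch of the count-and-rank step; no Spark or Warcbase required.
object MediaUrlCounts {
  // Count occurrences of each item and return (item, count) pairs,
  // most frequent first. Hypothetical helper, not the Warcbase method.
  def countItems[T](items: Seq[T]): Seq[(T, Int)] =
    items
      .groupBy(identity)
      .map { case (item, group) => (item, group.size) }
      .toSeq
      .sortBy { case (_, count) => -count }

  def main(args: Array[String]): Unit = {
    // Made-up media URLs standing in for values extracted from tweets.
    val urls = Seq(
      "https://example.org/a.jpg",
      "https://example.org/b.jpg",
      "https://example.org/a.jpg",
      "https://example.org/c.jpg",
      "https://example.org/a.jpg",
      "https://example.org/b.jpg"
    )
    // Print the two most frequently shared URLs with their counts.
    countItems(urls).take(2).foreach { case (url, n) => println(s"$url\t$n") }
  }
}
```

On this toy input the two rows printed are `a.jpg` with a count of 3 and `b.jpg` with a count of 2, which mirrors the descending order of the counts in the real output.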
jrwiebe commented 8 years ago

Added to docs. Closing.