lintool / warcbase

Warcbase is an open-source platform for managing and analyzing web archives
http://warcbase.org/

Represent link structure as graph using GraphX #201

Closed jrwiebe closed 8 years ago

jrwiebe commented 8 years ago

Once we obtain a graph representation of our site link structure within Spark/Warcbase, we will be able to further simplify operations that currently depend on other tools (e.g. Gephi, for PageRank).

https://spark.apache.org/docs/latest/graphx-programming-guide.html#pagerank

ianmilligan1 commented 8 years ago

This is great. Baking PageRank into warcbase, as per these docs, would be perfect as a way to extract an ordered list of relevant resources.
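
For illustration only, here is a minimal sketch of what "an ordered list of relevant resources" could look like once the link structure is available as a list of (source, target) pairs. It is a plain power-iteration PageRank in Python, not the GraphX API and not warcbase code. The function names, the damping factor of 0.85, the iteration count, and the `*.example` hosts are all assumptions made for the example. Scores from an actual GraphX run may be scaled differently.

```python
def pagerank(edges, damping=0.85, iterations=50):
    """Power-iteration PageRank over (source, target) link pairs.

    Hypothetical helper for illustration; not part of warcbase or GraphX.
    """
    nodes = sorted({page for edge in edges for page in edge})
    out_links = {page: [] for page in nodes}
    for src, dst in edges:
        out_links[src].append(dst)

    n = len(nodes)
    rank = {page: 1.0 / n for page in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in nodes if not out_links[p])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in nodes}
        for src in nodes:
            targets = out_links[src]
            if targets:
                # Each page splits its damped rank equally among its links.
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
        rank = new_rank
    return rank


def ranked_pages(edges, **kwargs):
    """Return pages ordered from highest to lowest PageRank score."""
    scores = pagerank(edges, **kwargs)
    return sorted(scores, key=scores.get, reverse=True)


# Assumed sample link structure (made-up hosts).
sample_edges = [
    ("a.example", "c.example"),
    ("b.example", "c.example"),
    ("c.example", "a.example"),
]

if __name__ == "__main__":
    for page in ranked_pages(sample_edges):
        print(page)
```

In a Spark setting the same edge pairs would presumably be loaded into a graph structure and ranked there. The sketch only shows the shape of the input and of the ordered output.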

ianmilligan1 commented 8 years ago

Probably worth merging with #183. The fork of @aliceranzhou's link-structure repo is at https://github.com/shamrt/link-structure. Are we getting close to being able to incorporate this into warcbase, either in this repo or in the docs?

ianmilligan1 commented 8 years ago

Just pinging again. How close is this branch to being ready to incorporate into main? (It would be nice to include it in the write-up of warcbase we're doing!)

jrwiebe commented 8 years ago

I will take care of this shortly. (In reply to Ian Milligan's comment of Mar 17, 2016, 1:19 PM.)