sbenthall / poll.emic

A tool for visualizing a Twitter user's relationships

In mentionball, separate out data collection from snowball technique and final graph computation #11

Open sbenthall opened 10 years ago

sbenthall commented 10 years ago

Currently, when building out the mentionball, data collection, the snowball strategy, and final graph output are all integrated into a single step.

Separating these into different stages will make it easier to save richer data about the users and their activity, and then derive multiple alternate graph representations for visualization and analysis.
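To illustrate the payoff, here is a minimal sketch of deriving two different graph representations from the same saved data. The record layout and both function names are hypothetical, chosen only for this example; they are not taken from mentionball.

```python
from collections import Counter

# Hypothetical stage-one output: one record per collected user,
# saved before any graph is built.
records = {
    "alice": {"followers": 120, "mentions": ["bob", "bob", "carol"]},
    "bob": {"followers": 45, "mentions": ["alice"]},
    "carol": {"followers": 300, "mentions": []},
}


def directed_mention_edges(records):
    """Weighted directed edges: (source, target) -> number of mentions."""
    edges = Counter()
    for user, record in records.items():
        for mentioned in record["mentions"]:
            edges[(user, mentioned)] += 1
    return dict(edges)


def reciprocal_edges(records):
    """Undirected edges, kept only when both users mention each other."""
    directed = directed_mention_edges(records)
    return {
        frozenset((source, target))
        for (source, target) in directed
        if (target, source) in directed
    }


print(directed_mention_edges(records))
print(reciprocal_edges(records))
```

Because the records are kept separately from any graph, adding a third representation would only mean writing another small function over the same data.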

bkfunk commented 10 years ago

So, I've split mentionball.data_to_network() into two functions. The first, data_to_network(), just builds the graph from the data and returns it. The second, lookup_metadata(graph), which I call in main, goes through the edges (as the old version did) and uses lookup_many to get the metadata for the users on them, saving some of it (follower count, and now some geo data) in the graph itself.
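Roughly, the shape of that split looks like the sketch below. This is not the actual code in the branch: the graph is a plain dict rather than whatever graph object mentionball uses, lookup_many is passed in as an argument because its real signature isn't shown here, and fake_lookup_many and the metadata field names are made up for illustration.

```python
def data_to_network(mentions):
    """Build a plain-dict graph from (source, target) mention pairs and return it."""
    graph = {"nodes": {}, "edges": set()}
    for source, target in mentions:
        graph["nodes"].setdefault(source, {})
        graph["nodes"].setdefault(target, {})
        graph["edges"].add((source, target))
    return graph


def lookup_metadata(graph, lookup_many):
    """Annotate the graph's nodes in place with user metadata.

    Assumption: lookup_many takes a list of screen names and returns
    a dict mapping each name to a metadata dict.
    """
    # Walk the edges to find every user that needs a lookup.
    names = sorted({name for edge in graph["edges"] for name in edge})
    for name, meta in lookup_many(names).items():
        graph["nodes"][name]["followers_count"] = meta.get("followers_count")
        graph["nodes"][name]["location"] = meta.get("location")
    return graph


def fake_lookup_many(names):
    # Hypothetical stub standing in for the real bulk user lookup.
    return {name: {"followers_count": len(name), "location": None} for name in names}


graph = data_to_network([("alice", "bob"), ("bob", "carol")])
graph = lookup_metadata(graph, fake_lookup_many)
print(graph["nodes"])
```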

Do we maybe want to switch that around, so we generate the graph last, after we have some internal data structure containing only the data we want to store?

So: step one, get the list of users (snowball strategy); step two, collect and clean data for those users (geocoding in here too), storing it in a dictionary or whatever; step three, turn the dict into a graph and output it?
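As a rough sketch of those three steps, under some assumptions: every name below (snowball_users, collect_data, dict_to_graph, get_mentions, get_profile) is hypothetical, the toy dictionaries stand in for Twitter API calls, and geocoding is left as a comment marking where it would go.

```python
def snowball_users(seed, get_mentions, depth=1):
    """Step one: breadth-first snowball outward from a seed user."""
    users, frontier = {seed}, {seed}
    for _ in range(depth):
        frontier = {m for user in frontier for m in get_mentions(user)} - users
        users |= frontier
    return users


def collect_data(users, get_mentions, get_profile):
    """Step two: collect and clean per-user data into a plain dict."""
    data = {}
    for user in sorted(users):
        profile = get_profile(user)
        data[user] = {
            "followers": profile.get("followers", 0),
            # Geocoding would happen here; this sketch only trims whitespace.
            "location": (profile.get("location") or "").strip() or None,
            # Keep only mentions of users inside the snowball.
            "mentions": [m for m in get_mentions(user) if m in users],
        }
    return data


def dict_to_graph(data):
    """Step three: turn the dict into node attributes and an edge list."""
    nodes = {
        user: {key: value for key, value in record.items() if key != "mentions"}
        for user, record in data.items()
    }
    edges = [(user, m) for user, record in data.items() for m in record["mentions"]]
    return nodes, edges


# Toy stand-ins for the API calls.
MENTIONS = {"alice": ["bob"], "bob": ["carol", "alice"], "carol": []}
PROFILES = {"alice": {"followers": 10, "location": " Berkeley "}, "bob": {"followers": 3}}


def get_mentions(user):
    return MENTIONS.get(user, [])


def get_profile(user):
    return PROFILES.get(user, {})


users = snowball_users("alice", get_mentions, depth=1)
data = collect_data(users, get_mentions, get_profile)
nodes, edges = dict_to_graph(data)
print(nodes)
print(edges)
```

With this ordering, the dict from step two could be saved to disk, so the graph could be rebuilt or changed without collecting the data again.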

sbenthall commented 10 years ago

It would be easier to talk about this if we were looking at the same code. Can you share a link to the changes you've been making?

(Sent by email in reply to bkfunk's comment of Thu, Jan 30, 2014, 3:54 PM.)

bkfunk commented 10 years ago

https://github.com/bkfunk/poll.emic/blob/geo/bin/mentionball.py

Line 44ish is where stuff starts. It's a big mess right now, though!