narendraj9 / digsep

Degree of Separation on Twitter

Idea: Given a list of "seed users", get all users within N links #1

Open DonaldTsang opened 4 years ago

DonaldTsang commented 4 years ago

If I have a list of people from a single community, would it be possible to find everyone within N links of those users and save the result to a file (the follow list of everyone involved)? That would be useful for community detection.

narendraj9 commented 4 years ago

I think a breadth-first search from each user, while maintaining the set of members in the overall community, should work.

DonaldTsang commented 4 years ago

How do you stop it at a certain breadth? (e.g. A-B-C-D or three degrees?)

Also, how would de-duplication and distance updating work (e.g. W is 1 away from X, 2 away from Y, and 3 away from Z, and X, Y, and Z are all part of the seed list)?

Lastly, how do you export that into JSON or some other data format (dictionary of user-id as key, to list of user-id as value)?

Regarding visualization, see https://github.com/jvallyea/Mapping-Social-Media, https://github.com/timbennett/twitter-chat-networks, https://github.com/mgmacias95/TwitterFriends, and https://github.com/SadeghHayeri/Twitter-Friend-Connections.

Regarding storing data, someone made a tool that writes CSV (but only a single layer) at https://github.com/ian-nai/Twitter-Friends-Scraper, and https://github.com/DocNow/foaf may handle multiple layers.

narendraj9 commented 4 years ago

> How do you stop it at a certain breadth? (e.g. A-B-C-D or three degrees?)

Breadth-first search can be used to compute the shortest distance from a node in a graph. As you traverse the graph, you can track each node's distance from the starting node and stop following outgoing edges once you reach a node that is N links away from the source.
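A minimal sketch of that depth limit, assuming the follow graph is already in memory as a dictionary mapping each user to the users they follow. The names `bfs_within`, `graph`, `max_depth`, and the toy data are hypothetical and not part of digsep:

```python
from collections import deque


def bfs_within(graph, source, max_depth):
    """Return {user: distance} for every user within max_depth links of source."""
    distance = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        if distance[user] == max_depth:
            # This user sits on the boundary: keep it, but do not expand it.
            continue
        for neighbour in graph.get(user, ()):
            if neighbour not in distance:  # first visit is the shortest path
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    return distance


# Toy follow graph (made-up data): A -> B -> C -> D -> E
toy_graph = {"A": ["B"], "B": ["C"], "C": ["D"], "D": ["E"]}
result = bfs_within(toy_graph, "A", 3)
print(result)  # {'A': 0, 'B': 1, 'C': 2, 'D': 3}
```

With a limit of three, the A-B-C-D chain is kept and E is never visited.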

> Also, how would de-duplication and distance updating work (e.g. W is 1 away from X, 2 away from Y, and 3 away from Z, and X, Y, and Z are all part of the seed list)?

If I understand you correctly about de-duping the members, I think a "set" data structure will take care of that. I assumed that the final output of the algorithm would be this set, which contains the members of the community.
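One possible way to handle both de-duplication and distances at once, sketched under the assumption that "distance" means the smallest number of links to any seed: start a single breadth-first search with every seed at distance 0. The dictionary keys then act as the de-duplicated member set. The names `community_within` and `seeds`, and the toy data, are hypothetical:

```python
from collections import deque


def community_within(graph, seeds, max_depth):
    """Map each reachable user to their smallest distance from any seed."""
    distance = {seed: 0 for seed in seeds}
    queue = deque(distance)  # all seeds start at distance 0
    while queue:
        user = queue.popleft()
        if distance[user] == max_depth:
            continue  # boundary user: keep, but do not expand
        for neighbour in graph.get(user, ()):
            if neighbour not in distance:  # already-seen users are skipped
                distance[neighbour] = distance[user] + 1
                queue.append(neighbour)
    return distance


# Made-up data: W is 1 link from X, 2 from Y, and 3 from Z.
toy_graph = {"Z": ["Y"], "Y": ["X"], "X": ["W"], "W": ["V"]}
distances = community_within(toy_graph, ["X", "Y", "Z"], max_depth=1)
members = set(distances)
print(distances)  # {'X': 0, 'Y': 0, 'Z': 0, 'W': 1}
```

Because the queue is processed in order of increasing distance, the first time a user is seen is also their shortest distance, so no later update is needed. In the example, W is recorded once, at distance 1.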

> Lastly, how do you export that into JSON or some other data format (dictionary of user-id as key, to list of user-id as value)?

Not sure what exactly you mean here.

DonaldTsang commented 4 years ago

> Not sure what exactly you mean here.

I am using this to scrape a Twitter follow network for Community Detection and Role Similarity/Discovery/Detection. So for every user I need their follow links saved to some kind of file, for ease of storage and for use in NetworkX or iGraph.
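A rough sketch of the kind of export described here, assuming the follow graph and the member set are already in memory. The function name `export_follow_lists` and the user IDs are made up for illustration. Two assumptions are baked in: IDs are converted to strings because JSON object keys must be strings, and follow links that point outside the member set are dropped:

```python
import json


def export_follow_lists(graph, members):
    """Build {user_id: [followed_user_id, ...]} restricted to members."""
    members = set(members)
    return {
        str(user): sorted(str(f) for f in graph.get(user, ()) if f in members)
        for user in sorted(members, key=str)
    }


# Made-up data: user 999 is followed by 102 but is not a member.
toy_graph = {101: [102, 103], 102: [103, 999], 103: []}
members = {101, 102, 103}

payload = json.dumps(export_follow_lists(toy_graph, members), indent=2)
restored = json.loads(payload)  # round trip back into a dictionary
print(restored)  # {'101': ['102', '103'], '102': ['103'], '103': []}
```

To write a file instead of a string, the same dictionary can be passed to `json.dump` with an open file handle. NetworkX accepts a dictionary of lists as input to `nx.DiGraph(...)`, so the restored dictionary should load directly as a directed graph.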