iit-cs579 / main

CS579: Online Social Network Analysis at the Illinois Institute of Technology
147 stars 204 forks source link

Clustering Problem #377

Closed Swapnil2095 closed 7 years ago

Swapnil2095 commented 7 years ago

How to use partition_girvan_newman algorithm for directed graph. As twitter user and friends link is directed edge.

Can we make this link undirected? or there is any way to use partition_girvan_newman for directed graph as well?

Swapnil2095 commented 7 years ago

Hi,

As I took 10 Indian ministries tweeter accounts and their friends (>100) .

My graph is too sparse.

Using girvan_newman algorithm, I found out 2 clusters. I got second cluster with just 1 node.

See the attachments.

Let me know whether it is fine? or should I also take friend's of friends too.

clusters network

cicirao commented 7 years ago

I think in the class professor said tried to avoid communities that only have one node.. But, I think it could happen. Getting friends' friends is an option but may take too long. You may also try to generate more than just two clusters? I got 8 clusters but there's always one community only has one node.😑 screen shot 2016-11-21 at 15 10 17

Swapnil2095 commented 7 years ago

@cicirao - Can you please tell me your logic ? How you got such big network.

I am taking 8 Indian ministries twitter account and their 2000-5000 followers.

I just used followers of the twitter accounts.

Is it ok, right?

It is not compulsion to use friends and followers both. We can use any one of them or both. Right?

I found lots of common followers -

[('HRDMinistry', 'MoHFW_INDIA', 3552), ('HRDMinistry', 'RailMinIndia', 1759), ('MoHFW_INDIA', 'RailMinIndia', 1607), ('CimGOI', 'TexMinIndia', 942), ('HMOIndia', 'HRDMinistry', 759), ('TexMinIndia', 'tourismgoi', 710), ('HMOIndia', 'RailMinIndia', 686), ('MoHFW_INDIA', 'PIB_India', 610), ('HMOIndia', 'MoHFW_INDIA', 609), ('PMOIndia', 'RailMinIndia', 604), ('HMOIndia', 'PIB_India', 586), ('PIB_India', 'RailMinIndia', 585), ('HRDMinistry', 'PIB_India', 579), ('CimGOI', 'HMOIndia', 578), ('CimGOI', 'HRDMinistry', 444), ('CimGOI', 'MoHFW_INDIA', 403), ('CimGOI', 'PIB_India', 313), ('HRDMinistry', 'PMOIndia', 286), ('MoHFW_INDIA', 'TexMinIndia', 265), ('MoHFW_INDIA', 'PMOIndia', 261), ('CimGOI', 'tourismgoi', 253), ('HMOIndia', 'TexMinIndia', 229), ('HRDMinistry', 'TexMinIndia', 213), ('PIB_India', 'TexMinIndia', 171), ('CimGOI', 'RailMinIndia', 143), ('HMOIndia', 'PMOIndia', 110), ('PIB_India', 'PMOIndia', 97), ('MoHFW_INDIA', 'tourismgoi', 68), ('HMOIndia', 'tourismgoi', 63), ('HRDMinistry', 'tourismgoi', 58), ('PIB_India', 'tourismgoi', 55), ('RailMinIndia', 'TexMinIndia', 51), ('RailMinIndia', 'tourismgoi', 19), ('CimGOI', 'PMOIndia', 18), ('PMOIndia', 'TexMinIndia', 8), ('PMOIndia', 'tourismgoi', 5)]

using girvan_newman algorithm, I am always getting just one node in 2nd cluster (I just tried for 2 clusters).

One more thing, As Graph is directed, I made it undirected by myself. Is it ok?

Swapnil2095 commented 7 years ago

TA or @cicirao, Please help me with this.

cicirao commented 7 years ago

I think to get a more connected graph, it's better to go 2 hops away. E.g. Start from node A, get node A's friends, and then get friends of A's friends. In this way, you can make sure that the original graph could be one connected graph. You can use followers instead, but also, it's better to go 2 hops away(get followers'' followers). I started from node A, get A mentioned users, and then get users these metioned users metioned.

If you keep get one same node for one cluster, that may be the case. That node just does not belong to other clusters. But, before you do cluster, did you try to use min_degree to remove some nodes,like what we did in our assignment? Another thing is, I said maybe try to generate more than 2 clusters. Because in your case, cluster one which is only one node less than the original graph, doesn't seem to show any interesting communities.

And I think use undirected graph is fine.

I hope this helps.

Swapnil2095 commented 7 years ago

Thanks cicirao. Will work on it and get back to you.

Swapnil2095 commented 7 years ago

@cicirao or TA's - please tell me whether I am right or not?

Here is my data collection-

Friends per Ministry: PMOIndia 100 RailMinIndia 100 TexMinIndia 100 Most common friends: [(802496556318408705, 2), (802426892951363584, 2), (802495639594549252, 2), (802496295566876672, 2), (802496941338730496, 2)] Friend Overlap: [('PMOIndia', 'RailMinIndia', 12), ('RailMinIndia', 'TexMinIndia', 1), ('PMOIndia', 'TexMinIndia', 0)]

image

After clustering, image

When I am calculating edge_betweenness_centrality every time after highest betweenness removed. like- while len(components)<=3: #just want 3 components G.remove_edge(*(find_betweenness(G)))

Then only I am getting expected output. Please tell me whether I am right or wrong with my approach?

Also tell me whether there is still need to take follower's of followers? Or no need ? Thanks!!

ClaytonTurnerOld commented 7 years ago

If you're dynamically collecting data and successfully clustering your communities you should be fine. More data wouldn't hurt your approach.