jboynyc / textnets

Text analysis with networks.
https://textnets.readthedocs.io/
GNU General Public License v3.0
284 stars, 23 forks

An explanation into Centrality and testing with Alternative Centralities #32

Closed: BradKML closed this issue 2 years ago

BradKML commented 2 years ago

The tutorial at https://github.com/jboynyc/textnets/blob/trunk/docs/tutorial.rst uses the degree-closeness-betweenness trifecta to explain term and topic significance.

NetworkX, for example, offers a large suite of centrality algorithms, and principal component analyses (PCAs) of centrality measures have been used to identify alternatives to degree and closeness centrality.

import networkx as nx
import pandas as pd

# G is the NetworkX graph under study (e.g. a textnet converted to NetworkX).
# The karate club graph is only a stand-in so the snippet runs on its own.
G = nx.karate_club_graph()

df = pd.DataFrame(G.nodes(), columns=['node'])

def fixer(table, graph, column, function):
    """Add a column holding function(graph)[node] for every node.

    Measures that fail on this graph are reported instead of raising. This
    includes measures that need extra arguments, apply only to directed
    graphs, or return a single value or a list instead of a node-to-value
    mapping (which pandas cannot map onto the node column).
    """
    try:
        representation = function(graph)
        table[column] = table['node'].map(representation)
        print(column)
    except Exception as exc:
        print(f"not {column}: {exc}")

fixer(df, G, 'pagerank', nx.pagerank)
fixer(df, G, 'vitality', nx.closeness_vitality)
fixer(df, G, 'clustering', nx.clustering)
fixer(df, G, 'gen_deg', nx.generalized_degree)
fixer(df, G, 'degree', nx.degree_centrality)
fixer(df, G, 'out_deg', nx.out_degree_centrality)
fixer(df, G, 'in_deg', nx.in_degree_centrality)
fixer(df, G, 'eigenvector', nx.eigenvector_centrality)
fixer(df, G, 'katz', nx.katz_centrality)
fixer(df, G, 'closeness', nx.closeness_centrality)
fixer(df, G, 'inc_close', nx.incremental_closeness_centrality)
fixer(df, G, 'flow_close', nx.current_flow_closeness_centrality)
fixer(df, G, 'info', nx.information_centrality)
fixer(df, G, 'betweenness', nx.betweenness_centrality)
fixer(df, G, 'flow_between', nx.current_flow_betweenness_centrality)
fixer(df, G, 'communicability', nx.communicability_betweenness_centrality)
fixer(df, G, 'load', nx.load_centrality)
fixer(df, G, 'subgraph', nx.subgraph_centrality)
fixer(df, G, 'sub_exp', nx.subgraph_centrality_exp)
fixer(df, G, 'estrada', nx.estrada_index)
fixer(df, G, 'harmonic', nx.harmonic_centrality)
fixer(df, G, 'local_reach', nx.local_reaching_centrality)
fixer(df, G, 'global_reach', nx.global_reaching_centrality)
fixer(df, G, 'percolation', nx.percolation_centrality)
fixer(df, G, '2nd_o', nx.second_order_centrality)
fixer(df, G, 'trophic', nx.trophic_levels)
fixer(df, G, 'voterank', nx.voterank)

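A PCA of the kind mentioned above can be run over such a table of per-node centrality scores. The following is a minimal sketch, assuming NetworkX and NumPy; the helper name `centrality_pca` and the choice of six measures are illustrative assumptions, not taken from the analyses referred to.

```python
import networkx as nx
import numpy as np


def centrality_pca(G):
    """PCA over several per-node centrality measures of a connected graph G.

    Returns the measure names, the component loadings (one row per
    principal component, one column per measure) and the share of
    variance explained by each component.
    """
    measures = {
        "degree": nx.degree_centrality,
        "closeness": nx.closeness_centrality,
        "betweenness": nx.betweenness_centrality,
        "eigenvector": nx.eigenvector_centrality,
        "harmonic": nx.harmonic_centrality,
        "katz": nx.katz_centrality,
    }
    scores = {name: f(G) for name, f in measures.items()}
    nodes = list(G.nodes())
    # Rows are nodes, columns are centrality measures.
    X = np.array([[scores[m][n] for m in measures] for n in nodes], dtype=float)
    # Standardize each measure so differences in scale don't dominate;
    # constant columns are only centered.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X = (X - X.mean(axis=0)) / std
    # PCA via SVD of the standardized matrix; singular values come sorted
    # in descending order, so components are ordered by importance.
    _, s, vt = np.linalg.svd(X, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return list(measures), vt, explained


names, loadings, explained = centrality_pca(nx.karate_club_graph())
for i, share in enumerate(explained[:3], start=1):
    top = names[int(np.argmax(np.abs(loadings[i - 1])))]
    print(f"PC{i}: {share:.1%} of variance, largest loading on {top}")
```

Measures that load heavily on the same component are largely redundant on that particular graph; which ones group together depends on the graph, so one run should not be read as a general result.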
jboynyc commented 2 years ago

I'm aware of NetworkX, but I didn't know it had so many more measures built in than igraph. Through this issue I also discovered scikit-network, which looks amazing, so thanks for that.

I will document how nx can be used with textnets, but again, I'm not sure how to interpret (for instance) Katz centrality in the context of textnets. If you have any ideas for why one would use these other measures aside from it being technically feasible, I'd be happy to hear about them.
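For reference, Katz centrality is defined in the NetworkX documentation by the equation on the left below, written here for an undirected graph with adjacency matrix A, attenuation factor α and bias β. Unrolling it shows what the score counts:

```latex
x_i = \alpha \sum_j A_{ij}\, x_j + \beta
\quad\Longrightarrow\quad
\mathbf{x} = \beta\,(I - \alpha A)^{-1}\mathbf{1}
           = \beta \sum_{k=0}^{\infty} \alpha^{k} A^{k}\mathbf{1},
\qquad 0 < \alpha < \frac{1}{\lambda_{\max}(A)}
```

Since the i-th entry of A^k 1 is the number of walks of length k starting at node i, a node's Katz score is a weighted count of all walks from it, with length-k walks discounted by α^k; the series only converges for α below the reciprocal of A's largest eigenvalue (and `nx.katz_centrality` rescales the result by default). In a textnet, one possible reading is that a node scores high when it is reachable through many short chains of shared terms or documents, not only through direct co-occurrence.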

BradKML commented 2 years ago

@jboynyc For context, there seem to be three principal components to centrality. As examples, I will use PageRank, EigenCent (eigenvector centrality), and communicability as modern stand-ins for degree, closeness, and betweenness centrality, respectively.

PageRank will be high if a node is more likely to be cited or referenced directly. EigenCent will be high if a node is more likely to be the indirect inspiration for multiple derivatives. Communicability will be high if a node is well connected to other nodes of high influence.
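Whether these stand-ins actually track the classic trifecta can be checked per graph with rank correlations. The following is a minimal sketch, assuming NetworkX, pandas and SciPy (recent NetworkX releases rely on SciPy for `nx.pagerank` and `nx.communicability_betweenness_centrality`); the pairing follows the comment above, and the helper name `compare_to_classics` is hypothetical.

```python
import networkx as nx
import pandas as pd


def compare_to_classics(G):
    """Spearman rank correlation between each proposed stand-in and the
    classic centrality measure it is meant to replace."""
    scores = pd.DataFrame({
        "pagerank": nx.pagerank(G),
        "degree": nx.degree_centrality(G),
        "eigenvector": nx.eigenvector_centrality(G),
        "closeness": nx.closeness_centrality(G),
        "communicability": nx.communicability_betweenness_centrality(G),
        "betweenness": nx.betweenness_centrality(G),
    })
    # Spearman correlation is the Pearson correlation of the ranks.
    ranks = scores.rank()
    pairs = [("pagerank", "degree"),
             ("eigenvector", "closeness"),
             ("communicability", "betweenness")]
    return {f"{a} vs {b}": ranks[a].corr(ranks[b]) for a, b in pairs}


for pair, rho in compare_to_classics(nx.karate_club_graph()).items():
    print(f"{pair}: Spearman rho = {rho:.2f}")
```

A correlation close to 1 means the two measures order the nodes almost identically on that graph, so swapping one for the other changes little there.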

Some say that creativity arises from three processes: copying, transformation, and combination. It is likely that these principal components of networked document references and keyword/topic co-occurrences indicate the strength of each of these process types. Certain topics can be of high influence, relevance, or nuance.

BradKML commented 2 years ago

Alternatively, there are role similarity algorithms that do this without explicit centrality data.

Alternatively, node embedding is an option: https://github.com/benedekrozemberczki/karateclub

jboynyc commented 2 years ago

Thanks! Good to find out about all these great projects. From my understanding, role extraction wouldn't make much sense for a textnet. As for node embeddings, I was reading up on those just the other day, and I was certainly intrigued by the possibilities.

BradKML commented 2 years ago

@jboynyc It would be great to give some hints on how node embeddings can help, since role extraction and centrality are ultimately subtasks of node embeddings.

jboynyc commented 2 years ago

Appreciate your willingness to think along. This isn't my primary area of research, so I'll need to find some time to dive into the literature, since network embeddings are still quite new to me.

BradKML commented 2 years ago

The two major forms of node embedding are "node proximity" (which creates visualization-like distance vectors) and "structural identity" (which clusters similar nodes based on their structural roles); see https://github.com/jwu4sml/Graph-Embedding-Techniques#1-pure-network-embedding and Fig. 3 of http://pengcui.thumedialab.com/papers/NetworkEmbeddingSurvey.pdf. Some examples of those structural embedding measures:
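To make the distinction between the two forms concrete, here is a minimal sketch, assuming only NetworkX and NumPy. Laplacian eigenmaps stand in for a node-proximity embedding, and a three-feature vector (degree, clustering, average neighbor degree) stands in for a structural-identity embedding; both are deliberately simple substitutes chosen for illustration, not the methods from the linked survey, and the function names are hypothetical.

```python
import networkx as nx
import numpy as np


def proximity_embedding(G, dim=2):
    """Laplacian-eigenmap coordinates: nodes that are close in the graph
    get similar vectors (the "node proximity" view). Assumes G is connected."""
    nodes = list(G.nodes())
    A = nx.to_numpy_array(G, nodelist=nodes, weight=None)
    L = np.diag(A.sum(axis=1)) - A
    # eigh returns eigenvalues in ascending order; skip the constant
    # eigenvector that belongs to eigenvalue 0.
    _, vecs = np.linalg.eigh(L)
    return nodes, vecs[:, 1:dim + 1]


def structural_features(G):
    """Simple per-node role features: nodes in similar positions get similar
    vectors wherever they sit in the graph (the "structural identity" view)."""
    nodes = list(G.nodes())
    clustering = nx.clustering(G)
    neighbor_degree = nx.average_neighbor_degree(G)
    X = np.array([[G.degree(n), clustering[n], neighbor_degree[n]] for n in nodes])
    return nodes, X


G = nx.barbell_graph(5, 0)  # two 5-cliques joined by the edge (4, 5)
_, prox = proximity_embedding(G)
_, struct = structural_features(G)
# Nodes 0 and 9 sit in opposite cliques but play the same role.
print(np.allclose(struct[0], struct[9]))  # True: identical role features
print(prox[0, 0] * prox[9, 0] < 0)        # True: opposite sides in proximity space
```

The same pair of nodes is "identical" under one view and maximally separated under the other, which is why the two families of embeddings answer different questions.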

BradKML commented 2 years ago

As a side note, other projects convert graphs from NetworkX to graph-tool to speed up centrality computations: https://github.com/BlueBrain/BlueGraph/issues/93#issuecomment-953650642

jboynyc commented 2 years ago

Thanks. I've added something to the documentation about using NetworkX for additional centrality measures, but I won't, for the time being at least, recommend any other dependencies, simply because I don't have the bandwidth to try them all out and keep example code updated in case of upstream API changes.

I would welcome a pull request to expand the documentation, if that's something you'd be interested in contributing.