neo4j-contrib / neo4j-graph-algorithms

Efficient Graph Algorithms for Neo4j
https://github.com/neo4j/graph-data-science/
GNU General Public License v3.0

PageRank query results are inconsistent (1. APOC 2. graph algorithms extension package) #716

Open crazyyanchao opened 5 years ago

crazyyanchao commented 5 years ago
  1. I have installed two extension packages: apoc-3.4.0.1-all.jar and graph-algorithms-algo-3.4.7.0.jar.

  2. Running PageRank on the same dataset gives hugely different results (database version: neo4j-community-3.4.7).

2.1. apoc-3.4.0.1-all.jar

    MATCH (n:专题) WITH collect(n) as nodes CALL apoc.algo.pageRank(nodes) YIELD node,score RETURN node.name,score ORDER BY score DESC
node.name score
"十一长假4" 11013.60778
"LDR测试2" 10587.83657
"自用_181" 7248.36549
"东沟岭农贸市场发现一女尸" 6147.92054
"981钻井平台_618" 4663.55536
"公安满意度4" 4086.03851
"LDR模糊4" 3917.40468
"APEC0" 3845.58618
"取消军训我3" 3799.40371
"政法系统满意程度1" 3787.1927

2.2. The procedure provided by graph-algorithms-algo-3.4.7.0.jar (all scores are 0.15000000000000002)

CALL algo.pageRank.stream('专题', NULL, {iterations:20, dampingFactor:0.85}) YIELD node, score RETURN node.name, score ORDER BY score DESC
node.name score
"武警广西总队停止有偿工作" 0.15000000000000002
"武警广西总队停止有偿工作" 0.15000000000000002
"981钻井平台" 0.15000000000000002
"土耳其反华" 0.15000000000000002
"抗战纪念日" 0.15000000000000002
"境外" 0.15000000000000002
"天津爆炸案" 0.15000000000000002
"泰国爆炸" 0.15000000000000002
"南海问题" 0.15000000000000002
"泰国遣返维吾尔族人" 0.15000000000000002
"缅甸特赦非法伐木工" 0.15000000000000002
"缅甸特赦非法伐木工(新)" 0.15000000000000002

The results are quite different! Can you tell me why? Thanks!

tomasonjo commented 5 years ago

A PageRank value of 0.15000000000000002 is the default value for nodes with no incoming relationships. It seems that no relationships get projected into the graph, which is weird given that you set NULL for the relationship type, which should load all of them.
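
To illustrate where 0.15000000000000002 comes from, here is a minimal Python sketch of the textbook un-normalized PageRank update, PR(v) = (1 - d) + d * sum(PR(u) / outdeg(u)). It is not the library's implementation; the function name `pagerank`, the initial score of 1 - d, and the toy graphs are assumptions made for illustration. With a damping factor of 0.85 and no relationships at all, every node stays at 1 - 0.85, which in floating point is 0.15000000000000002.

```python
def pagerank(nodes, edges, damping=0.85, iterations=20):
    """Textbook un-normalized PageRank (illustrative sketch only).

    PR(v) = (1 - d) + d * sum(PR(u) / outdeg(u)) over all u pointing at v.
    """
    out_degree = {n: 0 for n in nodes}
    incoming = {n: [] for n in nodes}
    for source, target in edges:
        out_degree[source] += 1
        incoming[target].append(source)

    # Assumption: every node starts at the base value 1 - d.
    scores = {n: 1.0 - damping for n in nodes}
    for _ in range(iterations):
        # Each round reads the previous round's scores.
        scores = {
            n: (1.0 - damping)
            + damping * sum(scores[u] / out_degree[u] for u in incoming[n])
            for n in nodes
        }
    return scores


# No relationships projected: every node keeps the base value 1 - d.
isolated = pagerank(["a", "b", "c"], edges=[])

# Same nodes with relationships a -> b and b -> c.
connected = pagerank(["a", "b", "c"], edges=[("a", "b"), ("b", "c")])
```

In the `connected` case only `a`, which has no incoming relationship, stays at the base value, while `b` and `c` score higher. A result in which every node has the base value is therefore what this formula gives for a graph with no relationships.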

mneedham commented 5 years ago

Hi @crazyyanchao,

Would you be able to share a small sample dataset that we can use to recreate this problem? As @tomasonjo says, it's weird that all the nodes have the initial PageRank value.

crazyyanchao commented 5 years ago

@mneedham @tomasonjo If I run:

CALL algo.pageRank.stream(NULL, NULL, {iterations:20, dampingFactor:0.85}) YIELD node, score RETURN node.name, score ORDER BY score DESC

the nodes labeled '专题' get scores that look reasonable. I may not be able to share the dataset, sorry! Thanks for your reply!
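
One possible explanation, not confirmed in this thread: if a projection restricted to a label keeps only relationships whose start and end nodes both carry that label, and the '专题' nodes are connected only through nodes of other labels, then the label-restricted graph has no relationships at all. The Python sketch below illustrates that assumption; the `project` helper, the `Document` label, and the toy data are all hypothetical.

```python
def project(node_labels, relationships, label=None):
    """Keep nodes with `label` (all nodes if label is None) and only the
    relationships whose start and end nodes were both kept."""
    kept = {n for n, l in node_labels.items() if label is None or l == label}
    edges = [(s, t) for s, t in relationships if s in kept and t in kept]
    return kept, edges


# Hypothetical toy data: topics are linked only via a node of another label.
node_labels = {"topic1": "专题", "topic2": "专题", "doc1": "Document"}
relationships = [("doc1", "topic1"), ("doc1", "topic2"), ("topic1", "doc1")]

# Restricting to the label drops every relationship in this toy graph.
_, label_only_edges = project(node_labels, relationships, label="专题")

# Loading everything keeps all three relationships.
_, all_edges = project(node_labels, relationships, label=None)
```

Under this assumption, the label-restricted run would see an empty relationship list, while the run with NULL for the label would see the full graph. Counting the relationships that connect two '专题' nodes directly would show whether this applies to the real dataset.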

crazyyanchao commented 5 years ago

@mneedham @tomasonjo @jexp @akollegger I ran two Cypher queries on the same LinkedIn dataset, but the results vary enormously!

1. The first way

CALL algo.pageRank('LinkedinID', NULL,  {iterations:20, dampingFactor:0.85, write: true,writeProperty:'pagerank'}) YIELD nodes, iterations, loadMillis, computeMillis, writeMillis, dampingFactor, write, writeProperty
MATCH (n:LinkedinID) RETURN n.name,n.pagerank ORDER BY n.pagerank DESC LIMIT 10
n.name n.pagerank
"Dr. Imani Ma'at_29489954" 238797044.98089278
"Kristina Tanasichuk_21342877" 205712106.4265581
"Andy Jabbour_408109800" 175523863.48403177
"Kim Proctor_2794998" 170649994.17900914
"Michael Jacobs_3967109" 142688564.25065896
"Adele Canetti11160947" 105116298.79254237
"Marcia Stepanek_14481523n" 90105381.10887711
"Christy Riccardi_11084249" 78076928.37071984
"Gregg H._3628386" 78046192.97161181
"Hollis Thomases_245341" 75175480.38489856
"Jeff Molter_1411602" 73882728.68542062
"Terezie Mosby_119305546" 73044631.96094015
"Troy Stiner_91210468" 71168889.09655812
"John Robitscher, MPH_8334935" 70542084.97194709

2. The second way

CALL apoc.algo.pageRankWithCypher({iterations:20, write:true})
MATCH (n:LinkedinID) RETURN n.name,n.pagerank ORDER BY n.pagerank DESC LIMIT 10
n.name n.pagerank
"Bill Gates_0" 118.07033
"Richard Branson_0" 101.64432
"Pete Brownell_18332101" 77.71179
"Chuck Brooks_4888851" 74.96686
"Dr. Nicholas R. Scheidt, PsyD, AADP_26394892" 72.11293
"Mark Cuban_0" 67.6066
"Frank T. Mitchell_14176906" 67.3042
"Arianna Huffington_0" 66.41209
"Jack Welch_0" 63.05521
"Tarek Sobh_1564329" 62.50482

Finally

I think the second result is more reasonable! But why does the first way produce such values? I don't understand. Can you explain that? Thanks :)

tomasonjo commented 5 years ago

Can you share this LinkedIn dataset?