Hi! First, thanks for your message. I've just pushed a commit that corrects some typos. Let me know if it works now.
Otherwise, can you upload your dataset as a .txt file in a new comment, so I can check specifically what's going wrong?
After a few tests, I reproduced the same error as yours. Give me some time to investigate the problem.
Hi cperezmig,
I found the error. It was due to the adjacency matrix creation function: I used node ids as column indices, but most of the time those ids are not related to the number of nodes in the graph.
In your example, while there are 592 nodes, the last node id is 1841, so when the code tries to write a value to the 1841st column, it fails with "you may not write an element to index 1841 because the declared size of your vector is 592". My test graphs were biased because their ids matched perfectly with the number of nodes.
I have created a lookup table, so each node id is associated with a unique correct column index.
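To give an idea, here is a minimal sketch of that kind of lookup table, assuming a GraphX graph; the function name and types are illustrative, not the actual MCL_spark code:

```scala
import org.apache.spark.graphx.{Graph, VertexId}

// Sketch only: map arbitrary vertex ids to contiguous column indices
// before building the adjacency matrix.
def buildIdLookup(graph: Graph[String, Double]): Map[VertexId, Int] = {
  graph.vertices
    .map { case (id, _) => id }
    .zipWithIndex()                        // (vertexId, denseIndex in 0 until numNodes)
    .map { case (id, idx) => (id, idx.toInt) }
    .collect()                             // fine for a sketch; a real implementation may keep it distributed
    .toMap
}

// Usage: lookup(1841) returns a valid column index smaller than the number of
// nodes, so writes into a vector of declared size 592 stay in bounds.
```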
The next commit will correct that issue, but for the moment the results of my algorithm on your dataset do not match those of the classical MCL implementation by Stijn van Dongen.
Let me know if it's OK to close this issue. It would probably be best to open a new one for the remaining problem.
Hi Joandre,
Sorry for my late answer. I confirm that the problem seems to be fixed, but the results are still different from the classical MCL implementation. For my part, you can close the issue. Let me know if you make progress with the remaining problem.
Thanks
Carlos Pérez Miguel
Hi Carlos,
The problem is solved. Some differences remain for your dataset, but from the integration tests I have run, my implementation is close to the original one. The main differences are in the sparseness management of stochastic matrices. Stijn van Dongen uses much more complex approaches than I do, but I keep a simple method for now.
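To show what I mean by a simple method, here is a minimal sketch of threshold-based pruning with re-normalization; it is an illustration under my own assumptions (fixed epsilon, one row at a time), not the exact MCL_spark or mcl code:

```scala
// Sketch only: drop entries of one row of the stochastic matrix that fall
// below a fixed epsilon, then re-normalize so the row sums to 1 again.
// Van Dongen's mcl uses more elaborate schemes (e.g. adaptive pruning and recovery).
def pruneRow(row: Array[Double], epsilon: Double = 1e-4): Array[Double] = {
  val pruned = row.map(v => if (v < epsilon) 0.0 else v)
  val sum = pruned.sum
  if (sum == 0.0) row                 // keep the original row if everything was pruned away
  else pruned.map(_ / sum)            // re-normalize the surviving entries
}
```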
Joan
Hi Joan,
Perfect, for me the problem is solved. Thanks
Carlos
I found this error while using MCL_spark on a directed graph with these features:

scala> nodes.count
res: Long = 6021

scala> nodes.take(2)
res: Array[(Long, String)] = Array((0, A), (1,B))
scala> relationships.count
res: Long = 37024

scala> relationships.take(2)
res: Array[org.apache.spark.graphx.Edge[Double]] = Array(Edge(0,3582,1.0), Edge(0,713,1.0))

scala> val graph = Graph(nodes, relationships).cache()
graph: org.apache.spark.graphx.Graph[String,Double] = org.apache.spark.graphx.impl.GraphImpl@

scala> val clusters: RDD[Assignment] = new MCL().setExpansionRate(2).run(graph).assignments
Error: java.lang.IllegalArgumentException: requirement failed: You may not write an element to index 6351 because the declared size of your vector is 6021
I thought that maybe the problem was in my data, but using the original implementation by Stijn van Dongen I obtained 509 clusters. Any idea of the origin of the error?
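For completeness, a quick check like the following (a sketch using the variables from the session above) shows whether vertex ids exceed the node count, which matches the out-of-bounds index reported in the error:

```scala
// Sketch only: compare the largest vertex id with the number of nodes.
val numNodes = graph.vertices.count()                        // 6021 here
val maxId = graph.vertices.map { case (id, _) => id }.max()
if (maxId >= numNodes)
  println(s"Vertex ids go up to $maxId but there are only $numNodes nodes: " +
          "ids cannot be used directly as matrix indices.")
```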