local.scan revisited - Githubissues

youngser commented 10 years ago

Gabor, We have one small request for "local.scan" function.

Often times we have a situation to calculate "them" statistics several times with a fixed "current" graph, e.g.,

local.scan(gt, g{t-1}, k,...) local.scan(gt, g{t-2}, k,...) local.scan(gt, g{t-3}, k,...) ...

As you can see, in this case it's a waste time to calculate "neighborhood" of g_t for each call of "local.scan", especially when n==vcount & k are big!!

Instead, it'll be computationally more efficient if we calculate the neighborhood of g_t once and reuse it!

Here is a small example:

require(igraph) n <- 100 p <- 0.1 k <- 3

g <- erdos.renyi.game(n,p) gp1 <- erdos.renyi.game(n,p)

ls1 <- local.scan(g,gp1,k=k) ls2 <- sapply(1:n, function(x) {ecount(induced.subgraph(gp1,unlist(neighborhood(g,k,x,"out"))))}) Ngt <- neighborhood(g,k) ls3 <- sapply(1:n, function(x) {ecount(induced.subgraph(gp1,unlist(Ngt[[x]])))}) stopifnot(ls1==ls2, ls1==ls3)

reuse Ngt

gp2 <- erdos.renyi.game(n,p) ls4 <- local.scan(g,gp2,k=k) ls5 <- sapply(1:n, function(x) {ecount(induced.subgraph(gp2,unlist(Ngt[[x]])))}) stopifnot(ls4==ls5)

In summary, is it possible if you can add one more parameter "Ngt" in the "local.scan"? So, the new "local.scan" will be something like this:

local.scan(grapht, graph{t-1}=NULL, Ngt=NULL, k=1,...)

and,

if (graph_{t-1}==NULL) if (Ngt==NULL), calc local.scan(graph_t,k=1), else, use Ngt and calc local.scan(graph_t,Ngt,k=1) without recalculating the neighborhood(graph_t),
else if (graph_{t-1}!=NULL & Ngt==NULL), then it needs to calc neighborhood(grapht) first, then apply it to graph{t-1},
else, then ignore grapht, and apply Ngt directly to graph{t-1}.

Does this make sense? I hope it's not too complicated to revise the function.

Thanks,

gaborcsardi commented 10 years ago

Btw. if the function to apply is the number of edges (or sum of edge weights), local.scan never explicitly calculates the neighborhoods. Pre-calculated neighborhoods might still speed things up, but it is not clear by how much.

youngser commented 10 years ago

Gabor said: To be sure, how much of the running time is spent by calculating the neighborhoods?

system.time(neighborhood(erdos.renyi.game(10000,0.2),1)) user system elapsed 3.656 0.526 4.182 system.time(neighborhood(erdos.renyi.game(10000,0.2),2)) user system elapsed 313.758 1.342 315.111

So it's not negligible as n and k increase, I believe ?? Is this what you meant?

gaborcsardi commented 10 years ago

Yes, but this is not the code in local.scan. local.scan does not calculate the neighborhood explicitly. So I guess it is not easy to measure it, after all.

youngser commented 10 years ago

Just curious, how is done then? Just briefly?

gaborcsardi commented 10 years ago

It does crawl the graphs, obviously, but does not store the neighborhoods explicitly, as they are only needed temporarily.

It's here. https://github.com/gaborcsardi/igraph/blob/develop/src/scan.c

youngser commented 10 years ago

I see. Thanks! It'll be interesting to see how much speed up we can get with this pre-calculated neighborhood info...

gaborcsardi commented 10 years ago

OK, I did this. It creates the subgraphs based on the supplied neighborhoods, and it is way slower than just crawling the graph again.

It could be rewritten in C, not to create a subgraphs, but I am skeptical about the results, because the step we want to speed up is already very fast, and takes only a small part of the whole process.

youngser commented 10 years ago

On Jan 9, 2014, at 1:02 PM, Gabor Csardi notifications@github.com wrote:

OK, I did this. It creates the subgraphs based on the supplied neighborhoods, and it is way slower than just crawling the graph again.

Interesting! Just curious, what n and k did you use for your test?

Heng, can we benchmark with enough large n and k ??

Youngser

It could be rewritten in C, not to create a subgraphs, but I am skeptical about the results, because the step we want to speed up is already very fast, and takes only a small part of the whole process.

— Reply to this email directly or view it on GitHub.

gaborcsardi commented 10 years ago

Just curious, what n and k did you use for your test?

https://github.com/igraph/xdata-igraph/blob/develop/interfaces/R/igraph/inst/tests/test_scan.R#L76

youngser commented 10 years ago

Heng, Gabor tested it with ER(n=10^3,p=0.1) and k={1,2}. Will it be big enough?

On Jan 9, 2014, at 2:53 PM, Gabor Csardi notifications@github.com wrote:

Just curious, what n and k did you use for your test?

https://github.com/igraph/xdata-igraph/blob/develop/interfaces/R/igraph/inst/tests/test_scan.R#L76

— Reply to this email directly or view it on GitHub.

whalesandro commented 10 years ago

Hello Gabor,

I don't quite understand the specific crawling method mentioned here "It does crawl the graphs, obviously, but does not store the neighborhoods explicitly, as they are only needed temporarily." Could you let me know if there is any literature reference that i can read and learn for this crawling method to compute local.scan? It might be very helpful for my further algorithm investigation. Only reading scan.c is a little bit struggling...

Also, what are the roles of local.scan1.ecount.approx and local.scan1.ecount.approx.eigen in latest igraph?

thanks, Gabor.

Best, Heng

gaborcsardi commented 10 years ago

I don't quite understand the specific crawling method mentioned here "It does crawl the graphs, obviously, but does not store the neighborhoods explicitly, as they are only needed temporarily." Could you let me know if there is any literature reference that i can read and learn for this crawling method to compute local.scan?

It is a breadth first search (see Wikipedia). It is essentially summarized in this comment: https://github.com/igraph/xdata-igraph/blob/develop/src/scan.c#L632-L633

/* We mark the nodes in US in a BFS. Then we check the outgoing edges
    of all marked nodes in THEM. */

This is really all.

It might be very helpful for my further algorithm investigation. Only reading scan.c is a little bit struggling...

Also, what are the roles of local.scan1.ecount.approx and local.scan1.ecount.approx.eigen in latest igraph?

Calculate an approximate solution which was promoted by Josh, I forgot the reference, but the idea is that you can calculate the number of adjacent triangles (equivalent to local scan-1) using the eigenvectors of the adjacency matrix. Now if you only calculate the first couple of eigenvectors which is fast, you get an approximate solution which is good if the non-used eigenvectors have corresponding eigenvalues that are close to zero. The first function does everything, the second starts with the eigenvalues and vectors. See examples from here: https://github.com/igraph/xdata-igraph/blob/develop/interfaces/R/igraph/inst/tests/test_scan.R#L141

This was supposed to be fast by some people, but in practice it is way slower than the exact calculation.

jovo commented 10 years ago

yup, the algorithm we used before that was approximate came from here: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4781156'%20escapeXml='false'/%3E

it was a significant speed up over more naive matrix operations, but in the 5 yrs since the publication of this work, exact methods took over (maybe so even before this paper, i'm not sure)...

On Thu, Jan 23, 2014 at 8:47 PM, Gabor Csardi notifications@github.comwrote:

I don't quite understand the specific crawling method mentioned here "It does crawl the graphs, obviously, but does not store the neighborhoods explicitly, as they are only needed temporarily." Could you let me know if there is any literature reference that i can read and learn for this crawling method to compute local.scan?

It is a breadth first search (see Wikipedia). It is essentially summarized in this comment: https://github.com/igraph/xdata-igraph/blob/develop/src/scan.c#L632-L633

/* We mark the nodes in US in a BFS. Then we check the outgoing edges of all marked nodes in THEM. */

This is really all.

It might be very helpful for my further algorithm investigation. Only reading scan.c is a little bit struggling...

Also, what are the roles of local.scan1.ecount.approx and local.scan1.ecount.approx.eigen in latest igraph?

Calculate an approximate solution which was promoted by Josh, I forgot the reference, but the idea is that you can calculate the number of adjacent triangles (equivalent to local scan-1) using the eigenvectors of the adjacency matrix. Now if you only calculate the first couple of eigenvectors which is fast, you get an approximate solution which is good if the non-used eigenvectors have corresponding eigenvalues that are close to zero. The first function does everything, the second starts with the eigenvalues and vectors. See examples from here:

https://github.com/igraph/xdata-igraph/blob/develop/interfaces/R/igraph/inst/tests/test_scan.R#L141

This was supposed to be fast by some people, but in practice it is way slower than the exact calculation.

— Reply to this email directly or view it on GitHubhttps://github.com/igraph/xdata-igraph/issues/12#issuecomment-33190314 .

perhaps consider allowing the quest for eudaimonia to guide you openconnecto.me, jovo.me

whalesandro commented 10 years ago

Great, thanks a lot to Gabor and Joshua!

gaborcsardi commented 10 years ago

With the C implementation, pre-calculated neighborhoods are sometimes faster.

igraph / xdata-igraph

local.scan revisited #12

reuse Ngt