igraph / rigraph

igraph R package
https://r.igraph.org
552 stars 200 forks source link

subisomorphism fails if graph has an 'x' attribute #154

Closed ntamas closed 2 years ago

ntamas commented 8 years ago

From @barryrowlingson on May 25, 2016 17:48

Make a graph and look for squares:

g1 = graph.full(4)
subs = graph.get.subisomorphisms.vf2(g1,graph.ring(4))

Add some vertex attributes, all still works:

V(g1)$NNN=1:4
subs = graph.get.subisomorphisms.vf2(g1,graph.ring(4))
V(g1)$X=1:4
subs = graph.get.subisomorphisms.vf2(g1,graph.ring(4))

Unless its called x:

V(g1)$x=1:4
subs = graph.get.subisomorphisms.vf2(g1,graph.ring(4))
#> Error in simple_vs_index(x, ii, na_ok) : Unknown vertex selected

I'm getting a bit of a deja-vu here, have I reported something like this before?

igraph v 1.0.1, R 3.2.2

Copied from original issue: igraph/igraph#945

ntamas commented 8 years ago

Confirmed; also happens on my machine. However, I'm not familiar with the R interface so we'll have to wait for @gaborcsardi to chime in.

gaborcsardi commented 8 years ago

It is indeed a bug.

djerbirachid commented 5 years ago

any suggestion please, i have the same problem.

szhorvat commented 2 years ago

Still present in the 1.3 dev version.

ntamas commented 2 years ago

I'm not familiar with R enough to figure out what's going on, but apparently replacing this line:

lapply(res, function(x) V(graph1)[x + 1])

with this line

lapply(res, function(y) V(graph1)[y + 1])

in graph.get.subisomorphisms.vf2 in R/topology.R "fixes" the problem -- but of course you then have a problem with vertex attributes named y.

ntamas commented 2 years ago

So here's a simpler way to reproduce this:

> g <- graph.full(4)
> V(g)$x <- 1:4
> x <- 3
> V(g)[x + 1]
Error in simple_vs_index(x, ii, na_ok) : Unknown vertex selected

And apparently the problem lies in a feature of V(g) indexing that I was not aware of before:

> g <- graph.full(4)
> V(g)$x <- 1:4
> V(g)$y <- 4:1
> V(g)[x >= 2 & y >= 2]

In this case, the expression x >= 2 & y >= 2 gets evaluated in a context where x and y refer to the corresponding vertex attributes of the graph. The magic happens in this part of R/iterators.R:

    attrs <- vertex_attr(graph)
    xvec <- as.vector(x)
    for (i in seq_along(attrs)) attrs[[i]] <- attrs[[i]][xvec]

    res <- lazy_eval(
      args,
      data =  c(attrs, .nei = .nei, nei = nei, .innei = .innei,
        innei = innei, .outnei = .outnei,  outnei = outnei, adj = adj,
        .inc = .inc, inc = inc, .from = .from, from = from, .to = .to,
        to = to)
    )
    res <- lapply(res, function(ii) {
      if (is.null(ii)) return(NULL)
      ii <- simple_vs_index(x, ii, na_ok)
      attr(ii, "env") <- attr(x, "env")
      attr(ii, "graph") <- attr(x, "graph")
      class(ii) <- class(x)
      ii
    })

where we basically retrieve all the vertex attributes of the graph, subset them to the current vertex sequence, and then create a lazy object with an associated data frame that is pre-filled with the vertex attributes and some other stuff (nei, from etc). This is nice but it interferes with the evaluation of the expression in the brackets in V(g)[...] because we first look things up in this data frame and only look at the environment of the lazy expression when the lookup fails. Actually, this is even documented behaviour (source):

When indexing vertex sequences, vertex attributes can be referred to simply by using their names. E.g. if a graph has a name vertex attribute, then V(g)[name == "foo"] is equivalent to V(g)[V(g)$name == "foo"].

A workaround for this is to use a "private" variable name like .x in the code, e.g.: lapply(res, function(.x) V(graph1)[.x + 1], with the implicit assumption that users won't create vertex attribute names starting with a dot. However, this will still be a common pitfall for people who routinely write code like function(x) V(g)[do_something_with_x].

ntamas commented 2 years ago

The workaround is now used and I think it's good enough for our purposes.

szhorvat commented 2 years ago

Of course this still fails if there is an attribute called .x.

ntamas commented 2 years ago

Yes, but by convention, names starting with a dot are considered private in R. E.g., ls() in R does not list variables starting with a . unless all.names=TRUE is passed to it. That's why I said that it's "good enough" for our purposes. If you want to break it, surely you can break it :)

szhorvat commented 2 years ago

This is about attribute names, though, not R variables. Attributes may come from sources outside of R, such as when importing files.

For example: When I was thinking about how to handle compound attributes in GML files, one of the ideas was to use the same notation as in the GML specs. This naturally leads to encoding top-level attributes names to strings that start with dots. I no longer think that this is a good idea, but it shows the problem: at the time I was working with the C core, not thinking about R, and not aware of this convention in R.

ntamas commented 2 years ago

Still, we can't change the behaviour of the [ operator now without breaking lots of existing code for users, and most of that code is probably in private R notebooks (and not in library code -- at least the revdep checks would discover those). I believe that whoever knows that attributes can be referred to in the [ operator is happy to use this feature (I find it quite convenient, albeit a bit confusing at first), so I don't think it's a niche thing that we could just remove.

What we can do, though, is to go through the library code of igraph and make sure that we use the [ operator in a way that existing variables do not conflict with attributes at the point of usage.

ntamas commented 2 years ago

Commit 4ccd00f7 adds a resolve_attrs=... option to vertex / edge indexing so you can turn attribute lookup off in problematic cases.