statnet / network

Classes for Relational Data
Other
15 stars 8 forks source link

unexpected behavior setting edge attributes #4

Closed mkoohafkan closed 5 years ago

mkoohafkan commented 6 years ago

I am seeing unexpected behavior when setting edge attributes. The order of the attribute values assigned to edges does not match the order supplied. See the example below:

# make a random directed graph
set.seed(5)
m <- matrix(rbinom(25,1,.4),15,15)
diag(m) <- 0
g <- network(m, directed = TRUE)

# set an edge attribute
g.att = sample(LETTERS, length(g %e% "na"), replace = TRUE)
g %e% "type" = g.att

# check that order of attribute values is same
all(g.att == g %e% "type")
## FALSE

# more checking
head(cbind(g.att, g %e% "type"))
##      g.att    
## [1,] "M"   "L"
## [2,] "L"   "Z"
## [3,] "Z"   "L"
## [4,] "D"   "H"
## [5,] "Y"   "A"
## [6,] "L"   "V"

This occurs if I use %e% or set.edge.attribute. This behavior also makes it impossible to do substitutions on edge attribute values, e.g. using ifelse or match statements.

mkoohafkan commented 6 years ago

After poking around the repository a bit... It seems that coercing the attribute vector to a list using as.list before assigning via set.edge.attribute (as done in your tests) solves the issue.

set.edge.attribute(g, "type", as.list(g.att))
all(g.att == g %e% "type")
## TRUE

However, this does not work with %e%:

g %e% "type" = as.list(g.att)
all(g.att == g %e% "type")
## FALSE
mkoohafkan commented 6 years ago

Wondering if this issue is somehow related to the ordering of edges in an edgelist. Consider the difference in edge order between:

head(as.edgelist(g, attrname = 'type'))
head(as.matrix.network.edgelist(g, attrname = 'type'))

as.matrix.network.edgelist returns edges in the same order as get.edge.attribute, but as.edgelist.network does not. Which is pretty weird, considering that as.edgelist.network is just a wrapper for as.matrix.network.edgelist anyway.

mkoohafkan commented 6 years ago

Any response from the developers? This seems like a significant bug, unless there is some key detail I'm missing.

krivit commented 6 years ago

Indeed, setting edge attributes can be finicky. @CarterButts , can you take a look at this?

skyebend commented 6 years ago

If I use set.edge.attribute() instead of %e% for assignment it seems to work fine:

> set.edge.attribute(g,'type',g.att)
> all(g.att == g %e% "type")
[1] TRUE

I think the reason is that operator shortcut %e%<- is an alias for for set.edge.value not set.edge.attribute. set.edge.value expects a matrix input and if it doesn't get it, it silently reshapes the input vector into a matrix, which means the ordering is not the same as would be expected for the edge ids. It is definitely confusing, and an interesting design choice, but seems to be behaving as documented.

CarterButts commented 6 years ago

I believe that @skyebend is correct: if you look at the example in the man page for %e%, there is a comment that the input to the operator should be an adjacency matrix. Indeed, that's one of the reasons for the operator - it provides a way to set edge values that is in line with what a naive user accustomed to adjacency matrices would expect. (That is probably less true now than it was when this stuff was first created.)

@mkoohafkan can you verify that this resolves your issue? It doesn't sound like anything is amiss, but it does look like the documentation needs to be clearer. Seems like something to put into the next release....

mkoohafkan commented 6 years ago

@CarterButts OK, I can understand %e% reshaping vectors to matrices for input if that's what it is designed for, but why does it also do it to output?

set.edge.attribute(g,'type',g.att)
g %e% 'type'
## [1] "M" "L" "Z" "D" "Y" "L" "B" "H" "A" "A" "M" "P" "P" "K" "K" "V" "G" "V" "N"
## [20] "Y" "O" "T" "B" "U" "Q" "J" "O" "X" "Z" "Y" "J" "G" "G" "F" "D" "Q" "E" "W"
## [39] "Z" "Z" "X" "X" "E" "Y" "V" "V" "S" "E" "V" "A" "X" "B" "S" "L" "I" "Y" "V"
## [58] "F" "C" "M" "B" "Z" "U" "I" "P" "S" "I" "E" "K" "I" "V" "G" "Q" "M"

Doing this is begging for users to get confused since %e% is not being self-consistent. For example:

g2 = g
g2 %e% 'type' <- g %e% 'type'
identical(g %e% 'type', g2 %e% 'type')
## FALSE

Having %e% return a matrix would be an obvious clue for users regarding how %e%<- should work, i.e. these two statements should be equivalent:

g %e% 'type'
as.matrix(g, attrname = 'type')

I also strongly suggest adding a warning to %e% that states it is coercing a vector to an adjacency matrix.

krivit commented 6 years ago

I wonder how safe it might be to do a type detection for input. I.e.,

skyebend commented 6 years ago

The type detection seems reasonable since I can't think of a use case where I'd want to depend on the existing behavior of coercing vector to matrix (seems to risky, is it row major or column major? are diagonal entries included?) Perhaps the first step could be to add the suggested coercion warning to set.edge.value(to find existing code that may depend on it) then deprecate and add new behavior?

krivit commented 5 years ago

I've implemented a version of that in the branch statnet/network@e_op_vector. One sticking point is that if the network has deleted edges, %e%<- might not behave as expected. For example:

> library(network)
[...]
> y <- network.initialize(5)
> y[1,] <- 1 # Add (1,2), (1,3), (1,4), and (1,5)
> y %e% "a" <- c("x","y")
> y %e% "a"
[1] "x" "y" "x" "y" # c("x","y") is recycled
> y[1,2] <- 0 # Delete one edge
> y %e% "a" # Remaining edges' attributes
[1] "y" "x" "y"
> y %e% "a" <- c("x","y","z") # Try updating them
> y %e% "a"
[1] "y" "z" "x"

What happens is that the internal edgelist still has an element for the deleted edge, so %e% expects a vector of length 4, not 3, and so it recycles c("x","y","z") into c("x","y","z","x") and then assigns the first "x" to a deleted edge.

One option is to have %e% skip deleted edges. That is, instead of calling set.edge.attribute(x,attrname=attrname,value=value) with edge IDs defaulting to e=seq_along(x$mel), instead call it with e=which(!sapply(x$mel, is.null)), which will skip deleted edges.

Any thoughts?

krivit commented 5 years ago

Assigning via %e% with a vector RHS now skips over deleted edges, so that nw %e% "eattr" <- nw %e% "eattr"leaves nw unchanged.

@CarterButts , is the new behaviour acceptable?

skyebend commented 5 years ago

we added the valid.eids() function as a way to make looking up the valid, non-deleted eids that would hopefully be future-proof and easier to read, might be useful here for default. Don't think there would be a performance hit, since it is the same internally? Although I'm not aware of code that depends on the existing behaviour or %e%, this does seem like somewhat risky change since it would likely break any existing workarounds. There are probably are (should be?) tests added to cover the new behavior?

krivit commented 5 years ago

@skyebend , valid.eids() is used now.

In principle, the behaviour of %e%<- with a non-matrix RHS is undefined, so no one should have been relying on the behaviour in question in the first place.

krivit commented 5 years ago

@CarterButts , @skyebend , is my implementation (in e_op_vector) OK? Any objections to my merging it into master?

tedmoorman commented 2 years ago

After poking around the repository a bit... It seems that coercing the attribute vector to a list using as.list before assigning via set.edge.attribute (as done in your tests) solves the issue.

set.edge.attribute(g, "type", as.list(g.att))
all(g.att == g %e% "type")
## TRUE

However, this does not work with %e%:

g %e% "type" = as.list(g.att)
all(g.att == g %e% "type")
## FALSE

Please, please put this in the documentation somewhere.