statnet / network

Classes for Relational Data

as.network.data.frame does not handle two-mode adjacency matrices correctly #64

Open CarterButts opened 3 years ago

CarterButts commented 3 years ago

When called with a two-mode adjacency matrix, as.network.matrix will correctly interpret this as a graph with an enforced bipartition, with the passed matrix being the off-diagonal portion of the full adjacency matrix. as.network.data.frame will not, and indeed returns errors if e.g. the matrix is square and has non-zero diagonal entries when loops==FALSE. (See also related issue of as.network.data.frame not respecting the same semantics as as.network.matrix.) Setting loops=TRUE and bipartite=TRUE does not rectify the problem, because it throws an error when loops are set on bipartite graphs.

For this issue, the needed fix is for as.network.data.frame to correctly detect and implement two-mode matrix processing. Here is a demonstration:

#Create a two-mode adjacency structure
set.seed(1331)
g <- matrix(rbinom(25, 1, 0.2), 5, 5)
diag(g) <- 1

#Coerce to a graph the traditional way
gn <- as.network.matrix(g, bipartite = TRUE)
gn
gn[,]

#Now, with data frames.
gf <- as.data.frame(g)
gfn <- as.network(gf, bipartite = TRUE)               #Loop error!
gfn <- as.network(gf, bipartite = TRUE, loops = TRUE) #Displeased!
as.network.matrix(gf, bipartite = TRUE)               #Works fine!

We should be seeing the same behavior for as.network.data.frame as as.network.matrix here, and are not.

knapply commented 3 years ago

(for this and https://github.com/statnet/network/issues/65)

I pointed out that using S3 dispatch would be a breaking change when it was requested that I use that instead of the function I originally proposed (network_from_data_frame()).

https://github.com/statnet/network/pull/20#issuecomment-564830962

martinamorris commented 3 years ago

Ok, so it sounds like we have an issue to address. IIUC, using the S3 dispatch for this function is what causes the breakage. I believe this choice was originally motivated by maintainability concerns. @CarterButts do you have a preferred solution?

krivit commented 2 years ago

I have mixed feelings about this. To me, a data frame is not a generalisation of a matrix or an array, though for bipartite networks, it's a bit less clear-cut.

That having been said, if bipartite=TRUE and the data frame looks like an adjacency matrix, it makes more sense for as.network.data.frame() to interpret it the way as.network.matrix() does: rows are actors and columns are events. From what I understand, it currently interprets it as the "expanded bipartite" representation, in which both rows and columns contain both actors and events, and the actor-actor and event-event blocks are fixed at 0.

I think this would fix @CarterButts's issue. @knapply, is there any reason not to change the bipartite=TRUE handling of adjacency data frames to be consistent with the matrix method?
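
To make the two representations mentioned above concrete, here is a minimal base R sketch. It does not call the network package; the matrix `b` and its actor/event labels are made up for illustration. It shows how a two-mode (actors by events) matrix maps onto the off-diagonal blocks of the expanded adjacency matrix.

```r
# Two-mode (actors x events) matrix: 2 actors, 3 events (illustrative data).
b <- matrix(c(1, 0, 1,
              0, 1, 1),
            nrow = 2, byrow = TRUE,
            dimnames = list(c("a1", "a2"), c("e1", "e2", "e3")))

n_act <- nrow(b)
n_evt <- ncol(b)
n <- n_act + n_evt
labels <- c(rownames(b), colnames(b))

# Expanded representation: actors first, then events. The actor-actor and
# event-event blocks stay at 0; only the off-diagonal blocks are filled.
full <- matrix(0, n, n, dimnames = list(labels, labels))
full[seq_len(n_act), n_act + seq_len(n_evt)] <- b
full[n_act + seq_len(n_evt), seq_len(n_act)] <- t(b)

full
```

The two-mode form is the upper-right block of `full`, which is why a square two-mode matrix with a non-zero diagonal is not describing loops.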

knapply commented 2 years ago

The input shouldn't be a data frame if it's supposed to be a matrix.

The errors could probably be more informative ("is this supposed to be an adjacency matrix? If so, use as.matrix() first."), but this is not a bug -- it's user error.

If memory serves, the reason this is an issue is because the original as.network() default skipped S3 dispatch and called as.network.matrix() directly instead of attempting to coerce the input to a matrix first. Something like as.network(as.matrix(x)).

I'm assuming this normalized the behavior of passing data frames as input that really should've been matrices.

krivit commented 2 years ago

@knapply, perhaps I misremembered. Does as.network.data.frame() always treat the input data frame as an edge list of some type?

jdohmen commented 2 years ago

Has anything been changed in the as.network command? I can no longer read in all my empirical data after a statnet update. It used to work fine. Can't find the error. Thank you guys!

> WissOperativeAnpass1 <- read_excel("WissOperativeAnpass.xlsx")
> NetWissOperativeAnpass1 <- as.network(WissOperativeAnpass1)
Error: `loops` is `FALSE`, but `x` contains loops.
The following values are affected:
    - `x[1, 1:2]`
    - `x[2, 1:2]`
    - `x[3, 1:2]`
    - `x[4, 1:2]`
    - `x[5, 1:2]`
    - `x[6, 1:2]`

Also an issue here: https://community.rstudio.com/t/as-network-file-ergm-error-loops-is-false-but-x-contains-loops/115793

mbojan commented 2 years ago

@jdohmen indeed, in the recent version of network a data.frame is interpreted as an edgelist (first two columns) plus optional edge attributes (the remaining columns, if any). In your case the data frame is a "two-mode" (non-square) adjacency matrix. What you need to do is convert it to an R matrix with e.g. data.matrix(), for example:

d <- data.frame(
  a = c(0,0,1,1),
  b = c(0,0,1,0),
  c = c(1,1,0,0)
)

net <- as.network(data.matrix(d), bipartite = TRUE)
as.matrix(net)
#   a b c
# 1 0 0 1
# 2 0 0 1
# 3 1 1 0
# 4 1 0 0

In your case it will be something like

WissOperativeAnpass1 <- read_excel("WissOperativeAnpass.xlsx")
NetWissOperativeAnpass1 <- as.network(data.matrix(WissOperativeAnpass1), bipartite = TRUE)

... assuming you have no other columns in Excel beyond the adjacency information.

mbojan commented 2 years ago

@krivit @knapply @CarterButts, is it feasible to retain the original behavior by adding an argument to as.network.data.frame() for the case above (https://github.com/statnet/network/issues/64#issuecomment-1156361501)? I'm thinking input = c("adjacency", "edgelist") (and then match.arg() internally) or simply adjacency = TRUE (or FALSE if edgelist).
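
A rough sketch of what the `input` argument could look like, written as a standalone helper so it runs without the network package. The function name `prepare_network_input()` is hypothetical and not part of network. The argument order follows the proposal above, which would make "adjacency" the default; that default is an assumption, not a decision.

```r
# Hypothetical helper, not part of the network package: decides how a data
# frame should be interpreted before it is handed to as.network().
prepare_network_input <- function(x, input = c("adjacency", "edgelist")) {
  input <- match.arg(input)  # errors on anything other than the two choices
  stopifnot(is.data.frame(x))
  if (input == "adjacency") {
    # Treat the whole data frame as a numeric adjacency (or two-mode) matrix.
    data.matrix(x)
  } else {
    # Edge list: leave as is; the first two columns are the edge endpoints.
    x
  }
}

d <- data.frame(a = c(0, 0, 1, 1), b = c(0, 0, 1, 0), c = c(1, 1, 0, 0))
m <- prepare_network_input(d, input = "adjacency")
el <- prepare_network_input(d, input = "edgelist")
```

Here `m` is a plain matrix that could then go through the matrix method, e.g. as.network(m, bipartite = TRUE), while `el` stays a data frame.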

jdohmen commented 2 years ago

Then PLEASE also add a NODELIST (ego, alter1, alter2). Empirical survey data mostly comes as a NODELIST. I have spent so much time getting nodelists into statnet.

mbojan commented 2 years ago

@jdohmen I've made a separate issue #79 about such structured input. I believe this is the so-called "adjacency list" (ego id and the ids of its "neighbors").
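
Until such input is supported directly, one possible workaround is to reshape the adjacency list into an edge list by hand. The sketch below uses base R only; the data frame `nominations` and its column names are invented for illustration, and it assumes that missing nominations are coded as NA.

```r
# Hypothetical survey data: one row per ego, one column per nominated alter.
nominations <- data.frame(
  ego    = c("ann", "bob", "cat"),
  alter1 = c("bob", "ann", "ann"),
  alter2 = c("cat", NA, "bob"),
  stringsAsFactors = FALSE
)

alter_cols <- setdiff(names(nominations), "ego")

# Stack the alter columns so that each row is one (ego, alter) pair.
edges <- data.frame(
  from = rep(nominations$ego, times = length(alter_cols)),
  to   = unlist(nominations[alter_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop empty nominations.
edges <- edges[!is.na(edges$to), ]
edges
```

The result has the two endpoints in its first two columns, which matches the edge-list interpretation of data frames described earlier in this thread, so it should be usable as as.network(edges).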