zpneal / backbone

R backbone package - Extract the backbone from weighted and unweighted networks
https://www.rbackbone.net

sdsm(class = "edgelist") returns error "Graph must be unweighted" and "igraph" class does not #48

Closed tdelcey closed 5 months ago

tdelcey commented 5 months ago

Hi,

Thank you for this wonderful package.

I have used your backbone::sdsm function on various datasets and it works fine. Recently, however, I worked on a new dataset and got the "Graph must be unweighted" error even though my graph is clearly unweighted. I worked around the issue by converting my edgelist into an igraph object before running sdsm, but I thought it was worth reporting to you.

Here is a reproducible example using samples from two different datasets (I have anonymised the data). The first one works; the second returns the error:

edgelist1.csv edgelist2.csv

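edgelist1 <- read.csv("edgelist1.csv")  #Import edgelist 1
edgelist2 <- read.csv("edgelist2.csv")  #Import edgelist 2

backbone::sdsm(as.data.frame(edgelist1))  #Works
backbone::sdsm(as.data.frame(edgelist2))  #Returns "Graph must be unweighted"

Best,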

Thomas

zpneal commented 5 months ago

Hi Thomas,

Thanks for raising this issue. The backbone package attempts to import edgelists. However, edgelists can sometimes be ambiguous, especially when they are intended to represent bipartite graphs, so these attempts sometimes fail. Given this risk, I'm leaning toward removing support for edgelists/dataframes, and simply asking users to supply networks as matrices or igraph objects, which are unambiguous.
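
As a rough sketch of what supplying a matrix could look like, a two-column edgelist can be cross-tabulated into a binary incidence matrix in base R. The data and the column names `agent` and `artifact` below are hypothetical, and the final `sdsm()` call is left commented out because its behaviour depends on the installed backbone version.

```r
# Toy two-column edgelist (hypothetical data): each row links an agent to an artifact
el <- data.frame(
  agent    = c("a1", "a1", "a2", "a3", "a3"),
  artifact = c("x",  "y",  "x",  "y",  "y")   # the last row repeats the one before it
)

# Count how often each agent-artifact pair occurs
tab <- table(el$agent, el$artifact)

# Binarize the counts so that repeated rows do not become edge weights
B <- matrix(as.integer(tab > 0),
            nrow = nrow(tab),
            dimnames = list(rownames(tab), colnames(tab)))
B

# The matrix could then be passed on, e.g.:
# backbone::sdsm(B)
```

Because rows and columns come from different variables, the matrix is unambiguous about which nodes belong to which mode, which is exactly what a bare edgelist cannot guarantee.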

In your examples, I identified three separate issues that could contribute to the ambiguity:

  1. edgelist1.csv contains some missing values, which are not supported by backbone
  2. both edgelist1.csv and edgelist2.csv could define non-bipartite graphs
  3. edgelist2.csv contains duplicate edges, which are interpreted as weighted edges

This code illustrates the issues in your two example datasets:

el1 <- read.csv("edgelist1.csv", row.names = 1)  #Import edgelist 1
el1[c(737,864),]  #ISSUE #1 ==> Edges 737 and 864 are NA for auteur_id
el1 <- el1[complete.cases(el1),]  #Focus on just the complete rows
net1 <- igraph::graph_from_edgelist(as.matrix(el1))  #Create an igraph object
igraph::is_bipartite(net1)  #ISSUE #2 ==> This edgelist does not define a bipartite graph
#The fact that sdsm() seems to work on el1 is coincidental; the results are likely not what you expect

el2 <- read.csv("edgelist2.csv", row.names = 1)  #Import edgelist 2
net2 <- igraph::graph_from_edgelist(as.matrix(el2))  #Create an igraph object
igraph::is_bipartite(net2)  #ISSUE #2 ==> This edgelist does not define a bipartite graph
igraph::E(net2)[igraph::which_multiple(net2)]  #ISSUE #3 ==> This edgelist contains duplicate edges
el2[which(el2$source_id==3 & el2$target_id==11458),]  #Example multi-edge, treated as an edge with weight = 2
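
The first two issues can also be screened for in base R before anything is handed to backbone. This is only a sketch on toy data: the values are made up, and the column names `source_id` and `target_id` simply mirror the ones used for edgelist 2 above.

```r
# Toy edgelist (hypothetical values)
el <- data.frame(
  source_id = c(1, 2, NA, 3),
  target_id = c(10, 11, 12, 1)
)

# Issue 1: rows with a missing endpoint
na_rows <- which(!complete.cases(el))

# Issue 2: IDs that occur in both columns; if any exist, the two columns
# cannot be read as two disjoint node sets without further information
el_ok <- el[complete.cases(el), ]
shared_ids <- intersect(el_ok$source_id, el_ok$target_id)

na_rows
shared_ids
```

A non-empty `shared_ids` does not prove the underlying data are non-bipartite (the two modes may just reuse the same numbering), but it does mean the edgelist alone is ambiguous.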

I'll mark the issue as closed, but if I've misunderstood any features of your data or the issue you encountered, please let me know and feel free to re-open the issue.

Best, Zachary

tdelcey commented 5 months ago

Hi,

Thanks for the prompt answer. My code already handled issues 1 and 2, but I was not expecting duplicates in this dataset. Deleting the duplicates does resolve the issue, and it is now clear how they caused this error.
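
For anyone hitting the same error, a minimal base R sketch of finding and dropping duplicate edges follows. The data frame is a toy example with made-up rows; only the repeated pair echoes the multi-edge mentioned above.

```r
# Toy edgelist (hypothetical data) in which one edge appears twice
el <- data.frame(
  source_id = c(3, 3, 4, 5),
  target_id = c(11458, 11458, 20, 21)
)

# Flag rows that repeat an earlier row (second and later copies only)
dup <- duplicated(el)
el[dup, ]

# Keep the first copy of each edge
el_unique <- el[!dup, ]
nrow(el_unique)
```

Note that `duplicated()` only catches rows that match on every column, so an edgelist with extra columns would need to be subset to the two ID columns first.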

Best,

Thomas