igraph / rigraph

igraph R package
https://r.igraph.org

bug in assortativity_nominal #1283

Open csqsiew opened 4 months ago

csqsiew commented 4 months ago

Describe the bug

When trying to compute assortativity for categorical labels attached to nodes, an error is returned when the labels are letters but not when the labels are numbers. The error message says the failure is unexpected and asks for it to be reported with a reproducible example.

To reproduce

The script below reproduces the bug

library(igraph)

set.seed(2)
g <- sample_gnm(10, 20)

V(g)$random1 <- sample(c(1, 2), 10, replace = TRUE)
V(g)$random2 <- sample(c('1', '2'), 10, replace = TRUE)
V(g)$random3 <- sample(c('A', 'B'), 10, replace = TRUE)

# compute the assortativity of this node attribute 
assortativity_nominal(g, types = V(g)$random1) # this is OK 
assortativity_nominal(g, types = V(g)$random2) # this is OK
assortativity_nominal(g, types = V(g)$random3) # this leads to the output below

Error in assortativity_nominal(g, types = V(g)$random3) :
  At core/core/vector.pmt:126 : Assertion failed: size >= 0. This is an unexpected igraph error; please report this as a bug, along with the steps to reproduce it.
Please restart your R session to avoid crashes or other surprising behavior.
In addition: Warning message:
In assortativity_nominal(g, types = V(g)$random3) :
  NAs introduced by coercion

Version information

Which version of igraph are you using and where did you obtain it?

igraph_1.6.0 from CRAN

R version 4.3.3 (2024-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)

szhorvat commented 4 months ago

igraph_1.6.0

Please always test with the latest version before reporting issues. I cannot reproduce it with 2.0.2. Can you try this version?

csqsiew commented 4 months ago

Hello,

My bad for not checking the latest version.

I checked with the updated version of igraph (2.0.2) but there is still a problem with a slightly different message:

Error in assortativity_nominal(g, types = V(g)$random3) : 
  At vendor/cigraph/src/misc/mixing.c:122 : Vertex types must not be negative. Invalid value
In addition: Warning message:
In assortativity_nominal(g, types = V(g)$random3) :
  NAs introduced by coercion

Based on my reading of the manual, it seems that I cannot use character labels because the as.integer() conversion fails on them (NA values are returned instead of integers). Hence, this isn't a bug and the issue can be closed.
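For reference, the coercion can be reproduced in base R without igraph. This is only a minimal sketch, assuming (as the warning suggests) that assortativity_nominal() passes types through as.integer(); the variable names are made up for illustration.

```r
# Character labels that look like numbers convert cleanly.
digit_labels <- as.integer(c("1", "2", "1"))

# Non-numeric labels become NA; without suppressWarnings() this emits the
# "NAs introduced by coercion" warning seen in the error output above.
letter_labels <- suppressWarnings(as.integer(c("A", "B", "A")))

print(digit_labels)   # 1 2 1
print(letter_labels)  # NA NA NA
```

This would explain why random1 and random2 work while random3 does not.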

szhorvat commented 4 months ago

With the current version of igraph, you will need to use consecutive integers, starting at 1, to represent categories. Names won't work.

I agree that the situation is not ideal. I must note that I am not an R user or R programmer, so I can't judge very well what is reasonable in R. Neither do I make the decision about whether we will do anything about this. But here's a suggestion for an improvement.

Let me know what you think @krlmlr

First, notice that the error message is not very good. It talks about negative indices because in C we index the categories from 0, whereas in R we index from 1. There's thus the usual problem of how to phrase the error so that it fits both. https://github.com/igraph/igraph/issues/2119

The types argument here represents categorical data. It would indeed be very nice if other representations than indices could be supported, for example string names. Categorical data appears in many places in igraph as an input argument, such as:

... and possibly others I'm not thinking of now.

Some high-level languages support categorical data directly. Isn't this what factor is for in R? Mathematica does not have a data type for this, but I do have functions to convert other representations to category indices, and I allow categories to be specified in flexible ways.

Categories also have different representations, each being most useful in specific contexts: we can assign a category name to each object/vertex: vertex 1 is "blue", vertex 2 is "red", vertex 3 is "blue"; or we can list the category members: "blue" contains {1, 3}, "red" contains {2}.
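To make the two representations concrete, here is a small base R sketch using the colour example above. The variable names are illustrative only; nothing here is igraph API.

```r
# Per-vertex labels: one category name for each vertex.
membership <- c("blue", "red", "blue")

# Per-category member lists: vertex indices grouped by category name.
members <- split(seq_along(membership), membership)
# members$blue holds vertices 1 and 3, members$red holds vertex 2

# Back from member lists to per-vertex labels.
restored <- character(length(membership))
for (name in names(members)) {
  restored[members[[name]]] <- name
}
```

Both forms carry the same information, so a converter could accept either and normalise to a single one.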

Should we then have a special Stimulus type specifically for categorical data? This would make it easy to auto-generate code that can handle various kinds of category representations that are convenient in the host language, and convert each to simple 0-based membership vectors that can be sent to C. The raw C errors we see here would never appear: error checking would be done by the function that converts the category representations. Users could work much more conveniently with such data.
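As a rough illustration of what such a conversion step could look like on the R side, here is a sketch of a hypothetical helper. The name as_category_index() and its return structure are assumptions made for this example; it is not an existing igraph or Stimulus function.

```r
# Hypothetical helper, not part of igraph: converts category labels to
# 0-based integer codes and validates them before anything reaches C.
as_category_index <- function(types) {
  if (anyNA(types)) {
    stop("`types` must not contain missing values")
  }
  if (is.numeric(types)) {
    # Already indices: require positive whole numbers (1-based in R).
    if (any(types != round(types)) || any(types < 1)) {
      stop("numeric `types` must be positive whole numbers")
    }
    codes <- as.integer(types)
    labels <- NULL
  } else {
    # Character, factor, or logical labels: map each distinct label to a code.
    f <- as.factor(types)
    codes <- as.integer(f)
    labels <- levels(f)
  }
  # Shift to the 0-based convention used on the C side.
  list(index = codes - 1L, labels = labels)
}

res <- as_category_index(c("A", "B", "A"))
print(res$index)   # 0 1 0
print(res$labels)  # "A" "B"
```

With a step like this, invalid input is rejected with an R-level message phrased in R's terms, rather than with a raw C error about negative values.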

Opinions, @krlmlr and @ntamas ?

csqsiew commented 4 months ago

assortativity_nominal(g, types = factor(V(g)$random3))

I can confirm that using factor() in this way did not lead to any errors.

ntamas commented 4 months ago

Should we then have a special Stimulus type specifically for categorical data?

Yes, IMO it would be a good idea. Currently we have VERTEX_COLORS and EDGE_COLORS (probably only in the develop branch?). I think it's an ill-suited name but semantically it means the same thing, doesn't it?

krlmlr commented 4 months ago

A simple types <- as.integer(as.factor(types)) in assortativity_nominal() might do?

It would be idiomatic for R to support character labels, as long as the roundtrip works. The error message is fine as is too. No strong opinion.
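A quick base R check of the roundtrip, as a sketch only; it does not show how assortativity_nominal() is actually implemented, and the variable names are illustrative.

```r
types <- c("B", "A", "B", "C")

# as.factor() sorts the distinct labels, so "A" becomes 1, "B" 2, "C" 3.
f <- as.factor(types)
codes <- as.integer(f)

# Roundtrip: the integer codes index back into the factor levels.
roundtrip <- levels(f)[codes]

print(codes)      # 2 1 2 3
print(roundtrip)  # "B" "A" "B" "C"
```

One thing to keep in mind is that numeric input would be renumbered too (for example, c(2, 3) would become 1, 2). That should be harmless as long as only the grouping matters, but it does mean the codes sent to C may differ from the values the user passed in.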