Open ababaian opened 1 month ago
I've narrowed down the issue: there are NA values being passed on to the initial numeric vector of cluster_label_prop(). When the labels are not fixed this does not seem to be an issue, but if the NA integers are set as fixed, R crashes.
# R Crashes (as before)
lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed = lab.df$fixed)
# R does not crash
na.label <- is.na(lab.df$int.label)
lab.df$int.label[ na.label ] <- 99
lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed = lab.df$fixed)
# R Crashes
# lab.df$fixed[ 1 ] is TRUE
lab.df$int.label[ 1 ] <- NA
lpa <- cluster_label_prop(g,
                          weights = E(g)$vrank,
                          initial = lab.df$int.label,
                          fixed = lab.df$fixed)
@Antonov548 Can you run this using ASAN and see if it's still present in the dev version (to become 2.0.4)? When not using ASAN, a lack of crash does not indicate that there is no bug (I can't repro it on my machine, but that doesn't mean no bug).
Is there a good way to get you system log information from the time the crash happens which could help diagnose the problem?
I'm not sure, I don't think so.
It's good to note that passing NA values to igraph functions is almost never valid (certainly not here). The exceptions are storing attributes (NA values can be stored) and where a NA scalar has special meaning (e.g. weights=NA).
That said, there should not be a crash.
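Until that is fixed, a caller-side workaround might look like the sketch below. It assumes, based on my reading of the cluster_label_prop() documentation, that negative entries in initial denote vertices without a label and that unlabeled vertices cannot be fixed. The helper name sanitize_labels is hypothetical and not part of igraph.

```r
# Hypothetical helper, not part of igraph: replace NA labels with -1
# (assumed to mean "no label") and make sure those vertices are not fixed.
sanitize_labels <- function(initial, fixed) {
  stopifnot(length(initial) == length(fixed))
  missing_label <- is.na(initial)
  initial[missing_label] <- -1L
  fixed[missing_label] <- FALSE
  list(initial = as.integer(initial), fixed = as.logical(fixed))
}

# Small stand-in for lab.df$int.label and lab.df$fixed
labs <- sanitize_labels(c(0, NA, 2, NA), c(TRUE, TRUE, FALSE, FALSE))
```

The result could then be passed as initial = labs$initial and fixed = labs$fixed, so that no NA reaches the C core.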
I believe the major issue here is that the R interface does not do any validation when converting to an integer vector (igraph_vector_int_t). See:
This is related to #1140, but for vectors rather than scalars. I noted it with a yellow mark in #840. Note that doing it for vectors may have a noticeable performance impact.
@krlmlr I would not make this issue block 2.0.4. A proper fix will be very time consuming.
Agreed, the NA was actually an error on my side upstream of LPA, but having it take down the whole R session was annoying. Throwing an error if there are NA values in int.label seems prudent, but if the crash is not easily reproducible, then that may cause other errors in systems where it's working.
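To illustrate the kind of early error meant here, a minimal sketch in plain R follows. The function check_labels is hypothetical; it is not how igraph validates its arguments, only an example of turning NA labels into an ordinary R error before anything reaches the C code.

```r
# Hypothetical guard, not part of igraph: fail early with an R error
# instead of handing NA labels to the C core.
check_labels <- function(initial, fixed = NULL) {
  bad <- which(is.na(initial))
  if (length(bad) > 0) {
    stop("`initial` contains NA at position(s): ",
         paste(head(bad, 5), collapse = ", "), call. = FALSE)
  }
  if (!is.null(fixed) && anyNA(fixed)) {
    stop("`fixed` contains NA values", call. = FALSE)
  }
  invisible(TRUE)
}

# An NA label is reported as a catchable error rather than a crash.
res <- tryCatch(check_labels(c(1L, NA, 3L), c(TRUE, TRUE, FALSE)),
                error = function(e) conditionMessage(e))
```

Calling check_labels(lab.df$int.label, lab.df$fixed) right before cluster_label_prop() would have surfaced my upstream mistake without losing the session.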
Yes, of course this should be fixed. The problem is that the proper fix is time-consuming, and requires a lot of care, as it involves reviewing some of the fundamental glue code between R and C, and not just this single function. This is why I recommended not blocking the next release on this issue.
What happens, and what did you expect instead?
I'm using the LPA algorithm implemented in cluster_label_prop on an undirected graph of moderate size (1042 vertices, 1124 edges) and when fixing the labels on a subset of the vertices, the function crashes R.
To reproduce
Minimal reproducible example file: lpa.g.debug.zip
System information