meringlab / FlashWeave.jl

Inference of microbial interaction networks from large-scale heterogeneous abundance data
Other
70 stars 8 forks source link

LightGraphs.betweenness_centrality() runs indefinitely when passed a graph with negative values #23

Closed jonasjonker closed 3 years ago

jonasjonker commented 3 years ago

I found another peculiar bug.

LightGraphs.betweenness_centrality() keeps running until RAM is full, and then crashes when it's Graph object is created from an FlashWeave object but not when it's created from a GML file.

using BenchmarkTools
using FlashWeave
using ParserCombinator
using GraphIO.GML
using LightGraphs

ID                  = 1001
ROOT                = "/home/jonas/Repos/Thesis/data/"
data_path           = "$(ROOT)$(ID)/processed_data/1_otu_table.biom"
gml_no_meta_path    = "$(ROOT)$(ID)/network_no_meta.gml"

netw_results_no_meta = FlashWeave.learn_network(data_path,
                                        sensitive     = true,
                                        heterogeneous = false)

save_network(gml_no_meta_path, netw_results_no_meta)

# {6558, 8484} undirected simple Int64 graph with Float64 weights
G = graph(netw_results_no_meta)

# {6558, 8484} undirected simple Int64 graph
g = loadgraph(gml_no_meta_path, GMLFormat())

@time centrality_arr  = betweenness_centrality(g)
#12.467820 seconds (45.30 M allocations: 7.531 GiB, 2.73% gc time)

@time centrality_arr  = betweenness_centrality(G)
# steadily increasing RAM usage until RAM is full and julia crashes.
pkg> status
Status `/mnt/nvme0n1p4/.julia/environments/v1.5/Project.toml`
  ...
  [2be3f83a] FlashWeave v0.18.0
  ...
jonasjonker commented 3 years ago

oh, and this is the data I used: Qitta

jonasjonker commented 3 years ago

Or is this an issue that I should report to LightGraphs (as well)?

jonasjonker commented 3 years ago

I added some information. creating an LightGraphs object via an GML file or directly does not result in identical objects. Maybe This is more of an LightGrpahs problem. Still, it might be nice to be aware of this problem.

jtackm commented 3 years ago

The difference is that FlashWeave doesn't use LightGraphs.SimpleGraph, but instead SimpleWeightedGraphs.SimpleWeightedGraph to store the weighted network. betweenness_centrality may not be as optimized for sparse matrices as used by SimpleWeightedGraphs, hence the RAM blowup.

You are right in that this is a problem on the LightGraphs front. Many algorithms in that package allow to explicitly provide a dense weighted adjacency matrix, which may be much faster and less memory intense for this algorithm, so perhaps you could file a feature request for LightGraphs? (or perhaps there is a more elegant solution)

jonasjonker commented 3 years ago

I looked a little further in this. And even an small SimpleWeightedGraph (SimpleWeightedGraph(smallgraph(:house))) crashes. But only in atom, not in jupyter notebooks... I'll look further into the bug. But it's not an FlashWeave issue.

jtackm commented 3 years ago

Perhaps the LightGraphs.jl or Julia versions differ between your Juno and Jupyter environments? I'd always try the #master to make sure this hasn't been fixed yet. Anyways, good luck!

jonasjonker commented 3 years ago

You might be interested in the issue I made at LightGraphs.

I also uploaded the graph to google drive so if you like, you can try it for yourself.

using JLD2

@load "g_weights.jld2" G

G.weights # 6558×6558 SparseArrays.SparseMatrixCSC{Float64,Int64} with 16968 stored entries:

@time centrality_arr  = betweenness_centrality(G)  # the graph that causes the issue
jonasjonker commented 3 years ago

I (or actually sbromberger) found the problem. #1531

Can't have negative weights with Dijkstra.

julia> minimum(g.weights)
-1.0

LightGraph doesn't throw errors given illegal input and trust users to know what the do. Which I clearly don't (yet).

jtackm commented 3 years ago

Wow, I didn't expect that, thanks for the heads up. I would have expected they check negative edge weights since a single sweep over nzvalues should be very cheap compared to Dijkstra, but well.

Negative weights are unfortunately fundamental for microbial ecosystems, so FlashWeave needs a way to represent them. For the special case of betweeness_centrality, you may however choose to remove these edges since they shouldn't matter for bottleneck analyses.