sorgerlab / indra

INDRA (Integrated Network and Dynamical Reasoning Assembler) is an automated model assembly system interfacing with NLP systems and databases to collect knowledge, and through a process of assembly, produce causal graphs and dynamical models.
http://indra.bio
BSD 2-Clause "Simplified" License
178 stars 68 forks source link

Remove networkx<3 version constraint #1442

Closed bgyori closed 5 months ago

bgyori commented 6 months ago

It would be good to remove the networkx<=2 constraint but there is at least one test that fails with networkx>=3:

FAILED indra/tests/test_pysb_assembler.py::test_contact_map_cycles_1 - AssertionError: assert ['b(a)', 'a(b...c(b)', 'b(c)'] == ['a(b)', 'b(a...c(a)', 'a(c)']

  At index 0 diff: 'b(a)' != 'a(b)'

  Full diff:
    [
  +     'b(a)',
        'a(b)',
  +     'a(c)',
  -     'b(a)',
  ?      ^
  +     'c(a)',
  ?      ^
  +     'c(b)',
        'b(c)',
  -     'c(b)',
  -     'c(a)',
  -     'a(c)',
    ]

It's possible that this is something trivial like the order of nodes in a reported cycle but would have to be investigated.

kkaris commented 6 months ago

The issue is with cycle_basis producing a changed output. I dug into Networkx's code base and checked the diff for networkx/algorithms/cycles.py between 2.8.8 and 3.3: image

More importantly, they've gone from a set to a dict in the algorithm and when the root node is not specified, this makes a difference.

Setup for test:

from indra.statements import *
from indra.assemblers.pysb import PysbAssembler
from indra.assemblers.pysb.export import export_cm_network
from indra.assemblers.pysb.kappa_util import get_cm_cycles
from networkx.algorithms.cycles import cycle_basis
stmts = [Complex([Agent('a'), Agent('b')]),
             Complex([Agent('a'), Agent('c')]),
             Complex([Agent('b'), Agent('c')])]
pa = PysbAssembler(stmts)
pa.make_model()
graph = export_cm_network(pa.model)
cycles = get_cm_cycles(graph)

Set vs dict

nodes_set = set(graph.nodes())
nodes_set.pop()
# 0
nodes_set
# {(0, 0), (0, 1), (1, 0), (1, 1), (2, 0), (2, 1), 1, 2}

nodes_dict = dict.fromkeys(graph)
nodes_dict.popitem()[0]
# (2, 1)
nodes_dict
# {0: None,
#  (0, 0): None,
#  (0, 1): None,
#  1: None,
#  (1, 0): None,
#  (1, 1): None,
#  2: None,
#  (2, 0): None}

Providing a root node

If the root node to look for cycles from is provided, the output can be made consistent between versions:

In Networkx 2.8.8

cycle_basis(graph)
# [[(0, 0), (1, 0), 1, (1, 1), (2, 1), 2, (2, 0), (0, 1), 0]]
cycle_basis(graph, 0)
# [[(0, 0), (1, 0), 1, (1, 1), (2, 1), 2, (2, 0), (0, 1), 0]]

In Networkx 3.3

cycle_basis(graph)
# [[(1, 1), 1, (1, 0), (0, 0), 0, (0, 1), (2, 0), 2, (2, 1)]]
cycle_basis(graph, 0)
# [[(0, 0), (1, 0), 1, (1, 1), (2, 1), 2, (2, 0), (0, 1), 0]]

I will update out get_cm_cycles() in the pysb assembler module to allow for a root node to be provided with None as default so the output can be predicted when necessary.

kkaris commented 6 months ago

I could not find any literal text match of get_cm_cycles outside of the indra tests or indra/assemblers/pysb/kappa_util.py. Unless this function is called in some dynamic manner I don't think its used in any code that I have access to.