paulbrodersen / netgraph

Publication-quality network visualisations in python
GNU General Public License v3.0
667 stars 40 forks source link

Problems with very large graphs #67

Closed AdrianaLecourieux closed 1 year ago

AdrianaLecourieux commented 1 year ago

Hi, I was testing your tool, I just tried the Graph command and the error "MemoryError: Grid" occured. I have a huge graph (approximatively 135000 nodes and more than 400000 edges). I have a lot of RAM and CPUs so I don't think the problem comes from here. Do you know why I have this error ?

paulbrodersen commented 1 year ago

Please provide a minimal, reproducible example that results in the error, and post the full error trace.

Also, keep in mind that with these many nodes and edges, you won't be able to visually discern anything, as a short back-of-an-envelope calculation would suggest: your screen probably has about 1 million pixels. With 400 000 edges, you have 2.5 pixels for each edge, ignoring that you need some whitespace around each edge to be able to discern it from other edges. By comparison, each letter you are viewing here in this message, uses about 200-400 pixels (with the exact value depending on your browser, zoom, default font, etc.). I suggest that you only plot parts of the network that you actually care about, or coarse-grain the network if you care about its coarse structure.

AdrianaLecourieux commented 1 year ago

Hi, I can't share my graph but I did :

import matplotlib.pyplot as plt
from netgraph import Graph, InteractiveGraph, EditableGraph
import numpy
Graph(g_test)
plt.show()

And the full error trace is :

---------------------------------------------------------------------------
MemoryError                               Traceback (most recent call last)
/tmp/ipykernel_77/1492503790.py in <module>
----> 1 Graph(g_test)
      2 plt.show()

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in __init__(self, graph, edge_cmap, *args, **kwargs)
   1399             kwargs.setdefault('node_zorder', node_zorder)
   1400 
-> 1401         super().__init__(edges, *args, **kwargs)
   1402 
   1403 

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in __init__(self, edges, nodes, node_layout, node_layout_kwargs, node_shape, node_size, node_edge_width, node_color, node_edge_color, node_alpha, node_zorder, node_labels, node_label_offset, node_label_fontdict, edge_width, edge_color, edge_alpha, edge_zorder, arrows, edge_layout, edge_layout_kwargs, edge_labels, edge_label_position, edge_label_rotate, edge_label_fontdict, origin, scale, prettify, ax, *args, **kwargs)
    308         self.origin = origin
    309         self.scale = scale
--> 310         self.node_positions = self._initialize_node_layout(
    311             node_layout, node_layout_kwargs, origin, scale, node_size)
    312 

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in _initialize_node_layout(self, node_layout, node_layout_kwargs, origin, scale, node_size)
    451             if (node_layout == 'spring') or (node_layout == 'dot') or (node_layout == 'radial'):
    452                 node_layout_kwargs.setdefault('node_size', node_size)
--> 453             return self._get_node_positions(node_layout, node_layout_kwargs, origin, scale)
    454 
    455         elif isinstance(node_layout, dict):

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in _get_node_positions(self, node_layout, node_layout_kwargs, origin, scale)
    462             return {self.nodes[0]: np.array([origin[0] + 0.5 * scale[0], origin[1] + 0.5 * scale[1]])}
    463         if node_layout == 'spring':
--> 464             node_positions = get_fruchterman_reingold_layout(
    465                 self.edges, nodes=self.nodes, origin=origin, scale=scale, **node_layout_kwargs)
    466             if len(node_positions) > 3: # Qhull fails for 2 or less nodes

/opt/conda/lib/python3.9/site-packages/netgraph/_node_layout.py in wrapped_layout_function(edges, nodes, *args, **kwargs)
     65                     edges, components, layout_function, mode='side-by-side', *args, **kwargs)
     66             else:
---> 67                 return get_layout_for_multiple_components(
     68                     edges, components, layout_function, mode='packed', *args, **kwargs)
     69         else:

/opt/conda/lib/python3.9/site-packages/netgraph/_node_layout.py in get_layout_for_multiple_components(edges, components, layout_function, origin, scale, mode, *args, **kwargs)
    110 
    111     if mode == 'packed':
--> 112         bboxes = _get_packed_component_bboxes(components, origin, scale)
    113     elif mode == 'side-by-side':
    114         bboxes = _get_side_by_side_component_bboxes(components, origin, scale)

/opt/conda/lib/python3.9/site-packages/netgraph/_node_layout.py in _get_packed_component_bboxes(components, origin, scale, power, pad_by)
    170     scalar = 20
    171     integer_dimensions = [(int(scalar*width), int(scalar*height)) for width, height in padded_dimensions]
--> 172     origins = pack(integer_dimensions) # NB: rpack claims to return upper-left corners, when it actually returns lower-left corners
    173 
    174     bboxes = [(x, y, scalar*width, scalar*height) for (x, y), (width, height) in zip(origins, relative_dimensions)]

/opt/conda/lib/python3.9/site-packages/rpack/__init__.py in pack(sizes, max_width, max_height)
    216     if not isinstance(sizes, list):
    217         sizes = list(sizes)
--> 218     return _pack(sizes, max_width or -1, max_height or -1)

/opt/conda/lib/python3.9/site-packages/rpack/_core.pyx in rpack._core.pack()

/opt/conda/lib/python3.9/site-packages/rpack/_core.pyx in rpack._core.Grid.__cinit__()

MemoryError: Grid

Yes I'm interesting in the wole structure, thank you for the advice, I'll check how to perform coarse-grain.

paulbrodersen commented 1 year ago

Thanks for sharing the error trace, which turned out to be more interesting than I thought it would be. You seem to have multiple disconnected components in your graph (which you could analyze and plot separately, which in turn might cut down considerably on your compute needs), and you are actually running out of memory while netgraph is trying to compute how to arrange the different components with respect to each other. Not at all an operation that I would have thought requires a lot of RAM. Unfortunately, this operation is handled by rectangle-packer, not my own code.

Next steps:

  1. I will talk to @penlect, the author of rectangle-packer, and see if he has any suggestions, how to reduce the memory consumption.
  2. You should figure out how many components you have, and what the size distribution looks like. In particular, knowing the number of nodes and edges in the largest component would be of interest.
  3. Try to plot any of the components individually. Try a smallish component first, just to make sure that the graph structures you are handing to netgraph have the correct format. Then try the largest component and see if you are still running out memory.
  4. If the largest component is still too large, coarse-graining is probably your best bet.
AdrianaLecourieux commented 1 year ago

I tested with an other large graph I have and the error is different now.

I have one connected components. My graph has 313764 nodes and 939811 edges.

The error is :

---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
/tmp/ipykernel_77/1731607108.py in <module>
----> 1 Graph(g_B2)
      2 plt.show()

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in __init__(self, graph, edge_cmap, *args, **kwargs)
   1399             kwargs.setdefault('node_zorder', node_zorder)
   1400 
-> 1401         super().__init__(edges, *args, **kwargs)
   1402 
   1403 

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in __init__(self, edges, nodes, node_layout, node_layout_kwargs, node_shape, node_size, node_edge_width, node_color, node_edge_color, node_alpha, node_zorder, node_labels, node_label_offset, node_label_fontdict, edge_width, edge_color, edge_alpha, edge_zorder, arrows, edge_layout, edge_layout_kwargs, edge_labels, edge_label_position, edge_label_rotate, edge_label_fontdict, origin, scale, prettify, ax, *args, **kwargs)
    308         self.origin = origin
    309         self.scale = scale
--> 310         self.node_positions = self._initialize_node_layout(
    311             node_layout, node_layout_kwargs, origin, scale, node_size)
    312 

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in _initialize_node_layout(self, node_layout, node_layout_kwargs, origin, scale, node_size)
    451             if (node_layout == 'spring') or (node_layout == 'dot') or (node_layout == 'radial'):
    452                 node_layout_kwargs.setdefault('node_size', node_size)
--> 453             return self._get_node_positions(node_layout, node_layout_kwargs, origin, scale)
    454 
    455         elif isinstance(node_layout, dict):

/opt/conda/lib/python3.9/site-packages/netgraph/_main.py in _get_node_positions(self, node_layout, node_layout_kwargs, origin, scale)
    462             return {self.nodes[0]: np.array([origin[0] + 0.5 * scale[0], origin[1] + 0.5 * scale[1]])}
    463         if node_layout == 'spring':
--> 464             node_positions = get_fruchterman_reingold_layout(
    465                 self.edges, nodes=self.nodes, origin=origin, scale=scale, **node_layout_kwargs)
    466             if len(node_positions) > 3: # Qhull fails for 2 or less nodes

/opt/conda/lib/python3.9/site-packages/netgraph/_node_layout.py in wrapped_layout_function(edges, nodes, *args, **kwargs)
     52         # determine if there are more than one component
     53         adjacency_list = _edge_list_to_adjacency_list(edges, directed=False)
---> 54         components = _get_connected_components(adjacency_list)
     55 
     56         if nodes:

/opt/conda/lib/python3.9/site-packages/netgraph/_utils.py in _get_connected_components(adjacency_list)
    594     while not_visited: # i.e. while stack is non-empty (empty set is interpreted as `False`)
    595         start = not_visited.pop()
--> 596         component = _dfs(adjacency_list, start)
    597         components.append(component)
    598 

/opt/conda/lib/python3.9/site-packages/netgraph/_utils.py in _dfs(adjacency_list, start, visited)
    640     for node in adjacency_list[start] - visited:
    641         if node in adjacency_list:
--> 642             _dfs(adjacency_list, node, visited)
    643         else: # otherwise no outgoing edge
    644             visited.add(node)

... last 1 frames repeated, from the frame below ...

/opt/conda/lib/python3.9/site-packages/netgraph/_utils.py in _dfs(adjacency_list, start, visited)
    640     for node in adjacency_list[start] - visited:
    641         if node in adjacency_list:
--> 642             _dfs(adjacency_list, node, visited)
    643         else: # otherwise no outgoing edge
    644             visited.add(node)

RecursionError: maximum recursion depth exceeded while calling a Python object
paulbrodersen commented 1 year ago

Welp, this is starting to feel like a walk of shame, as this time, netgraph crashed even earlier: it was trying to determine the number of components using depth-first-search (DFS) but exceeded the recursion limit. IIRC, there is a recursion-less version of DFS; I will add implementing it to my TODO list.

paulbrodersen commented 1 year ago

Out of interest, does netgraph work for you with small components (< 1000 nodes)?

paulbrodersen commented 1 year ago

Also, when you ran into the first error (the MemoryError), how many components did you have, and how many nodes did your largest component have?

paulbrodersen commented 1 year ago
  1. I have implemented a workaround for the MemoryError raised by rectangle-packer.
  2. I have also implemented a implemented a recursion free version breadth-first search for finding the connected components. Hence the RecursionError.
  3. I have added a tutorial to the documentation with some suggestions on how to deal with large graphs.

All these changes are currently only implemented on the dev branch as a I am preparing a new major version release. To install from the dev branch, use:

pip install https://github.com/paulbrodersen/netgraph/archive/dev.zip

However, I can't change the fact that with very large graphs, you will eventually run out of pixels. So use with caution & common sense.

ConstantinosM commented 7 months ago

Hello,

I also phase the same issue with the Recursion error. My network has 9223 nodes and 102623 edges.

paulbrodersen commented 7 months ago

@ConstantinosM Are you using the dev branch?

All these changes are currently only implemented on the dev branch as a I am preparing a new major version release.

ConstantinosM commented 7 months ago

Thank you for your response! I am using the main branch. Essentially the release from July 2023.

Edit: ok, i see that the comments were on July 19 and the latest stable release was in July 12. Perhaps i need to install the dev branch.

paulbrodersen commented 7 months ago

@ConstantinosM Yeah, major version changes take a while as this project is still (mostly) a one man show. Let me know if you run into any issues!