pmneila / PyMaxflow

Python library for creating flow networks and computing the maxflow/mincut (aka graph-cuts for Python)
http://pmneila.github.io/PyMaxflow/
242 stars 59 forks source link

Remove nodes from graph #57

Closed nirmalsnair closed 2 years ago

nirmalsnair commented 2 years ago

I need a grid-like graph and it would be a lot easier if I could build a grid graph first and then remove a couple of nodes, instead of building the graph node-by-node.

Is it possible to remove a node from an existing graph?

pmneila commented 2 years ago

Hi @nsn88,

No, unfortunately the core library of PyMaxflow does not support removing nodes and the associated edges. However, you can just avoid creating edges to those nodes. Given that PyMaxflow omits all the edges created to/from the node id -1, you can substitute the corresponding nodeids by -1. For example:

import maxflow

g = maxflow.GraphFloat()
nodeids = g.add_grid_nodes((4, 5))
nodeids[1, 2] = -1
print(nodeids)
# [[ 0  1  2  3  4]
#  [ 5  6 -1  8  9]
#  [10 11 12 13 14]
#  [15 16 17 18 19]]
g.add_grid_edges(nodeids)

This code generates this graph: Figure_1

Does that help with your problem?

nirmalsnair commented 2 years ago

Given that PyMaxflow omits all the edges created to/from the node id -1, you can substitute the corresponding nodeids by -1.

Oh, that's neat, thanks for the tip. But I guess this won't reduce the memory consumption, which is my main concern.

My graph has different edge capacities, so anyway I have to loop through all the edges to set the capacity, and I can just skip the unwanted edges. But the problem is that my graph has around 4,000,000 nodes (200x200x100 grid) of which 30% are unwanted ones. If I could remove those nodes, memory consumption can be brought down.

pmneila commented 2 years ago

Hi,

Got a few comments:

  1. Nodes are very cheap regarding memory (IIRC, just two integers are stored per node). 4 million nodes use around 60Mb of memory, and 30% is just an overhead of 18Mb. Not sure if you are working with important memory limitations, but otherwise this should not be a huge issue.
  2. In any case, you can create the exact number of nodes you need and then distribute them in the grid array, filling the empty spaces with -1. For example, if you only need 3 nodes in a 4x4 grid:
    g = maxflow.GraphFloat()
    nodeids = g.add_grid_nodes(3)
    # Locations of the 3 nodes in the grid (rows, cols)
    # You can also use a boolean mask or any other type of NumPy's advanced indexing
    locations = [0, 1, 2], [1, 3, 3]
    grid = np.full((4, 4), -1)
    grid[locations] = nodeids
    print(grid)
    # [[-1  0 -1 -1]
    #  [-1 -1 -1  1]
    #  [-1 -1 -1  2]
    #  [-1 -1 -1 -1]]
  3. If it's easier for you, you can also create a list of node pairs (source, target), one per edge, and a similar array of the capacities of those edges. Then you can pass that to add_grid_edges with a proper structure to create edges from left to right. You don't need to explicitly put the nodeids in an array with the shape of your grid. This way, you can define advanced graphs without looping through the edges, which is much slower. In any case, it is still necessary to be able to calculate capacities without looping, otherwise nothing is gained by doing this.
    g = maxflow.GraphFloat()
    nodeids = g.add_grid_nodes(3)
    # Connect 0->2, 2->1, 1->0 with capacities 0.5, 1.5 and 2.5.
    grid = nodeids[np.array([
    [0, 2],
    [2, 1],
    [1, 0]
    ])]
    capacities = np.array([
    [0.5, 0.0],
    [1.5, 0.0],
    [2.5, 0.0]
    ])
    # Structure to create edges from left to right
    structure = np.array([
    [0, 0, 0],
    [0, 0, 1],
    [0, 0, 0]
    ])
    g.add_grid_edges(grid, weights=capacities, structure=structure)

Hope at least one of those options helps.

nirmalsnair commented 2 years ago

Awesome! This surely helps. You should consider including the code in comment-2 in the documentation/examples.

Regarding comment-1, my PC has 64 GB memory, so going by your numbers, I probably shouldn't worry about removing unwanted nodes. I'll do a memory profile and see if this is needed.

Thanks for the quick response.