sagemath / sage

Main repository of SageMath
https://www.sagemath.org
Other
1.44k stars 481 forks source link

Cheeger constant(s) of graphs #21942

Closed pelegm closed 4 years ago

pelegm commented 7 years ago

This ticket adds three methods to graphs to compute three variations of the Cheeger constant:

Note that the implemented Cheeger constant is also the conductance of the simple random walk on the graph.

CC: @videlec @JeremiasE @slel

Component: graph theory

Keywords: cheeger, MathExp2018

Author: Vincent Delecroix, David Coudert

Branch: 8389196

Reviewer: Peleg Michaeli, Frédéric Chapoton

Issue created by migration from https://trac.sagemath.org/ticket/21942

videlec commented 6 years ago
comment:42

After make doc-clean you should be using make doc.

pelegm commented 6 years ago
comment:43

Works, thanks (took 2.5 hours, though...).

Some minor suggestions:

Other than that, everything looks great. You may switch to positive review after you apply the suggestions, or if you decide not to apply them.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 8484a37 to 10c362e

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

10c362e21942: better documentation
videlec commented 6 years ago
comment:45

Done!

vbraun commented 6 years ago
comment:46

Reviewer name?

videlec commented 6 years ago
comment:47

Sorry

videlec commented 6 years ago

Reviewer: Peleg Michaeli

pelegm commented 6 years ago
comment:48

Hi, I apologize for yet another late response, but just wanted to ask: for graphs without edges (and at least 2 vertices) you set the Cheeger constant to be 1.

It is correct that the constant is not well defined for graphs without edges, but still I would maybe set it to be 0, to be consistent with the general idea that larger Cheeger constant means higher connectivity, and the specific idea that non-connected graphs have Cheeger constant 0.

In fact, if you agree with this then we may add a quick heuristic at the beginning of the method, which checks whether the graph is connected, and returns 0 otherwise.

dcoudert commented 6 years ago
comment:50

I also agree that you should test that the input graph is connected.

Some remarks:

dcoudert commented 6 years ago
comment:51

from sage.rings import infinity -> from sage.rings.infinity import Infinity or from sage.rings.infinity import infinity, I don't know which is best.

dcoudert commented 6 years ago
comment:52

For vertex_isoperimetric_number, it is not realistic to try to compute this value for large graphs. For up to 31 vertices, you can do in Cython:

from sage.graphs.graph_decompositions.fast_digraph cimport FastDigraph, compute_out_neighborhood_cardinality, popcount32
from sage.rings.integer cimport Integer

def vertex_isoperimetric_number_fast(self):
    """
    """
    cdef FastDigraph FG = FastDigraph(self)

    cdef int card, boundary, S
    cdef int mc = FG.n // 2
    cdef int p = FG.n
    cdef int q = 0

    for S in range(1, 2**FG.n):
        card = popcount32(S)
        if card <= mc:
            boundary = compute_out_neighborhood_cardinality(FG, S)
            if boundary * q < p * card:
                p = boundary
                q = card

    return Integer(p) / Integer(q)

it iterates over all the non-empty subsets of vertices. popcount32(S) counts the number of 1's and so of vertices in the current subset S, and compute_out_neighborhood_cardinality computes the number of vertices in N(S)\S.

sage: %time vertex_isoperimetric_number(G) # your method
CPU times: user 4.36 ms, sys: 1.76 ms, total: 6.13 ms
Wall time: 9.11 ms
3/2
sage: G = graphs.CompleteGraph(5)
sage: %time vertex_isoperimetric_number_fast(G)
CPU times: user 83 µs, sys: 5 µs, total: 88 µs
Wall time: 88.9 µs
3/2

sage: G = graphs.CycleGraph(10)
sage: %time vertex_isoperimetric_number_fast(G)
CPU times: user 114 µs, sys: 17 µs, total: 131 µs
Wall time: 117 µs
2/5
sage: G = graphs.CycleGraph(20)
sage: %time vertex_isoperimetric_number_fast(G)
CPU times: user 34.7 ms, sys: 90 µs, total: 34.8 ms
Wall time: 34.8 ms
1/5
sage: G = graphs.CycleGraph(30)
sage: %time vertex_isoperimetric_number_fast(G)
CPU times: user 33.3 s, sys: 121 ms, total: 33.4 s
Wall time: 33.9 s
2/15

We can certainly do something similar for the other methods. We just need a smart way to compute |\partial X| / |Vol(X)| and |E(X)| (which is not properly defined in the method) and |\partial S| / |S|.

It is of course possible to add a parameter k to compute the value only over all subsets of order k (see https://en.wikipedia.org/wiki/Isoperimetric_inequality)

videlec commented 6 years ago
comment:53

I am not 100% convinced by FastDigraph:

videlec commented 6 years ago
comment:54

With dense representation and bit operations one can indeed remove the inner loops from the already cythonized cheeger_constant and edge_isoperimetric_number. On the other hand, these methods

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

36807eb21942: Cheeger constant
67881a321942: better documentation
b981a8d21942: cythonize vertex_isoperimetric_number
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 10c362e to b981a8d

videlec commented 6 years ago
comment:56

rebased on 8.3.beta4 + extra commit taking care of some of the remarks of David.

dcoudert commented 6 years ago
comment:57

I'm not claiming that FastDigraph is perfect. It was useful for implementing vertex separation and cutwidth. It can certainly be cleaned/improved/etc. (e.g., using __builtin_popcount and uint_fast32_t). However, for such algorithms, the extra cost of using bitsets is not justified since we can hardly go above 32 nodes: huge computation time and memory usage.

Here, if you think it is possible to compute the vertex_isoperimetric_number on larger graphs, then you can use binary_matrix. Another idea could be to use short_digraph and the same structure as for other methods, and maintain the vertices in the boundary (e.g., use an array to store the number of edges from S to each vertex outside S, and increase/decrease vol each time a cell becomes non-zero/zero).

I did some tests using static_dense_graph, and so binary_matrix, for cheeger_constant and edge_isoperimetric_number. My code is slightly faster for dense graphs but slower for sparse graphs. Not so interesting here.

There are a small improvements you can do:

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

8bcb49721942: typo in doc
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from b981a8d to 8bcb497

videlec commented 6 years ago
comment:59

I agree with your suggestions. Though, a much more needed improvement would be to be able to iterate through subsets S of vertices with S and V \ S connected.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

83f5c8021942: small optimization
7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 8bcb497 to 83f5c80

dcoudert commented 6 years ago
comment:61

An algorithm for iterating over connected subgraphs is described here https://stackoverflow.com/questions/15658245/efficiently-find-all-connected-induced-subgraphs. May be we can adapt it...

At least, it is possible to avoid the recursion and so to have a connected subgraph iterator. Not clear how to get an efficient implementation. I'll think about it.

dcoudert commented 6 years ago
comment:62

I have implemented an iterator over connected subgraphs is Cython. It is reasonably fast. I should open a ticket for it, but I don't know yet in which file to add it. Any suggestion ?

It can certainly help improving this ticket, although current implementations are quite good.

videlec commented 6 years ago
comment:63

Replying to @dcoudert:

I have implemented an iterator over connected subgraphs in Cython. It is reasonably fast. I should open a ticket for it, but I don't know yet in which file to add it. Any suggestion ?

connected_subgraphs_iterator.pyx? Please, cc me.

It can certainly help improving this ticket, although current implementations are quite good.

To improve even more the computation of Cheeger constant, there should be an iterator over connected subgraph so that the complement is also connected! I can comment on this when your ticket is open.

Are you ok setting this ticket in positive review as it is? It will serve as a base for further improvements.

videlec commented 6 years ago
comment:64

Note that I can easily factor cheeger_constant and edge_isoperimetric_number (exact same computation except the small difference for vol). It might be worth not duplicating code.

videlec commented 6 years ago
comment:65

Replying to @videlec:

Note that I can easily factor cheeger_constant and edge_isoperimetric_number (exact same computation except the small difference for vol). It might be worth not duplicating code.

What let me hesitate is that we can cut branches in the backtracking and when to cut will differ. What do you think?

dcoudert commented 6 years ago
comment:66

Do you plan to add cuts in a near future ? and is the current structure of the code appropriate to add cuts ? if not, you can factor the code.

videlec commented 6 years ago
comment:67

Replying to @dcoudert:

Do you plan to add cuts in a near future ? and is the current structure of the code appropriate to add cuts ? if not, you can factor the code.

As my answers are "no" and "no" let me factor and add a note.

videlec commented 6 years ago
comment:68

I spoke too fast: there are already two different cuts! By the definition of Cheeger constant, you cut when Vol(s) > m while for the isoperimetric number you cut 2*|S| > n (where m and n are as usual number of edges and number of vertices).

dcoudert commented 6 years ago
comment:69

Right.

Don't forget to add sig_on sig_off so that we can kill computation if needed.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Changed commit from 83f5c80 to 50c1863

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 6 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

50c186321942: code interruptible with sig_on/sig_off
dcoudert commented 6 years ago
comment:71

I have a much better implementation for vertex_isoperimetric_number using the ideas from https://stackoverflow.com/questions/15658245/efficiently-find-all-connected-induced-subgraphs. You can use it on a multigraph with loops. No limit on the number of vertices, but of course the time complexity remains.

Roughly, it uses a non-recursive version of the algorithm to generate all connected subgraphs of order at most n/2. I'm therefore using a binary matrix as a stack of bitsets.

from sage.rings.rational_field import QQ
from sage.rings.infinity import Infinity
from cysignals.signals cimport sig_on, sig_off

include "sage/data_structures/binary_matrix.pxi"
from sage.graphs.base.static_dense_graph cimport dense_graph_init

def vertex_isoperimetric_number_iter(G):
    """
    """
    if G.is_directed():
        raise ValueError("vertex-isoperimetric number is only defined on non-oriented graph")
    if G.order() == 0:
        raise ValueError("vertex-isoperimetric number not defined for the empty graph")
    elif G.order() == 1:
        return Infinity
    elif not G.is_connected():
        return QQ((0,1))

    # We convert the graph to a static dense graph
    cdef binary_matrix_t DG
    dense_graph_init(DG, G)
    cdef int n = G.order()
    cdef int k = n / 2   # maximum size of a subset

    sig_on()

    # We use a stack of bitsets. For that, we need 3 bitsets per level with at
    # most n/2 + 1 levels, so 3 * (n / 2) + 3 bitsets. We also need 1 bitset that
    # we create at the same time, so so overall 3 * (n / 2) + 4 bitsets
    cdef binary_matrix_t stack
    binary_matrix_init(stack, 3 * (n / 2) + 4, n)

    cdef bitset_t candidates = stack.rows[3 * (n / 2) + 3]
    cdef bitset_t left     # vertices not yet explored
    cdef bitset_t current  # vertices in the current subset
    cdef bitset_t boundary # union of neighbors of vertices in current subset

    cdef int l = 0
    cdef int p = n
    cdef int q = 0
    cdef int c, b, v

    # We initialize the first level
    for v in range(3):
        bitset_clear(stack.rows[v])
    bitset_complement(stack.rows[0], stack.rows[0])

    while l >= 0:

        # We take the values at the top of the stack
        left = stack.rows[l]
        current = stack.rows[l + 1]
        boundary = stack.rows[l + 2]

        if bitset_isempty(current):
            bitset_copy(candidates, left)
        else:
            bitset_and(candidates, left, boundary)

        if bitset_isempty(candidates):
            # We decrease l to pop the stack
            l -= 3

            # If the current set if non empty, we update the lower bound
            c = bitset_len(current)
            if c:
                bitset_difference(boundary, boundary, current)
                b = bitset_len(boundary)
                if b * q < p * c:
                    p = b
                    q = c

        else:
            # Choose a candidate vertex v
            v = bitset_first(candidates)
            bitset_discard(left, v)

            # Since we have not modified l, the bitsets for iterating without v
            # in the subset current are now at the top of the stack, with 
            # correct values

            if bitset_len(current) < k:
                # We continue with v in the subset current
                l += 3
                bitset_copy(stack.rows[l], left)
                bitset_copy(stack.rows[l + 1], current)
                bitset_add(stack.rows[l + 1], v)
                bitset_union(stack.rows[l + 2], boundary, DG.rows[v])

    binary_matrix_free(stack)
    binary_matrix_free(DG)
    sig_off()

    return QQ((p, q))
videlec commented 6 years ago
comment:72

update milestone 8.3 -> 8.4

dkrenn commented 5 years ago
comment:73

Does not apply anymore.

dkrenn commented 5 years ago

Work Issues: merge confict

dkrenn commented 5 years ago

Changed work issues from merge confict to merge conflict

dcoudert commented 5 years ago
comment:76

Note that we now have a connected subgraph iterator #25553.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Changed commit from 50c1863 to 487bee9

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 5 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. New commits:

487bee921942: Cheeger constants
videlec commented 5 years ago
comment:78

(just a rebase, need to adapt the code to use the connected subgraph iterator)

videlec commented 5 years ago
comment:79

@dcoudert: I would be happy to use the subgraph iterator from #25553 however

One possible way to proceed is to make a proper Cython class for the subgraph iterator so that we can grab information (at C level) at each step of the iteration.

What do you think?

dcoudert commented 5 years ago
comment:80

Replying to @videlec:

@dcoudert: I would be happy to use the subgraph iterator from #25553 however

  • it is only implemented for dense graphs

It's reasonably fast this way :P But of course we could implement a sparse graph counter part.

  • the iterator carries many information that would be extremely useful for implementing isoperimetric constants (such as boundary)
  • passing around lists of vertices (the output of the iterator) is a big waste One possible way to proceed is to make a proper Cython class for the subgraph iterator so that we can grab information (at C level) at each step of the iteration.

What do you think?

We can certainly add such a class if it's needed. However, you have to be very careful if you want to access extra data. May be it's easier to copy/paste the code of the iterator, so that you can use the information you want safely at the time you need it. The advantage of the class is to silently switch between sparse and dense representations.

embray commented 5 years ago
comment:81

Tickets still needing working or clarification should be moved to the next release milestone at the soonest (please feel free to revert if you think the ticket is close to being resolved).

dcoudert commented 5 years ago
comment:82

Moving to 9.0. We can also work on a proper cython class for subgraph iterator if useful.

embray commented 4 years ago
comment:83

Ticket retargeted after milestone closed

mkoeppe commented 4 years ago
comment:84

Batch modifying tickets that will likely not be ready for 9.1, based on a review of the ticket title, branch/review status, and last modification date.

fchapoton commented 4 years ago

Changed commit from 487bee9 to 1fb6f64