pszufe / SimpleHypergraphs.jl

A simple hypergraphs package for the Julia programming language
MIT License

Accelerate modularity calculation #34

Open Inazuma110 opened 4 years ago

Inazuma110 commented 4 years ago

SimpleHypergraphs.jl version v1.6.0

https://github.com/pszufe/SimpleHypergraphs.jl/blob/1b03ab32977b96ff0be8553b82e596fcd5aaef56/src/modularity.jl#L62

This line takes time. I succeeded in accelerating it. I have confirmed that this program works correctly, and runs faster, on some hypergraphs and partitions.

    mr = 0.0
    # vol(V): total vertex degree, i.e. the sum over all vertices of the number
    # of hyperedges each vertex belongs to.
    v = 0
    for i in 1:nhv(h)
      v += length(gethyperedges(h, i))
    end

    # Histogram of hyperedge sizes: size => number of hyperedges of that size.
    dict = Dict{Int,Int}()
    for i in 1:nhe(h)
      s = length(getvertices(h, i))
      dict[s] = get(dict, s, 0) + 1
    end

    # Volume fraction vol(P)/vol(V) of each cluster of the partition.
    va = Dict(i => 0.0 for i in 1:length(partition))
    for (i, cluster) in enumerate(partition)
      for node in cluster
        va[i] += length(gethyperedges(h, node))
      end
      va[i] /= v
    end

    # Expected-edge term: iterate only over the hyperedge sizes that actually occur.
    for (d, cnt) in dict
      tmp = 0.0
      for frac in values(va)
        tmp += frac ^ d
      end
      mr += cnt * tmp
    end

    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)

However, I can't explain why it is faster... (Originally, I was trying to improve a different line.) So you can ignore and close this issue. Please refer to this code if you find it useful. And sorry for my poor English. Thank you.
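A possible reason for the speedup (just a guess, with made-up numbers standing in for the `HypergraphAggs` fields `Ed`, `volP_volV`, and `max_hes`): the commented-out expression visits every size `d` from 1 to `ha.max_hes` and allocates a fresh `volP_volV .^ d` vector on each iteration, while the loop above only visits the distinct hyperedge sizes that actually occur. The two sums compute the same quantity:

    # Toy numbers only, to illustrate that the two expressions agree
    # (Ed[d] = number of hyperedges of size d, volP_volV[j] = vol(Pj)/vol(V)):
    Ed        = [0, 3, 0, 0, 2]
    volP_volV = [0.6, 0.4]
    max_hes   = length(Ed)

    # Original form: every d from 1 to max_hes, with a fresh broadcast per d.
    original = sum(Ed[d] * sum(volP_volV .^ d) for d in 1:max_hes)

    # Grouped form, as in the snippet above: only sizes with Ed[d] > 0.
    sizes   = Dict(d => Ed[d] for d in 1:max_hes if Ed[d] > 0)
    grouped = sum(cnt * sum(v^d for v in volP_volV) for (d, cnt) in sizes)

    original ≈ grouped   # true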

pszufe commented 4 years ago

Hello, thank you for the feedback.

I tried to run your code. In my tests (see below) your code takes twice as long to compute. Could you perhaps update the test below with an example illustrating a case where your code is actually faster? Additionally, in the case of degenerate hypergraphs it yields wrong results (but this should be fairly easy to repair if the performance is good).

using SimpleHypergraphs
using Random
using StatsBase
using BenchmarkTools
Random.seed!(0)
m = sample([nothing, true], Weights([0.8,0.2]),(30,30))
h = Hypergraph(m)

part = SimpleHypergraphs.randompartition(h,5)
ha = SimpleHypergraphs.HypergraphAggs(h)

function modularity2(h::Hypergraph, partition::Vector{Set{Int}},
        ha::SimpleHypergraphs.HypergraphAggs=SimpleHypergraphs.HypergraphAggs(h))        
    @boundscheck sum(length.(partition)) == nhv(h)
    @boundscheck union(partition...) == Set(1:nhv(h))
    volP_volV = [sum(ha.deg_vs[i] for i in p)/ha.volV for p in partition]
    eP = [count(i-> ha.hes[i]>0 && (keys(h.he2v[i]) ⊆ p), 1:nhe(h)) for p in partition]
    mr = 0.0
    # vol(V): total vertex degree, i.e. the sum over all vertices of the number
    # of hyperedges each vertex belongs to.
    v = 0
    for i in 1:nhv(h)
      v += length(gethyperedges(h, i))
    end

    # Histogram of hyperedge sizes: size => number of hyperedges of that size.
    dict = Dict{Int,Int}()
    for i in 1:nhe(h)
      s = length(getvertices(h, i))
      dict[s] = get(dict, s, 0) + 1
    end

    # Volume fraction vol(P)/vol(V) of each cluster of the partition.
    va = Dict(i => 0.0 for i in 1:length(partition))
    for (i, cluster) in enumerate(partition)
      for node in cluster
        va[i] += length(gethyperedges(h, node))
      end
      va[i] /= v
    end

    # Expected-edge term: iterate only over the hyperedge sizes that actually occur.
    for (d, cnt) in dict
      tmp = 0.0
      for frac in values(va)
        tmp += frac ^ d
      end
      mr += cnt * tmp
    end

    # (sum(eP) - sum( ha.Ed[d]*sum(volP_volV.^d) for d in 1:ha.max_hes)) / nhe(h)
    (sum(eP) - mr) / nhe(h)
end

And now the tests:

julia> modu1 = @btime modularity($h,$part,$ha);
  12.700 μs (211 allocations: 8.27 KiB)

julia> modu2 = @btime modularity2($h,$part,$ha);
  21.500 μs (377 allocations: 11.29 KiB)

julia> modu1 ≈ modu2
true
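For reference, one guess at what a degenerate case could look like (this is an assumption, not necessarily the case meant above): a hyperedge that contains no vertices. The size histogram built in the proposed code then gets a `0` key, while the original expression only sums over `d in 1:ha.max_hes` and never sees a size-0 term, so the two computations can disagree on such inputs:

    # Hypothetical example (not from the thread): the third hyperedge is empty.
    using SimpleHypergraphs
    m = Union{Bool,Nothing}[true nothing nothing; nothing true nothing]
    h2 = Hypergraph(m)

    # Size histogram as built in modularity2: it records a 0-sized hyperedge.
    dict = Dict{Int,Int}()
    for i in 1:nhe(h2)
        s = length(getvertices(h2, i))
        dict[s] = get(dict, s, 0) + 1
    end
    dict   # contains 0 => 1 in addition to 1 => 2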
Inazuma110 commented 4 years ago

First, generate a hypergraph. It clearly contains some clusters.

function create_hypergraph(npcs, hepcs, he_rate=0.5, noise=10)
  # npcs:    number of vertices in each cluster
  # hepcs:   number of hyperedges generated inside each cluster
  # he_rate: probability that a cluster vertex is included in one of its hyperedges
  # noise:   number of extra random hyperedges drawn over all vertices
  node_num = sum(npcs)
  he_num = sum(hepcs)
  h = Hypergraph{Int64}(node_num, he_num)
  step = 1
  fin_node = 1
  for (npc, hepc) in zip(npcs, hepcs)
    for i in 1:hepc
      # he_size = rand(1:npc)
      # Pick a random subset of this cluster's vertices for hyperedge `step`.
      he = randsubseq(fin_node:fin_node+npc-1, he_rate)
      he = [(k in he ? 1 : nothing) for k in 1:nhv(h)]
      for (j, v) in enumerate(he)
        h[j, step] = v
      end
      step += 1
    end
    fin_node += npc
  end

  # Noise: hyperedges spanning random vertices from all clusters.
  for i in 1:noise
    he = randsubseq(1:node_num, 0.5)
    he = Dict(v => 1 for v in he)
    add_hyperedge!(h, vertices=he)
  end

  h
end
h = create_hypergraph([100 for i in 1:5], [20 for i in 1:5], 0.5, 4)
rp = randompartition(h, 30)
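As a quick sanity check (not part of the original snippet), one would expect the five planted 100-vertex blocks to score a much higher modularity than the random partition:

    planted = [Set(((i - 1) * 100 + 1):(i * 100)) for i in 1:5]
    modularity(h, planted), modularity(h, rp)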

The generated incidence matrix looks like this (see the image attached to the original issue).

julia> mod1 = @btime modularity(h, rp)
  589.906 μs (3526 allocations: 194.89 KiB)
-1.5186397093946605e-53

julia> mod2 = @btime modularity2(h, rp)
  222.981 μs (5718 allocations: 152.38 KiB)
-1.518639709394661e-53

It is also faster when used from findcommunities.

- Using original modularity
```jl
julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
  memory estimate:  335.32 MiB
  allocs estimate:  6643143
  --------------
  minimum time:     1.214 s (0.00% GC)
  median time:      1.781 s (3.26% GC)
  mean time:        1.911 s (3.10% GC)
  maximum time:     2.739 s (4.38% GC)
  --------------
  samples:          3
  evals/sample:     1
```

- Using my modularity
```jl
julia> @benchmark findcommunities(h, cfm)
BenchmarkTools.Trial:
  memory estimate:  265.13 MiB
  allocs estimate:  10089341
  --------------
  minimum time:     585.823 ms (0.00% GC)
  median time:      761.730 ms (7.58% GC)
  mean time:        976.228 ms (5.85% GC)
  maximum time:     1.728 s (6.59% GC)
  --------------
  samples:          6
  evals/sample:     1
```

However, I understand that it is not better in every case.
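(For anyone reproducing these numbers: `cfm` is not defined in the thread. A plausible setup, purely as an assumption, is one of the modularity-based community finders shipped with SimpleHypergraphs, e.g. the CNM-like one.)

```jl
# Assumed setup, not shown in the thread: `cfm` is presumably a modularity-based
# community finder; here CFModularityCNMLike(n) runs a CNM-like search for n iterations.
using SimpleHypergraphs, BenchmarkTools
cfm = CFModularityCNMLike(100)
@benchmark findcommunities(h, cfm)
```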