tbilab / sbmr

An R-native C++ implementation of the bipartite stochastic block model (biSBM) using Rcpp
https://tbilab.github.io/sbmr/

MCMC sweeps #10

Open verasiwei opened 4 years ago

verasiwei commented 4 years ago

I am wondering whether the state from previous MCMC sweeps is stored somewhere. For example, if I run this once:

sbm_mcmc <- mcmc_sweep(sbm, num_sweeps = 100, eps = 0.1, track_pairs = TRUE)
state <- state(sbm_mcmc)
state %>%
  filter(type == "grid", level == 0) %>%
  as_tibble() %>%
  group_by(parent) %>%
  tally()
# A tibble: 6 x 2
  parent           n
  <chr>        <int>
1 bl_grid_1133    42
2 bl_grid_1137    57
3 bl_grid_1138    33
4 bl_grid_1200    51
5 bl_grid_1204    59
6 bl_grid_1206    17

Then I run it again (I expected it to start again from the initial state produced by agglomerative merging):

sbm_mcmc <- mcmc_sweep(sbm, num_sweeps = 100, eps = 0.1, track_pairs = TRUE)
state <- state(sbm_mcmc)
state %>%
  filter(type == "grid", level == 0) %>%
  as_tibble() %>%
  group_by(parent) %>%
  tally()
# A tibble: 11 x 2
   parent           n
   <chr>        <int>
 1 bl_grid_1133    20
 2 bl_grid_1137    39
 3 bl_grid_1138    18
 4 bl_grid_1200    28
 5 bl_grid_1204    29
 6 bl_grid_1206    20
 7 bl_grid_1213    17
 8 bl_grid_1218    23
 9 bl_grid_1220    35
10 bl_grid_1222    13
11 bl_grid_1224    17

Each time I repeat it, I get a result with more and smaller groups. It looks as if the groups are being split starting from a saved state rather than from the initial state produced by agglomerative merging. For example, the third time:

# A tibble: 19 x 2
   parent           n
   <chr>        <int>
 1 bl_grid_1133    12
 2 bl_grid_1137    25
 3 bl_grid_1138    15
 4 bl_grid_1200    21
 5 bl_grid_1204    23
 6 bl_grid_1206    12
 7 bl_grid_1213    12
 8 bl_grid_1218    13
 9 bl_grid_1220    21
10 bl_grid_1222    13
11 bl_grid_1224    18
12 bl_grid_1234    13
13 bl_grid_1235    16
14 bl_grid_1238     5
15 bl_grid_1242     5
16 bl_grid_1250    14
17 bl_grid_1251    15
18 bl_grid_1252     4
19 bl_grid_1253     2

nstrayer commented 4 years ago

You need to manually rerun the agglomerative merging initialization each time, so the typical workflow is: load data -> merge -> MCMC sweep. Basically, the model's state stays exactly as the last run left it, so a second sweep continues from there instead of starting over. The MCMC results themselves get discarded, though, and only the most recent run's results are stored. This keeps storage efficient and rests on the assumption that a typical workflow runs the MCMC sweep just once. Storage of all sweeps could definitely be added, though, which I could see being valuable if you were doing some sort of simulated annealing with epsilon.
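
Until something like that exists in the package, one way to keep the results of several runs is to snapshot the state yourself after each call. The following is only a rough sketch, not part of sbmr: `collect_sweep_states` is a hypothetical helper, and the sweep and state functions are passed in as arguments so the sketch does not assume anything about the package beyond the `mcmc_sweep(sbm, num_sweeps = ..., eps = ...)` and `state(...)` calls shown earlier in this thread. It also assumes that each sweep continues from wherever the previous one left the model, as described above.

```r
# Hypothetical helper (not part of sbmr): run one block of sweeps per eps
# value and keep a copy of the state after each run.
collect_sweep_states <- function(model, eps_values, sweep_fn, state_fn,
                                 num_sweeps = 100) {
  snapshots <- vector("list", length(eps_values))
  for (i in seq_along(eps_values)) {
    # Assumption: each call continues from the state the previous call left.
    result <- sweep_fn(model, num_sweeps = num_sweeps, eps = eps_values[i])
    # Store the state now, since only the most recent results are kept.
    snapshots[[i]] <- list(eps = eps_values[i], state = state_fn(result))
  }
  snapshots
}

# Possible usage with the objects from this thread (requires sbmr and an
# already merged `sbm`; the eps schedule is an arbitrary example):
# runs <- collect_sweep_states(sbm, eps_values = c(0.5, 0.25, 0.1),
#                              sweep_fn = mcmc_sweep, state_fn = state)
```

To start every run from the merged state instead, the merge step would have to be repeated before each call rather than once up front.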