sambofra / bnstruct

R package for Bayesian Network Structure Learning
GNU General Public License v3.0

Issues with memory in DBN em or Belief propagation inference #30

Closed BKAmos closed 1 year ago

BKAmos commented 1 year ago

Hi there,

I'm writing because I've encountered a memory issue in the belief propagation portion of the analysis on a Dynamic Bayesian Network (DBN).

When using my actual data, I have a DBN with 11 time steps and 15 nodes per time step (165 variables in total). Code below.

library(bnstruct)

dataset.from.file <- BNDataset('family_dbn_data.txt', 'family_dbn_headers.txt')
layers <- scan(file = "family_layers.txt")
#layers <- c(1,1,1,2,2,2,3,3,3)
# The layer.struct matrix allows for specifying connections between layers
layerStruct <- matrix(c(1,0,0,0,0,0,0,0,0,0,0,   # column 1
                        1,1,0,0,0,0,0,0,0,0,0,   # column 2
                        0,1,1,0,0,0,0,0,0,0,0,   # column 3
                        0,0,1,1,0,0,0,0,0,0,0,   # column 4
                        0,0,0,1,1,0,0,0,0,0,0,   # column 5
                        0,0,0,0,1,1,0,0,0,0,0,   # column 6
                        0,0,0,0,0,1,1,0,0,0,0,   # column 7
                        0,0,0,0,0,0,1,1,0,0,0,   # column 8
                        0,0,0,0,0,0,0,1,1,0,0,   # column 9
                        0,0,0,0,0,0,0,0,1,1,0,   # column 10
                        0,0,0,0,0,0,0,0,0,1,1),  # column 11
                      11, 11)  # matrix() fills by column
dbn <- learn.dynamic.network(dataset.from.file, num.time.steps = 11, layering = layers, layer.struct = layerStruct, alpha = 0.01, max.parents = 3, ess = 4)

Up to this point the code works: I can get the DAG from the DBN and analyze the network it created.

The issue is with the code below.

engine <- InferenceEngine(dbn)
results <- em(engine, dataset.from.file, threshold = 0.05, max.em.iterations = 2)

I've also tried belief propagation instead of EM, but the problem is the same: the process always runs out of memory. I thought I could reduce memory usage by limiting edges within the network, raising the threshold in em, or decreasing max.em.iterations, but none of those approaches worked. I also edited the .Renviron file to increase R_MAX_VSIZE; that didn't work either.
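For concreteness, this is roughly what those attempts looked like (the parameter values here are illustrative, not the exact ones from every run):

# sparser network: stricter alpha, fewer parents
dbn.sparse <- learn.dynamic.network(dataset.from.file, num.time.steps = 11,
                                    layering = layers, layer.struct = layerStruct,
                                    alpha = 0.001, max.parents = 2, ess = 4)
engine <- InferenceEngine(dbn.sparse)

# cheaper EM: looser threshold, single iteration
results <- em(engine, dataset.from.file, threshold = 0.1, max.em.iterations = 1)

# belief propagation alone runs out of memory the same way
engine.bp <- belief.propagation(engine)

# and in ~/.Renviron, to raise R's memory cap:
# R_MAX_VSIZE=300Gb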

Is there something that I'm missing?

Do I just need to drastically decrease the network size? Is there a maximum network size for inference or does it not work with DBNs?

Open to suggestions or thoughts.

albertofranzin commented 1 year ago

Hello,

the things you tried to address the memory problem are all valid. The only other one I can think of is to limit the cardinality of each node (possible mostly with continuous nodes that get discretized, less feasible with discrete nodes). BTW, node cardinality is also the reason why it's impossible to say whether there is a maximum size over which the EM fails. If you have binary nodes, however, it's difficult to shrink their size even more...
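For example, here is a minimal sketch of how cardinality can be kept low when a dataset is built from continuous measurements (raw.data and its column names are hypothetical, and this doesn't apply to already-binary data):

library(bnstruct)

# hypothetical matrix of continuous measurements, one column per variable
# raw.data <- as.matrix(read.table('continuous_measurements.txt', header = TRUE))

# marking variables as continuous ('c') with node.sizes = 2 makes bnstruct
# discretize each one into only two bins, which keeps the conditional
# probability tables (and, later, the clique potentials) as small as possible
dataset <- BNDataset(data = raw.data,
                     discreteness = rep('c', ncol(raw.data)),
                     variables = colnames(raw.data),
                     node.sizes = rep(2, ncol(raw.data)))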

However, your network is very big, and the inference engine internally builds a junction tree, which requires finding cliques; the potential stored for each clique grows exponentially with the number of variables it contains. So it's also possible that the network is just too big for the package.
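As a rough back-of-the-envelope illustration (my numbers, not measurements from the package): a potential over a clique of k binary nodes stores 2^k entries, so memory explodes quickly once the triangulation produces wide cliques:

# bytes needed by one clique potential over k nodes of a given cardinality,
# assuming one 8-byte double per entry
clique.bytes <- function(k, cardinality = 2) cardinality^k * 8

clique.bytes(20) / 2^20   # ~8 MB for a clique of 20 binary nodes
clique.bytes(30) / 2^30   # ~8 GB for 30 nodes
clique.bytes(37) / 2^40   # ~1 TB for 37 nodes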

One possibility is to split the network into chunks of fewer time steps and combine them afterwards.

Alberto

BKAmos commented 1 year ago

Hi again,

Regarding the cardinality of the nodes: all of my nodes can take only two states (0/1), so they are binary, and I don't think I can shrink the dataset any further. I have attached the header file below to further outline what I'm saying and make sure we're on the same page.

"Alcaligenaceae_t0" "Bradyrhizobiaceae_t0" "Burkholderiaceae_t0" "Caulobacteraceae_t0" "Comamonadaceae_t0" "Flavobacteriaceae_t0" "Hyphomicrobiaceae_t0" "Microbacteriaceae_t0" "Mycobacteriaceae_t0" "Oxalobacteraceae_t0" "Phyllobacteriaceae_t0" "Pseudomonadaceae_t0" "Rhizobiaceae_t0" "Sphingomonadaceae_t0" "Xanthomonadaceae_t0" "Alcaligenaceae_t1" "Bradyrhizobiaceae_t1" "Burkholderiaceae_t1" "Caulobacteraceae_t1" "Comamonadaceae_t1" "Flavobacteriaceae_t1" "Hyphomicrobiaceae_t1" "Microbacteriaceae_t1" "Mycobacteriaceae_t1" "Oxalobacteraceae_t1" "Phyllobacteriaceae_t1" "Pseudomonadaceae_t1" "Rhizobiaceae_t1" "Sphingomonadaceae_t1" "Xanthomonadaceae_t1" "Alcaligenaceae_t2" "Bradyrhizobiaceae_t2" "Burkholderiaceae_t2" "Caulobacteraceae_t2" "Comamonadaceae_t2" "Flavobacteriaceae_t2" "Hyphomicrobiaceae_t2" "Microbacteriaceae_t2" "Mycobacteriaceae_t2" "Oxalobacteraceae_t2" "Phyllobacteriaceae_t2" "Pseudomonadaceae_t2" "Rhizobiaceae_t2" "Sphingomonadaceae_t2" "Xanthomonadaceae_t2" "Alcaligenaceae_t3" "Bradyrhizobiaceae_t3" "Burkholderiaceae_t3" "Caulobacteraceae_t3" "Comamonadaceae_t3" "Flavobacteriaceae_t3" "Hyphomicrobiaceae_t3" "Microbacteriaceae_t3" "Mycobacteriaceae_t3" "Oxalobacteraceae_t3" "Phyllobacteriaceae_t3" "Pseudomonadaceae_t3" "Rhizobiaceae_t3" "Sphingomonadaceae_t3" "Xanthomonadaceae_t3" "Alcaligenaceae_t4" "Bradyrhizobiaceae_t4" "Burkholderiaceae_t4" "Caulobacteraceae_t4" "Comamonadaceae_t4" "Flavobacteriaceae_t4" "Hyphomicrobiaceae_t4" "Microbacteriaceae_t4" "Mycobacteriaceae_t4" "Oxalobacteraceae_t4" "Phyllobacteriaceae_t4" "Pseudomonadaceae_t4" "Rhizobiaceae_t4" "Sphingomonadaceae_t4" "Xanthomonadaceae_t4" "Alcaligenaceae_t5" "Bradyrhizobiaceae_t5" "Burkholderiaceae_t5" "Caulobacteraceae_t5" "Comamonadaceae_t5" "Flavobacteriaceae_t5" "Hyphomicrobiaceae_t5" "Microbacteriaceae_t5" "Mycobacteriaceae_t5" "Oxalobacteraceae_t5" "Phyllobacteriaceae_t5" "Pseudomonadaceae_t5" "Rhizobiaceae_t5" "Sphingomonadaceae_t5" "Xanthomonadaceae_t5" "Alcaligenaceae_t6" "Bradyrhizobiaceae_t6" "Burkholderiaceae_t6" "Caulobacteraceae_t6" "Comamonadaceae_t6" "Flavobacteriaceae_t6" "Hyphomicrobiaceae_t6" "Microbacteriaceae_t6" "Mycobacteriaceae_t6" "Oxalobacteraceae_t6" "Phyllobacteriaceae_t6" "Pseudomonadaceae_t6" "Rhizobiaceae_t6" "Sphingomonadaceae_t6" "Xanthomonadaceae_t6" "Alcaligenaceae_t7" "Bradyrhizobiaceae_t7" "Burkholderiaceae_t7" "Caulobacteraceae_t7" "Comamonadaceae_t7" "Flavobacteriaceae_t7" "Hyphomicrobiaceae_t7" "Microbacteriaceae_t7" "Mycobacteriaceae_t7" "Oxalobacteraceae_t7" "Phyllobacteriaceae_t7" "Pseudomonadaceae_t7" "Rhizobiaceae_t7" "Sphingomonadaceae_t7" "Xanthomonadaceae_t7" "Alcaligenaceae_t8" "Bradyrhizobiaceae_t8" "Burkholderiaceae_t8" "Caulobacteraceae_t8" "Comamonadaceae_t8" "Flavobacteriaceae_t8" "Hyphomicrobiaceae_t8" "Microbacteriaceae_t8" "Mycobacteriaceae_t8" "Oxalobacteraceae_t8" "Phyllobacteriaceae_t8" "Pseudomonadaceae_t8" "Rhizobiaceae_t8" "Sphingomonadaceae_t8" "Xanthomonadaceae_t8" "Alcaligenaceae_t9" "Bradyrhizobiaceae_t9" "Burkholderiaceae_t9" "Caulobacteraceae_t9" "Comamonadaceae_t9" "Flavobacteriaceae_t9" "Hyphomicrobiaceae_t9" "Microbacteriaceae_t9" "Mycobacteriaceae_t9" "Oxalobacteraceae_t9" "Phyllobacteriaceae_t9" "Pseudomonadaceae_t9" "Rhizobiaceae_t9" "Sphingomonadaceae_t9" "Xanthomonadaceae_t9" "Alcaligenaceae_t10" "Bradyrhizobiaceae_t10" "Burkholderiaceae_t10" "Caulobacteraceae_t10" "Comamonadaceae_t10" "Flavobacteriaceae_t10" "Hyphomicrobiaceae_t10" "Microbacteriaceae_t10" "Mycobacteriaceae_t10" "Oxalobacteraceae_t10" "Phyllobacteriaceae_t10" 
"Pseudomonadaceae_t10" "Rhizobiaceae_t10" "Sphingomonadaceae_t10" "Xanthomonadaceae_t10"
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d d

Below is the top line of the data file. There are only 4 batches in our data, so the file contains three more lines like it.

1 2 1 1 1 1 2 2 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 2 1 1 1 1 2 1 2 1 1 1 1 2 1 2 1 2 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 1 1 2 1 1 2 2 1 2 1 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 1 2 1 1 1 1 1 2 1 2 1 1 1 1 1 1 1 2 1 1 2 2

When I run these with the layering file, which splits the 165 variables into 11 layers, and belief propagation or expectation maximization begins, I see memory usage reach 1.45 TB before I shut down the process. I have run this on a compute cluster.

So I think the network is just too large for the package to run inference on.

I can try chopping it up into smaller chunks to see what happens. If you have any other ideas or comments, I'm all ears.

Let me know.

Best, Kirtley

albertofranzin commented 1 year ago

Hi,

yes, I guess the entire network is just too big. Maybe it's possible to refactor the code to make it more memory-efficient (or, even better, rewrite the internals in C), but that is way beyond the effort I can devote to the package right now, and it probably wouldn't solve the issue in this case either.

The only solution I can think of is to break the network down into, say, blocks of 2 or 3 consecutive time steps, perform belief propagation in each chunk, and manually "propagate" the observations from one chunk to the next; a rough sketch follows.
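A minimal, untested sketch of the idea, assuming 15 nodes per time step with columns ordered by time step as in your header, and overlapping consecutive chunks by one time step (the file names are taken from your messages; everything else is hypothetical):

library(bnstruct)

nodes.per.step  <- 15
steps.per.chunk <- 3

full.data <- as.matrix(read.table('family_dbn_data.txt'))
headers   <- scan('family_dbn_headers.txt', what = character(), nlines = 1)

# columns covering time steps t .. t + steps.per.chunk - 1
chunk.cols <- function(t)
  ((t - 1) * nodes.per.step + 1):((t + steps.per.chunk - 1) * nodes.per.step)

# chunks [1-3], [3-5], [5-7], [7-9], [9-11]; each shares one step with the next
for (t in seq(1, 11 - steps.per.chunk + 1, by = steps.per.chunk - 1)) {
  cols <- chunk.cols(t)
  sub.dataset <- BNDataset(data = full.data[, cols],
                           discreteness = rep('d', length(cols)),
                           variables = headers[cols],
                           node.sizes = rep(2, length(cols)))
  sub.dbn    <- learn.dynamic.network(sub.dataset, num.time.steps = steps.per.chunk)
  sub.engine <- InferenceEngine(sub.dbn)
  sub.engine <- belief.propagation(sub.engine)
  # read the marginals of this chunk's last time step off sub.engine and use
  # them as observations for the next chunk -- this manual "propagation" step
  # is the part left out here
}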

Cheers,

Alberto