Memory management using large dd

tulip-control / dd

Binary Decision Diagrams (BDDs) in pure Python and Cython wrappers of CUDD, Sylvan, and BuDDy

https://pypi.org/project/dd

Other

182 stars 39 forks source link

Memory management using large dd #67

Closed glarange closed 4 years ago

glarange commented 4 years ago

Hi @johnyf , The recursive function you helped me write is working with some fairly large BDDs, up to 5K nodes. But use cases require even larger. The biggest constraint with the recursive function isn't speed but memory use. We included memorization already. What other solutions to optimize memory use could you suggest? Is it possible to use hard disk memory as virtual RAM, for example? Thanks, G

johnyf commented 4 years ago

I would suggest two changes:

storing integer identifiers in the memoization cache, instead of Function instances, as done in the function dd._copy._copy_bdd: https://github.com/tulip-control/dd/blob/24ec9eba335c173e1fe0367a9ac25ec201dee1b5/dd/_copy.py#L60
using disk memory via the module shelve, as done in the function dd._copy.dump_json: https://github.com/tulip-control/dd/blob/24ec9eba335c173e1fe0367a9ac25ec201dee1b5/dd/_copy.py#L125-L139 Please note that str(k) is needed, where k is the integer identifier of the BDD node, as demonstrated in the function dd._copy._dump_bdd: https://github.com/tulip-control/dd/blob/24ec9eba335c173e1fe0367a9ac25ec201dee1b5/dd/_copy.py#L167-L190 The need for string conversion is because shelve supports only strings as keys, quoting its documentation:

A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the pickle module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings.

glarange commented 4 years ago

Is (1) above a prerequisite for (2) or are they independent? Here's how I would do (1). Can you give some guidance on how to implement (2) for the function below?

from dd.autoref import BDD

def recurse_prob(u, var_prob, mem):
    # constant node ?
    if u.var is None:
        return (0,0) if u.negated else (1,0)
    z = rectify(u)
    k = int(z)
    # memoized ?
    if k in mem: 
        (r,v) = mem[k]
        (r,v) = flip(r,v, u)
        return (r,v)
    # recurse
    p = recurse_prob(u.low, var_prob, mem)
    q = recurse_prob(u.high, var_prob, mem)
    r = p[0] + var_prob[u.var] * (q[0] - p[0])
    v=(1-var_prob[u.var])*failr[u.var]*(q[0]-p[0])+(1-var_prob[u.var])*p[1]+var_prob[u.var]*q[1]

    # memoize
    mem[k] = (r,v)
    (r,v) = flip(r,v, u)
    return (r,v)

def rectify(u):
    return ~ u if u.negated else u

def flip(r,v, u):
    if u.negated:
        r = 1 - r
        v = 1 - v
    return (r,v)

johnyf commented 4 years ago

I would say that item 2 above includes item 1, because if u is an instance of dd.cudd.Function (or dd.autoref.Function), then str(u) is of the form '@' followed by the digits of an integer, so the results of str(int(u)) and str(u) are almost the same (the difference is in the @ character).

The implementation above (https://github.com/tulip-control/dd/issues/67#issuecomment-733123560) of item 1 from above (https://github.com/tulip-control/dd/issues/67#issuecomment-731902452) looks good.

About implementing item 2 from above (https://github.com/tulip-control/dd/issues/67#issuecomment-731902452), at the calling site of function recurse_prob, the argument mem would be a shelve, instead of a dict. For example, similarly to the body of the function dd._copy.dump_json:

# Copyright 2016-2018 by California Institute of Technology
# All rights reserved. Licensed under 3-clause BSD.
#
import os
import shelve
import shutil

SHELVE_DIR = '__shelve__'

def call_recurse_prob(u, var_prob):
    tmp_fname = os.path.join(SHELVE_DIR, 'temporary_shelf')
    os.makedirs(SHELVE_DIR)
    try:
        cache = shelve.open(tmp_fname)
        r, v = recurse_prob(u, var_prob, cache)
        cache.close()
    finally:
        shutil.rmtree(SHELVE_DIR)
    return r, v

and in the body of the function recurse_prob a change to k = str(int(z)).

I would also recommend spreading long expressions over multiple lines, which is possible by using parentheses, for example:

x = (a1 + a2 + a3 + a4 + a5 + a6 + a7 + a8 + a9 + a10 +
    a11 + a12 + a13 + a14 + a15 + a16 + a17 + a18 + a19 + a20)

Parentheses allow implicit line continuation, without the need for an explicit line break \. I would recommend following PEP 8, for example 2 lines between functions, space after comma, space between infix operators.

glarange commented 4 years ago

Thank you for the suggestions. Yes, I can follow PEP8, including line breaks (Something I learnt today!)

Let me try the "shelving" now.

glarange commented 4 years ago

I tried the shelve cache as shown below. It produces same results as the prior version, but kernel still crashes when using more than 1000 nodes or so. I think I'm missing something because I still use the dictionary in the original function recurse_prob. Is the resurse_prob how it should be?

def recurse_prob(u, var_prob, mem):
    # constant node ?
    if u.var is None:
        return (0,0) if u.negated else (1,0)
    z = rectify(u)
    k = str(int(z))
    # memoized ?
    if k in mem: 
        (r,v) = mem[k]
        (r,v) = flip(r,v, u)
        return (r,v)
    # recurse
    p = recurse_prob(u.low, var_prob, mem)
    q = recurse_prob(u.high, var_prob, mem)
    r = p[0] + var_prob[u.var] * (q[0] - p[0])
    v=((1-var_prob[u.var])*failr[u.var]*(q[0]-p[0])+
       (1-var_prob[u.var])*p[1]+var_prob[u.var]*q[1])

    # memoize
    mem[k] = (r,v)
    (r,v) = flip(r,v, u)
    return (r,v)

def rectify(u):
    return ~ u if u.negated else u

def flip(r,v, u):
    if u.negated:
        r = 1 - r
        v = 1 - v
    return (r,v)

def call_recurse_prob(u, var_prob):
    tmp_fname = os.path.join(SHELVE_DIR, 'temporary_shelf')
    os.makedirs(SHELVE_DIR)
    try:
        cache = shelve.open(tmp_fname)
        r, v = recurse_prob(u, var_prob, cache)
        cache.close()
    finally:
        shutil.rmtree(SHELVE_DIR)
    return r, v

pBDD = call_recurse_prob(u, vars_prob)

johnyf commented 4 years ago

The implementation appears to be correct (the only remaining dict used is var_prob, which is the unchanged through the recursion--mem is now a shelve, i.e., a dictionary-like object on the disk). A del z could be added after the line k = str(int(z)) to reduce the Function instances that are present in the depth-first traversal.

Other points to consider when using dd.cudd are increasing the total memory for the manager, for example bdd = dd.cudd.BDD(3 * cudd.GB), and obtaining statistics using the method bdd.statistics. The module dd.cudd can work with millions of nodes in the BDD manager.

If using dd.autoref, then dynamic reordering is disabled by default, and can be enabled with bdd.configure(reordering=True) where bdd an instance of the class dd.autoref.BDD.

The size of a particular BDD (not the total nodes in the BDD manager), i.e., the number of nodes that are reachable from a given node, can be read from the attribute dag_size, for example print(u.dag_size) before calling the function call_recurse_prob.

Also, I do not know whether this code is run in a virtual machine, but if so it might have reduced resources compared to the actual machine.

glarange commented 4 years ago

Added del z. Kernel still crashes. I'm not using virtual machine.

Regarding dd.cudd, am unable to import. I get a ModuleNotFoundError. Should I open a separate issue for that?

Thanks, Gui

johnyf commented 4 years ago

Usage of dd with a large number of nodes is intended with dd.cudd. A ModuleNotFoundError about dd.cudd means that it is not installed. The file README.md contains information about how to install dd.cudd (https://github.com/tulip-control/dd#wheel-files-with-compiled-cudd and https://github.com/tulip-control/dd#dd-fetching-cudd). Recently dd == 0.5.6 has been released and the PyPI files include a manylinux_x86_64 wheel for Python 3.9 that contains the module dd.cudd (i.e., compiled). So on Linux and Python 3.9 a pip install dd is expected to install this wheel file and so dd.cudd too. On other Python versions and operating systems python setup.py install --fetch --cudd as described in https://github.com/tulip-control/dd#dd-fetching-cudd is expected to build and install dd.cudd (running this inside either the repository or an unpacked source distribution from PyPI).

Also, even if dd.cudd is installed, running python or ipython within the top directory of this repository, or of the source distribution of dd, and attempting import dd.cudd, will result in a ModuleNotFoundError, because Python will attempt to import dd from its source directory, instead of the dd that is installed under site-packages. So running python or ipython outside the source distribution of dd is needed. This observation could be useful immediately after completing an installation of dd, because trying to check that dd.cudd has been installed (e.g., by running python -c "import dd.cudd") would fail if running in the same directory (a change of directory is first needed, after completing the installation, and before checking that dd.cudd has been installed successfully).

When installing with python setup.py (instead of pip install) it may be useful to first run pip uninstall -y dd to remove any existing installation of dd.

glarange commented 4 years ago

OK, I will continue this thread on a separate issue, thanks.