tulip-control / dd

Binary Decision Diagrams (BDDs) in pure Python and Cython wrappers of CUDD, Sylvan, and BuDDy
https://pypi.org/project/dd
Other
181 stars 39 forks source link

Error with bdd.dump #10

Closed freb closed 8 years ago

freb commented 8 years ago

First, thank you for the excellent library.

I installed dd[dot] and I pasted in the example from https://github.com/johnyf/dd/blob/master/doc.md#plotting to try to get a pdf of my bdd. However, I get an error whenever I call dump for either a pdf or a pickle. I'm using version 0.3.0 but haven't tried any earlier versions.

...
/usr/local/lib/python3.4/dist-packages/dd/bdd.py in descendants(self, roots)
    238         Nodes are represented as positive integers.
    239         """
--> 240         for u in roots:
    241             assert u in self, u
    242         nodes = set()

TypeError: 'NoneType' object is not iterable

Any thoughts on this?

One additional question out of curiosity. I noticed in the docs you say "Some less frequently used BDD methods are...evaluate...". I was wondering why evaluate isn't used very often. Since BDDs are basically binary functions, wouldn't people want to evaluate them? I first noticed this in the pyeda library that is missing this feature completely.

johnyf commented 8 years ago

Thanks, good catch! Tested in 864be6adfa83e665205ded19ed0e1aa2feda23d4 and corrected in d25596fbcac6e150f0a6265eb1781994092db5a9. This error was caused due to propagating None as roots to descendants. Instead, roots should be all nodes in this case (and there is no reason for calling descendants). The error was introduced by rewriting dd.bdd.BDD.dump and associated functions to dump nodes from given roots, instead of all nodes. Earlier versions did not have a roots argument at all, and dd.bdd.BDD.descendants started from a single root, so this error was not present.

This change is motivated by the API that CUDD exposes, thus dd.cudd.BDD.dump. Currently, the two methods are still different. The use case that motivated this progress towards making the API of dd.bdd and dd.cudd identical (over common functionality) is this line. What happens there is that the transition relation of a synthesized implementation (i.e., a program) is dumped to a file. The BDDs in these applications can have millions of nodes (using dd.cudd, not dd.bdd), so we wouldn't want to dump the entire BDD manager.

Currently, the API of dd.bdd and dd.cudd is almost identical. The benefit of having an identical API is that the same Python code can run both in pure Python, for lightweight applications, as well as with CUDD installed, for industrial-sized problems. The differences are in areas that need development (e.g., dumping) and design decisions.

For example, dd.bdd dumps to pickle, dot, and graphics formats, whereas CUDD dumps to its own dddmp format, and I'd like to unify the available options. The dddmp format is not very practical, because storing extra information about variables and levels is unnecessarily complicated. Pickle is nice, but Python-specific. Graphics formats are for inspection, not for storage. So, it seems that json would be a good choice.

For small to medium BDDs, unifying things is relatively straightforward, because the details don't matter much. However, for large BDDs details matter, and serialization and loading has to happen on-the-fly, not in memory. In other words, json from the standard library is not expected to work. ijson and json-streamer appear to be possible candidate replacements, but benchmarks are needed first.

johnyf commented 8 years ago

The infrequent use of the method BDD.evaluate is due to the axiomatic, instead of constructive approach that symbolic algorithms usually implement.

Suppose f is a Boolean function of the variables x, y, z. We can evaluate f for the assignment a = dict(x=False, y=True, z=False). To be doing so indicates that we are thinking and working in terms of the assignment a. If we want to reason about all possible assignments to those three variables, then we would have to feed f with each one of the assignments, and see what value f takes. This approach is enumerative, in that we enumerated all assignments of interest, while reasoning about f.

It is possible to write an enumerative algorithm that uses BDDs as data structures, but there is little to be gained, compared to the same enumerative algorithm using a, say, graph data structure enumerated in memory. Usually, BDDs are used in symbolic algorithms.

A symbolic algorithm manipulates sets, instead of individual assignments. So, we never really have to evaluate the function f for some specific assignment, because we solve the problem for all assignments collectively, without ever picking out any one those assignments.

At this point, one may remark that all this sounds good, but at the end of the day, we want to use the result that the algorithm produced as a BDD. This result can have 2 forms:

  1. implicit or
  2. explicit.

The implicit representation is a BDD f(x, y, z), where the assignments that do the job we want are those for which f is True. To iterate over such assignments, call dd.bdd.BDD.sat_iter. This is the usual way that I use BDDs. Note that this representation does not distinguish between input and output variables. So, if you were given the value of x as input, and wanted to find the outputs y, z, then first you need to cofactor f with the given value of x, then sat_iter over the result (assigning values to y, z).

The explicit representation is a collection of BDDs, one for each output variable. This does distinguish between inputs and outputs. In this case, we would have f_y(x) and f_z(x) as separate BDDs, each dependent on variable x. It is in this case that we would use evaluate. Usually, at this stage one would generate a circuit or program code that computes these two BDDs, i.e., "hardcodes" evalutate for these two particular BDDs, so that it can be deployed as hardware or software.

An example of a symbolic algorithm that solves games to produce programs (as BDDs) from logical formulae is omega.games.gr1.

freb commented 8 years ago

Thank you for the detailed explanation. In addition to squashing the bug and answering my question, you also pointed me toward the cofactor function which I'll be making use of at some point.

In my project I'm modeling a firewall rule set with a BDD to perform rule analysis. In this case, after the BDD representation has been constructed its natural to want to construct a packet (i.e. assign a value to all applicable variables) and then see what the function evaluates to. This wasn't my motivation for working with BDDs but it has been good check to make sure my BDD implementation was working as expected.

Since the cofactor and evaluate functions aren't available with CUDD yet I'll need to stick with pure Python for at least these methods. I start on the rule analysis portion of my project today which should not require these. I haven't used CUDD yet but hope to once the next portion of my program is complete.

I can confirm that dump is now working as expected.

Thanks!

johnyf commented 8 years ago

Thanks for the feedback. Please note that dd.cudd.BDD.cofactor exists, though dd.cudd.BDD.evaluate does not. But dd.cudd.BDD.cofactor serves as dd.cudd.BDD.evaluate, by passing an assignment of values to variables as a dict, as described in the docstring of dd.cudd.BDD.cube.

greping the sources of CUDD suggests that it does not implement an "evaluate" function, I presume because cofactor suffices.