Thank you for the observation. The good property of `xrange` is that it has an upper bound by definition. The CPython limitation of `xrange` cited above is documented here:

> CPython implementation detail: `xrange()` is intended to be simple and fast. Implementations may impose restrictions to achieve this. The C implementation of Python restricts all arguments to native C longs ("short" Python integers), and also requires that the number of elements fit in a native C long. If a larger range is needed, an alternate version can be crafted using the `itertools` module: `islice(count(start, step), (stop-start+step-1+2*(step<0))//step)`.
`xrange(sys.maxsize)` instantiates fine, whereas `xrange(sys.maxsize + 1)` raises an `OverflowError: Python int too large to convert to C long`. Is this the error that you get?
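For instance, a quick check (Python 2; the exact boundary depends on the platform's C long width):

```python
import sys

# On a platform where C long and Py_ssize_t are both 8 bytes:
r = xrange(sys.maxsize)         # instantiates fine
try:
    r = xrange(sys.maxsize + 1)
except OverflowError as e:
    print(e)  # Python int too large to convert to C long
```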
Can you please note the values of `start`, `i`, `self.max_nodes`, `len(self._succ)` in the call of `dd.bdd.BDD._next_free_int` that raises the exception?
Based on the description below, `_succ` cannot become larger than `sys.maxsize`, so `xrange` is not expected to raise an `OverflowError`. Another comment is that, on the machine I am writing from, `sys.maxsize == 9223372036854775807`. I do not expect even CUDD to work with this many nodes (it works with at most a few dozen million). So, something else may be happening, for example a memory error.
If it is a memory error due to insufficient space, then there are (at least) two solutions:

- `dd.bdd.reorder` (in the future, it will be called automatically #9)
- `dd.cudd`, with CUDD installed (likely need to change only 1 line in user code, the `import` statement).

### dd.bdd.BDD
For context, creation of new nodes in `dd.bdd.BDD` works by maintaining in the attribute `BDD._min_free` the smallest integer not assigned to some node. The invariant is that all integers < `_min_free` are currently nodes, and `_min_free` is not a node. This invariant is preserved as follows:

- `_min_free` is initialized to 2
- `_min_free` is increased to a free integer returned by `BDD._next_free_int`, in order to create a new node
- `_min_free` is reduced to the integer of a node being deleted, if that integer is smaller than `_min_free`.

So, the integers `1 .. _min_free - 1` are nodes, `_min_free` is not a node, and there may exist nodes that are integers larger than `_min_free`.
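A minimal sketch of this bookkeeping (hypothetical and simplified, not the actual `dd.bdd.BDD` code):

```python
class _MinFreeSketch(object):
    """Illustrates the `_min_free` invariant described above."""

    def __init__(self):
        self._succ = {}      # node (integer) -> successors
        self._min_free = 2   # smallest integer that is not a node

    def add_node(self, value):
        u = self._min_free
        self._succ[u] = value
        # restore the invariant: find the next integer that is not a node
        v = u + 1
        while v in self._succ:
            v += 1
        self._min_free = v
        return u

    def delete_node(self, u):
        del self._succ[u]
        # a freed integer below `_min_free` becomes the new minimum
        if u < self._min_free:
            self._min_free = u
```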
The method `dd.bdd.BDD._next_free_int` starts at the integer `_free_int` that has just been assigned to a node, and searches linearly for the smallest integer in `_free_int .. BDD.max_nodes` that is not a node.
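For illustration, a hedged reconstruction of that `xrange`-based linear search (the actual `dd` code may differ in details):

```python
def _next_free_int(self, start):
    """Return the smallest free integer in `start .. self.max_nodes`."""
    # On CPython, `xrange` arguments must fit in a native C long, so this
    # call raises OverflowError on 64-bit Windows when
    # `self.max_nodes == sys.maxsize` (Py_ssize_t is 8 bytes there,
    # C long only 4).
    for i in xrange(start, self.max_nodes + 1):
        if i not in self._succ:
            return i
    raise Exception('full: reached `self.max_nodes` nodes.')
```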
The upper limit `BDD.max_nodes` is initialized to `sys.maxsize` by default, but can be changed as desired by the user. The documentation of `sys.maxsize` defines it as:

> The largest positive integer supported by the platform's `Py_ssize_t` type, and thus the maximum size lists, strings, dicts, and many other containers can have.
Since `dd.bdd.BDD` stores the map from (integer) nodes to successors `(low, high)` in the `dict` attribute `BDD._succ`, based on the above it should not be possible for `_free_int` to become larger than `sys.maxsize`, because the invariant would imply that `len(_succ) > sys.maxsize`, which is impossible for a `dict` in CPython (based on the docs quoted above).
### dd.mdd.MDD

As a side note, in `dd.mdd` I have implemented a more time-efficient (with a space trade-off) algorithm for tracking and assigning free nodes in `MDD._allocate`. Instead of maintaining the minimal `_free_int`, it stores the integers of deleted nodes in the set `MDD._free`, so no search is needed. When those are exhausted, `MDD._max` is simply incremented by 1, because no integer > `_max` has been allocated to any node yet.
Maintaining a set takes more space, but this is not an issue, because integers are added to that set only after being removed from `_succ`, so moving them to the set does not increase memory usage. This should be faster. I would like to experiment with it in `dd.bdd.BDD` at some point.
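A minimal sketch of that scheme (hypothetical and simplified, not the actual `MDD._allocate`):

```python
class _FreeSetSketch(object):
    """Illustrates the free-set allocation described above."""

    def __init__(self):
        self._succ = {}
        self._free = set()   # integers of deleted nodes, available for reuse
        self._max = 1        # largest integer allocated so far

    def _allocate(self):
        if self._free:
            # reuse a freed integer: O(1), no linear search
            return self._free.pop()
        # no freed integers left: extend the range,
        # since no integer > `_max` is a node yet
        self._max += 1
        return self._max

    def _release(self, u):
        del self._succ[u]
        self._free.add(u)
```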
Thanks for the detailed response.
The issue arises when using CPython if `sys.maxsize` is larger than the maximum positive integer supported by the platform's native C long. For example, on 64-bit Windows the `Py_ssize_t` type is defined as an 8-byte integer, whilst the native C long is only 4 bytes. I suspect you are using either 64-bit Linux or OS X, where the native C long is 8 bytes, so there would be no overflow.
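A quick way to check this mismatch on a given platform (sketch):

```python
import ctypes
import sys

print(sys.maxsize)                   # maximum of Py_ssize_t
print(ctypes.sizeof(ctypes.c_long))  # bytes in a native C long
# On 64-bit Windows this prints 9223372036854775807 and 4,
# so `xrange(sys.maxsize)` overflows a C long there.
```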
As you say, even CUDD cannot cope with anywhere near `sys.maxsize` nodes, so one possible solution is to set the default for `self.max_nodes` to something that would fit in a 4-byte integer (e.g. `2**31 - 1`); another is to avoid using `xrange` altogether:
```python
def _next_free_int(self, start, debug=False):
    """Return the smallest unused integer larger than `start`."""
    # requires: import itertools
    # `itertools.count` has no upper bound, so it avoids the
    # C long limit of `xrange`.
    for i in itertools.count(start):
        if i > self.max_nodes:
            break
        if i not in self._succ:
            if debug:
                # check that all smaller integers are indeed nodes
                for j in xrange(1, start + 1):
                    assert j in self, j
            return i
    raise Exception('full: reached `self.max_nodes` nodes.')
```
Indeed, an accurate guess: I am using 64-bit Linux and 64-bit OS X. I see now that the problem arises from a mismatch between Python's internally declared size limits and the system's limits. Also, it does not arise because the number of nodes reaches anywhere near the limits, but just because the upper bound given to `xrange` happens to be larger than the limit of the operating system. (One more rule where Windows is the exception...)
Even though I am willing to use a simple incrementation pattern, it has a caveat: if `debug=True`, then the remaining `xrange` will fail for sufficiently large `start` (which won't happen in practice, though theoretically it remains possible). This is why rule 2 here is important.
Also, this difference between C long and `Py_ssize_t` for the `xrange` implementation is CPython issue 26428, where it was rejected.
It turns out that my original implementation in Python 2 was correct, because it used `sys.maxint`, which corresponds to C long and to how `xrange` is limited in Python 2. It was modified in 956e15def27ad374263f9c92ef7097d98aa8202b.
`islice` could also be used, as shown here, but it is less readable and more complex (PEP 20).
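For reference, a small sketch of that `islice` recipe from the CPython docs quoted above (the helper name `irange` is mine, for illustration):

```python
from itertools import count, islice

def irange(start, stop, step=1):
    """Unbounded replacement for `xrange`, per the CPython docs recipe."""
    return islice(count(start, step),
                  (stop - start + step - 1 + 2 * (step < 0)) // step)
```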
Reverted in 0f98425fd91a67b8f5891a997a0e748b83d7565e to use `sys.maxint` in Python 2 only, because that corresponds to how `xrange` is implemented. So, the option "something that would fit in a 4-byte integer" was selected.
Excellent, thanks!
Arguments to the `xrange` function are limited to the size of a native C long when using CPython, which can cause the `bdd._next_free_int` method to throw an exception (e.g. when `self.max_nodes` is set to the default of `sys.maxsize`). I suggest using a simple while loop instead, with an increment in each iteration until a free node is found, or `itertools.count` if a generator is preferred.
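A minimal sketch of the while-loop suggestion, assuming the attributes named above (not the code that was eventually shipped):

```python
def _next_free_int(self, start):
    """Return the smallest free integer in `start .. self.max_nodes`."""
    i = start
    while i <= self.max_nodes:
        if i not in self._succ:
            return i
        i += 1  # plain increment: no C long limit, unlike `xrange`
    raise Exception('full: reached `self.max_nodes` nodes.')
```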