Unify pairs and primitive algebra internals

GoogleCodeExporter commented 9 years ago


Pairs algebra class saves the information about the instance
in head and data attributes.
Primitive algebra class saves the information in tree attribute
as is supposed to be a 2-tuple in the form (head, operands).
There is 1-1 correspondence between these representations
and there is little reason to use different representations
of basically the same information - this only increases the
complexity of sympycore without having a real gain in performance.

My suggestion is to use the same representation in primitive algebra
that pairs use, that is, the primitive algebra will save the
information in head and data attributes.

However, if we can prove that holding the head and data information
in a single 2-tuple attribute tree is more efficient then we
should use it also in pairs as the pairs algebra (eg Calculus)
must be fast.

Original issue reported on code.google.com by pearu.peterson on 1 Mar 2008 at 4:36

GoogleCodeExporter commented 9 years ago

Here follow some tests comparing different ways to hold (head, data)
information:

class Pairs(object):
    def __new__(cls, data, head, new=object.__new__):
        obj = new(cls)
        obj.head = head
        obj.data = data
        return obj

class Primitive(object):
    def __new__(cls, data, head, new=object.__new__):
        obj = new(cls)
        obj.tree = (head, data)
        return obj

class Tuple1(tuple):
     def __new__(cls, data, head, new = tuple.__new__):
        obj = new(cls, (head, data))
        return obj

class Tuple2(tuple):
     pass

>>> %timeit Pairs((1,2),'+')
1000000 loops, best of 3: 958 ns per loop
>>> %timeit Primitive((1,2),'+')
1000000 loops, best of 3: 834 ns per loop
>>> %timeit Tuple1((1,2),'+')
1000000 loops, best of 3: 713 ns per loop
>>> %timeit Tuple2(((1,2),'+'))
1000000 loops, best of 3: 275 ns per loop

Fredrik, do you see any reasons why we should not derive Pairs and Primitive
classes directly from tuple? Sure, constructing Tuple2 requires extra 
parenthesis
but already now calling constructors is needed only internally and 4x speedup
in constructing an object may have significant impact to overall performance.

I'll test also the performance of accessing attributes for these cases..

Original comment by pearu.peterson on 1 Mar 2008 at 5:01

GoogleCodeExporter commented 9 years ago

Ah, right, deriving from tuple would break numpy support. Unless we
can "break" sequence protocol for numpy.array constructor.

Original comment by pearu.peterson on 1 Mar 2008 at 5:06

GoogleCodeExporter commented 9 years ago

I have created a benchmark that times the creation of objects as well as
accessing data from created objects. The code is in 
research/test_object_creation.py,
here follow the results:

Pairs((1,2),"+"): 5.85 stones
Primitive((1,2),"+"): 5.12 stones
Tuple1((1,2),"+"): 4.68 stones
Tuple2(((1,2),"+")): 1.74 stones
pairs.data: 0.73 stones
primitive.data: 2.80 stones
primitive.tree[1]: 0.98 stones
tuple1.data: 2.03 stones
tuple1[1]: 0.45 stones
tuple2.data: 2.03 stones
tuple2[0]: 0.45 stones

So, though creating Pairs is more expensive, it's data access is cheaper.
Using property is slow as we already know..
The above results were obtained when Pairs and Primitive do not use slots.
With slots we have the following results:

Pairs((1,2),"+"): 5.12 stones
Primitive((1,2),"+"): 4.48 stones
Tuple1((1,2),"+"): 5.00 stones
Tuple2(((1,2),"+")): 1.77 stones
pairs.data: 0.57 stones
primitive.data: 2.67 stones
primitive.tree[1]: 0.82 stones
tuple1.data: 2.08 stones
tuple1[1]: 0.48 stones
tuple2.data: 2.16 stones
tuple2[0]: 0.48 stones

So, using slots improve both object creation as well as data access.

Original comment by pearu.peterson on 1 Mar 2008 at 5:55

GoogleCodeExporter commented 9 years ago

Added test for classical python class:

Pairs((1,2),"+"): 4.58 stones
Classical((1,2),"+"): 4.45 stones
Primitive((1,2),"+"): 4.07 stones
Tuple1((1,2),"+"): 4.51 stones
Tuple2(((1,2),"+")): 1.71 stones
pairs.data: 0.54 stones
classical.data: 0.50 stones
primitive.data: 2.47 stones
primitive.tree[1]: 0.74 stones
tuple1.data: 1.96 stones
tuple1[1]: 0.43 stones
tuple2.data: 1.92 stones
tuple2[0]: 0.42 stones

Original comment by pearu.peterson on 1 Mar 2008 at 6:13

GoogleCodeExporter commented 9 years ago

> Ah, right, deriving from tuple would break numpy support.

Yes, and probably a lot of user code as well (any code that uses isinstance(arg,
tuple) to check for a multi-argument).

Original comment by fredrik....@gmail.com on 2 Mar 2008 at 1:01

GoogleCodeExporter commented 9 years ago

However... I think it's worth trying.

(Note that "head, data = tpl" is faster than "head = tpl[0]; data = tpl[1]".)

While symbolic expressions represent "scalar" mathematical objects, they do 
carry a
notion of structure, so there is nothing semantically wrong about inheriting 
from a
container type (tuple).

We'd lose the Calculus(number) or Calculus(string) constructors, but that is no
serious loss since one can always introduce a function to replace those uses.

Original comment by fredrik....@gmail.com on 2 Mar 2008 at 1:11

GoogleCodeExporter commented 9 years ago

The ideas from this issue are implemented in svn.

Original comment by pearu.peterson on 13 Mar 2008 at 3:07

Changed state: Fixed

pearu / sympycore

Unify pairs and primitive algebra internals #55