inducer / pytato

Lazily evaluated arrays in Python
Other
8 stars 16 forks source link

add get_hash, make DAGs deterministic #457

Open matthiasdiener opened 9 months ago

inducer commented 9 months ago

How does this relate to pytools.persistent_dict? (Which is applicable to pytato DAGs btw.)

matthiasdiener commented 9 months ago

How does this relate to pytools.persistent_dict? (Which is applicable to pytato DAGs btw.)

Hmm, I'm not sure - I could only imagine using a KeyBuilder here.

matthiasdiener commented 9 months ago

Some (discouraging) performance results regarding other dict implementations:

import immutables
import frozendict
import immutabledict
import timeit

d_init = {}

print("init_dict", timeit.timeit("for i in range(10000000): d_init[i] = i", globals=globals(), number=3))

d = dict(d_init)
print("dict", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))

d = immutabledict.immutabledict(d_init)
print("immutabledict", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))

d = frozendict.frozendict(d_init)
print("frozendict", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))

d = immutables.Map(d_init)
print("immutables", timeit.timeit(
    "for k, v in d.items(): s += 1", globals=globals(), number=3, setup="s=0"))
init_dict 1.2013676660135388
dict 0.6082258749520406
immutabledict 2.560783916967921
frozendict 0.6182653750292957
immutables 2.350354666938074
Package Performance License Iteration order Frozen
dict :white_check_mark: :white_check_mark: :white_check_mark:
immutables.Map :white_check_mark: :white_check_mark:
frozendict :white_check_mark: :white_check_mark: :white_check_mark:
immutabledict :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
pyrsistent.map :white_check_mark: :white_check_mark:
inducer commented 9 months ago

This is surprising. immutabledict is a really simple animal. Given the implementation of iteration, it should perform no different than iteration over a dict:

https://github.com/corenting/immutabledict/blob/9684df33bdcd68c4a1678be349592dd8b0c6d940/immutabledict/__init__.py#L41-L42

Also, I'm not sure that iteration is the most salient thing to benchmark, I would imagine creation is likely more important.

Also, since most of our dicts are going to be quite small: If we model cost as $T(n)=\alpha n+\beta$, $\beta$ it might be interesting to find both $\alpha$ and $\beta$.

And: a cheap win might be had by simply feeding the single-file implementation of immutabledict to Cython.

matthiasdiener commented 9 months ago

This is surprising. immutabledict is a really simple animal. Given the implementation of iteration, it should perform no different than iteration over a dict:

https://github.com/corenting/immutabledict/blob/9684df33bdcd68c4a1678be349592dd8b0c6d940/immutabledict/__init__.py#L41-L42

Adding an explicit implementation for items() (instead of relying on super().items() I guess?) makes the package run as fast as dict. Same for keys() and values().

inducer commented 9 months ago

Oh cool. So we may have a winner?

inducer commented 9 months ago

(After a PR to that package, at least?)

matthiasdiener commented 9 months ago

Oh cool. So we may have a winner?

Yep, I'm working on a PR (https://github.com/corenting/immutabledict/pull/265).

matthiasdiener commented 9 months ago

We didn't finish the discussion in the morning - is there still value in this PR?

From what I can see, there are 3 pieces here:

inducer commented 9 months ago

Could you explain what get_hash would do, compared to hash or the persistent-dict hash procedure?

If the answer is "make the hash consistent" run-over-run, I'm not sure I like that idea.

matthiasdiener commented 9 months ago

Could you explain what get_hash would do, compared to hash or the persistent-dict hash procedure?

If the answer is "make the hash consistent" run-over-run, I'm not sure I like that idea.

Right, in that case there wouldn't really a difference between get_hash and hash, I think.

matthiasdiener commented 9 months ago

1bfc5a2 adds a PytatoKeyBuilder, based off LoopyKeyBuilder (and makes get_hash use it, just as a test). Is this going in the direction you had in mind @inducer ?