Empty lists while creating parents

roed314 commented 10 years ago

This is crazy:

sage: import gc
sage: A = at_least
sage: pre = gc.get_objects()
sage: K = Qp(7,2)
sage: post = gc.get_objects()
sage: len(post) - len(pre)
747

Much of that is empty lists:

sage: M = []
sage: for a in post:
....:     for b in pre:
....:         if a is b:
....:             break
....:     else:
....:         M.append(a)
sage: from collections import defaultdict
sage: types = defaultdict(int)
sage: for a in K.M:
....:     types[type(a)] += 1
sage: print '\n'.join([str(a) + ' ' + str(types[a]) for a in types.keys() if types[a] > 1])
<type 'list'> 554
<type 'tuple'> 57
<class 'weakref.KeyedRef'> 28
<type 'weakref'> 22
<type 'dict'> 18
<type 'sage.structure.coerce_dict.MonoDictEraser'> 10
<type 'sage.structure.coerce_dict.MonoDict'> 10
<type 'cell'> 8
<type 'instancemethod'> 6
<type 'frame'> 6
<type 'sage.structure.coerce_dict.TripleDict'> 5
<type 'sage.structure.coerce_dict.TripleDictEraser'> 5
<type 'function'> 4
<type 'staticmethod'> 4
<class 'sage.structure.dynamic_class.DynamicMetaclass'> 4
<class 'sage.rings.homset.RingHomset_generic_with_category'> 2
<class '_ast.Name'> 2

sage: listlens = defaultdict(int)
sage: for a in M:
....:     if isinstance(a, list):
....:         listlens[len(a)] += 1
sage: sage: sorted(listlens.items())
[(0, 525), (1, 9), (2, 2), (3, 2), (23, 10), (53, 5), (116583, 1)]

I'm not sure where these 525 empty lists are created, but it seems a bit excessive.

It's not just p-adic fields:

sage: pre = gc.get_objects()
sage: K = QuadraticField(-11)
sage: post = gc.get_objects()
sage: len(post) - len(pre)
1152

CC: @nbruin @simon-king-jena @vbraun

Component: coercion

Author: Nils Bruin

Branch/Commit: u/nbruin/ticket/15367 @ 719cdec

Reviewer: Simon King

Issue created by migration from https://trac.sagemath.org/ticket/15367

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. Recent commits:
b8d5a5a	Remove unused debug method CoercionModel_cache_maps.get_stats()
8144996	trac #15367: Improved implementation of TripleDict and MonoDict, based on open addressing. This implementation should be faster for all operations and have a smaller memory footprint.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 9022b32 to b8d5a5a

nbruin commented 10 years ago

comment:41

Reverted to an implementation that uses an eraser class that keeps a weak reference to dictionary it erases on, rather than a closure for an eraser, which would keep a strong reference. Rationale: There are some adverse effects to cyclic reference cycles: For instance, if a GC clean-up has a side-effect (due to a callback for instance) of making a cycle unreachable, that cycle will only be picked up during the next GC. Non-cyclic garbage, on the other hand, would have refcounts hitting 0 and would get cleaned up immediately. So, if one can easily avoid cycles (and we can here), then it is better to do so.

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. New commits:

[9aeb4f5](https://github.com/sagemath/sagetrac-mirror/commit/9aeb4f5) trac #15367: fix whitespace

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from b8d5a5a to 9aeb4f5

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Changed commit from 9aeb4f5 to 719cdec

7ed8c4ca-6d56-4ae9-953a-41e42b4ed313 commented 10 years ago

Branch pushed to git repo; I updated commit sha1. This was a forced push. Recent commits:
719cdec	Remove unused debug method CoercionModel_cache_maps.get_stats()
0550ecb	trac #15367: Improved implementation of TripleDict and MonoDict, based on open addressing. This implementation should be faster for all operations and have a smaller memory footprint.

nbruin commented 10 years ago

comment:44

ping

I think this straightforward replacement is worth merging:

memory layout is more straightforward: If you want to trace a memory leak/reference chain, you immediately get to the MonoDict/TripleDict rather than having to dig through 2 levels of lists
memory footprint is much more compact
performance should be quite a bit better (although it seems that these dicts rarely get accessed in tight loops: optimizing such a loop usually removes the access to these dicts)
the way items get deleted makes sure that the Python trashcan is involved, which prevents recursion depth errors from occurring when deleting long chains of objects.

So, if someone can finish the review ... It's pretty straightforward code (apart from being rather performance sensitive).

simon-king-jena commented 10 years ago

comment:45

Once more, my impression is that the git workflow helps us solve problems that we didn't have without git.

That's to say: sage --dev checkout --ticket=15367 only brought me to my local branch for this ticket. sage --dev pull resulted in

Merging the remote branch "u/nbruin/ticket/15367" into the local branch "ticket/15367".
Automatic merge failed, there are conflicting commits.

Auto-merging src/sage/structure/coerce_dict.pyx
CONFLICT (content): Merge conflict in src/sage/structure/coerce_dict.pyx

Please edit the affected files to resolve the conflicts. When you are finished, your resolution will be commited.
Finished? [ok/Abort] A

So, how can I get your code?

vbraun commented 10 years ago

comment:46

Nils rewrote history with the forced pushs (bad), so some of the code in Simon's branch conflicts as it is on a different history.

Replying to @simon-king-jena:

Once more, my impression is that the git workflow helps us solve problems that we didn't have without git

Noticing conflicts is not a problem when you manually copy patches, correct. However, not noticing conflicts is a problem ;-)

So, how can I get your code?

If you haven't made any commits that you want to keep: delete your current branch (git branch -D my_branch) and checkout a fresh copy. If you have made commits, you have to rebase (git rebase or git cherrypick, for example) them on top of Nils' rewritten history.

simon-king-jena commented 10 years ago

comment:47

Replying to @vbraun:

Replying to @simon-king-jena:

Once more, my impression is that the git workflow helps us solve problems that we didn't have without git

Noticing conflicts is not a problem when you manually copy patches, correct. However, not noticing conflicts is a problem ;-)

As far as I know, my local repository contains nothing that depends on this ticket. So, how can there be a conflict? I am not talking about conflicts that are just artefacts of how git works.

So, how can I get your code?

If you haven't made any commits that you want to keep: delete your current branch (git branch -D my_branch) and checkout a fresh copy. If you have made commits, you have to rebase (git rebase or git cherrypick, for example) them on top of Nils' rewritten history.

If I have a local branch associated with this ticket, and if I pull from this ticket and there are merge conflicts with my local branch, I'd expect that my old local branch (ticket/15367) be preserved and a new local branch (say, ticket/15367-1) be created that is equal to the branch forced-pushed by Nils. This should be automatically done by either the dev script or by git. And then, I can still decide whether I want to delete my old branch or merge the new and the old branch, or I can explain to Nils why I don't like his new branch and switch back to the old one.

In contrast, you say that in the new workflow I have to decide whether or not to delete my old branch, before I pull from Nils' changes. Why? This approach seems plain wrong to me.

I guess this is a theoretical discussion that belongs to the sage-git list. Since I don't see stuff depend on this ticket (yet), I did as you suggested.

nbruin commented 10 years ago

comment:48

It seems that sage --dev has a concept of linking a branch with a trac ticket, which is convenient. In your case, you already had a branch linked with this ticket, so naturally it tried to merge my branch with your local one and due to the rebase there really WAS a conflict. Probably there is a way to "unlink" your local branch from the ticket, so that sage --dev checkout --ticket=15367 gets you a fresh branch. Similarly I'd hope there is a way to locally "reassociate" a branch with a ticket.

Sorry about the rebase, but in this case it seemed warranted because

nothing else depended on the branch
There was a big back-and-forth on removing and reintroducing the Eraser classes, which would have been silly to archive in the sage history (tripling the modification size, or something like that)

Keeping the trivial edits to CoercionModel_cache_maps.get_stats() attributed to Simon was an interesting exercise, though.

vbraun commented 10 years ago

comment:49

Replying to @simon-king-jena:

If I have a local branch associated with this ticket, and if I pull from this ticket and there are merge conflicts with my local branch, I'd expect that my old local branch (ticket/15367) be preserved and a new local branch (say, ticket/15367-1) be created that is equal to the branch forced-pushed by Nils.

The commits from all of those branches are of course preserved in git. If you like another local branch to give a name to the different histories then that is fine, but don't force your naming convention on others. To uniquely reference any of the past branch heads you only need the commit SHA1, whose historical values is conveniently listed in the ticket comments.

nbruin commented 10 years ago

comment:50

Replying to @simon-king-jena:

If I have a local branch associated with this ticket, and if I pull from this ticket and there are merge conflicts with my local branch, I'd expect that my old local branch (ticket/15367) be preserved and a new local branch (say, ticket/15367-1) be created that is equal to the branch forced-pushed by Nils.

I think you can "unlink" by doing something along the lines of ./sage --dev set-remote None no/remote/branch. Because your local branch is now associated with a different remote branch (is this a git feature or is "associated remote branch" something that sage-dev invents?) I would expect that checking out the ticket would now cause a new, fresh branch to be created.

simon-king-jena commented 10 years ago

comment:51

Replying to @nbruin:

It seems that sage --dev has a concept of linking a branch with a trac ticket, which is convenient. In your case, you already had a branch linked with this ticket, so naturally it tried to merge my branch with your local one and due to the rebase there really WAS a conflict.

AFAIK, git knows that this was a forced push. So, the dev script should at least offer a "forced pull".

Anyway, this ticket is not about annoyances and hold-ups caused by the switch to the new workflow.

simon-king-jena commented 10 years ago

comment:52

To start reviewing, I merged master (freshly pulled from trac) into the branch, which succeeded.

simon-king-jena commented 10 years ago

comment:53

Second step of the review: I checked that the patchbot is right and all tests pass (after merging master).

simon-king-jena commented 10 years ago

comment:54

Third step: Re-test timings. I do "branch versus sage-5.13.b4".

%cpaste
cython("""
def test(T, list L1, list L2, list L3, v):
    for i in L1:
        for j in L2:
            for k in L3:
                T[i,j,k] = v
def test2(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                v = T[i,j,k]
def test3(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                (i,j,k) in T
def test4(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                (i,j+1,k) in T
def test5(T, list L1, list L2, list L3):
    for i in L1:
        for j in L2:
            for k in L3:
                del T[i,j,k]
""")

In the branch:

sage: T = sage.structure.coerce_dict.TripleDict(11)
sage: L1 = srange(100)
sage: %time test(T,L1,L1,L1,ZZ)
CPU times: user 9.70 s, sys: 0.05 s, total: 9.75 s
Wall time: 9.77 s
sage: %time test2(T,L1,L1,L1)
CPU times: user 0.64 s, sys: 0.01 s, total: 0.65 s
Wall time: 0.66 s
sage: %time test3(T,L1,L1,L1)
CPU times: user 0.61 s, sys: 0.00 s, total: 0.61 s
Wall time: 0.61 s
sage: %time test4(T,L1,L1,L1)
CPU times: user 0.23 s, sys: 0.00 s, total: 0.23 s
Wall time: 0.23 s
sage: %time test5(T,L1,L1,L1)
CPU times: user 0.84 s, sys: 0.01 s, total: 0.85 s
Wall time: 0.86 s
sage: get_memory_usage()
231.640625

In sage-5.13.b4 (plus #14912, but this doesn't touch coerce dict):

sage: T = sage.structure.coerce_dict.TripleDict(11)
sage: L1 = srange(100)
sage: %time test(T,L1,L1,L1,ZZ)
CPU times: user 15.94 s, sys: 0.11 s, total: 16.05 s
Wall time: 16.08 s
sage: %time test2(T,L1,L1,L1)
CPU times: user 1.86 s, sys: 0.00 s, total: 1.86 s
Wall time: 1.87 s
sage: %time test3(T,L1,L1,L1)
CPU times: user 1.86 s, sys: 0.00 s, total: 1.86 s
Wall time: 1.86 s
sage: %time test4(T,L1,L1,L1)
CPU times: user 1.79 s, sys: 0.02 s, total: 1.81 s
Wall time: 1.82 s
sage: %time test5(T,L1,L1,L1)
CPU times: user 1.72 s, sys: 0.03 s, total: 1.75 s
Wall time: 1.75 s
sage: get_memory_usage()
318.078125

So, everything became faster, and the memory consumption is a bit less (but the latter could very well be because of other tickets).

The effect on startuptime is probably not significant, but I get (one run only) 1586.843ms with the branch versus 1600.406ms with 5.13.b4+#14912.

So, the winner is: THIS BRANCH.

Last step of the review: Read the new code.

simon-king-jena commented 10 years ago

comment:55

In MonoDict.set, there is

        cursor = self.lookup(<PyObject*><void*>k)
        if cursor.key_id == NULL or cursor.key_id == dummy:
            ...

without saying that cursor is cdef mono_cell*. Similar things happen in MonoDict.iteritems, Monodict_traverse and MonoDict_clear, and also in TripleDict.set, TripleDict.iteritems, TripleDict_traverse and TripleDict_clear.

First question: Why does it not crash? Second question, even if there is a good answer to the first: Shouldn't it be cdef mono_cell* cursor resp. cdef triple_cell* cursor in order to improve performance?

nbruin commented 10 years ago

comment:56

Replying to @simon-king-jena:

First question: Why does it not crash?

Cython has grown type inference.

Second question, even if there is a good answer to the first: Shouldn't it be cdef mono_cell* cursor resp. cdef triple_cell* cursor in order to improve performance?

No, because cython infers the correct type by itself.

I'm not commenting on whether it's good style to rely on said type inference, but it is very convenient when you're programming.

simon-king-jena commented 10 years ago

comment:57

Replying to @nbruin:

Replying to @simon-king-jena:

First question: Why does it not crash?

Cython has grown type inference.

Since when?

Amazing.

Anyway, back to needs review then...

nbruin commented 10 years ago

comment:58

Replying to @simon-king-jena:

Since when?

According to https://github.com/cython/cython/wiki/ReleaseNotes-0.13 since August 25, 2010 :-).

simon-king-jena commented 10 years ago

comment:59

Replying to @nbruin:

Replying to @simon-king-jena:

Since when?

According to https://github.com/cython/cython/wiki/ReleaseNotes-0.13 since August 25, 2010 :-).

So far I have always tried to tell Cython which type to expect.

Anyway.

The code looks good to me, tests pass, the patchbot has nothing to complain, and as we have seen there is an improved performance. Positive review!

jdemeyer commented 6 years ago

comment:61

While looking at this code, I noticed that the size argument is never used anywhere (except to interpret it as data):

def __init__(self, size=11, data=None, threshold=0.7, weak_values=False)

Is this intentional? If yes, can we deprecate size? If no, what should we do?

sagemath / sage

Empty lists while creating parents #15367