Open nbruin opened 12 years ago
Reviewer: Simon King
Hooray!
All tests passed!
Total time for all tests: 1128.9 seconds
And I did not notice any "ignored" errors.
I think I can give Nils' part of the patches a positive review, and suggest a cross-review.
Attachment: trac_13447-refcount_cleanup.patch.gz
clean up final bits in refcounting of singular rings
Description changed:
---
+++
@@ -9,5 +9,6 @@
* [attachment: trac_13447-consolidated_refcount.patch](https://github.com/sagemath/sage-prod/files/10656328/trac_13447-consolidated_refcount.patch.gz)
* [attachment: trac_13447-modulus_fix.patch](https://github.com/sagemath/sage-prod/files/10656329/trac_13447-modulus_fix.patch.gz)
* [attachment: trac_13447-rely_on_singular_refcount.patch](https://github.com/sagemath/sage-prod/files/10656330/trac_13447-rely_on_singular_refcount.patch.gz)
+* [attachment: trac_13447-refcount_cleanup.patch](https://github.com/sagemath/sage-prod/files/10656331/trac_13447-refcount_cleanup.patch.gz)
**Merge together with** #715, #11521
Patches look good to me, so a positive review from me for Simon's patches.
I've attached a further patch to clean up some more details. Doctests pass for me, but Simon should look at it and decide if he's happy with it. Please go ahead and amend as you see necessary. I also have no problem with the patches being unified (we have some very small patches here now)
Apply trac_13447-consolidated_refcount.patch trac_13447-modulus_fix.patch trac_13447-rely_on_singular_refcount.patch trac_13447-refcount_cleanup.patch
Concerning the last patch: I think one should rather use (as it is "officially" recommended in the docs) singular_ring_reference
and singular_ring_delete
, and not
if currRingHdl.data.uring!= currRing:
currRingHdl.data.uring.ref -= 1
if currRingHdl.data.uring.ref == 0:
rDelete(currRingHdl.data.uring)
currRingHdl.data.uring = currRing # ref counting?
currRingHdl.data.uring.ref += 1
Apart from that, I agree that the ring wrappers can be binned.
A bit later today, I will produce a combined patch, and provided that all tests pass I will set it to positive review.
For the record: Apparently it is impossible to use the "proper" singular_ring_delete function in the code snipped mentioned in comment:44. I guess the problem is that singular_ring_delete will set currRing=None
in certain situations - hence, the next line currRingHdl.data.uring = singular_ring_reference(currRing)
will fail with a segfault.
So, I'll leave this code snipped as it is.
On the other hand: I'd really like to understand why it doesn't work with singular_ring_delete
in this case.
I found that the segfault occurs inside singular_ring_delete(currRingHdl.data.uring)
when attempting to change to currRingHdl.data.uring
. So, apparently, currRingHdl.data.uring
is invalid at that point. And this is hardly surprising, because it is created by the following lines, which look like a hack to me:
if currRingHdl == NULL:
import sys
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 1)
currRingHdl.data.uring.ref += 1
Now I think I understand:
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 1)
has the purpose to create currRingHdl
(it is only called if currRingHdl==NULL
), but the value assigned to currRingHdl.data.uring
is invalid. In particular, the attempt to change to currRingHdl.data.uring
results in a segfault. Hence, singular_ring_delete(currRingHdl.data.uring)
would not work.
Solution: We create currRingHdl
, but we immediately delete the invalid currRingHdl.data.uring
and replace it by a new reference to currRing
. If there is no currRing
then I print a warning, but I hope that we will never see that warning...
I have attached a combined patch. Nils, are you happy with what I did with currRingHdl
in sage/libs/singular/function.pyx? If you are, and if the tests pass, then please set it to "positive review"
For the record: I have
$ hg qa
trac_715_combined.patch
trac_715_local_refcache.patch
trac_715_safer.patch
trac_715_specification.patch
trac_11521_homset_weakcache_combined.patch
trac_11521_callback.patch
13145.patch
trac_13447-sanitise_ring_refcount.patch
on top of sage-5.4.beta0 Edit and the tests both in sage/rings/polynomial and sage/libs/singular pass. Let's hope for the full test suite!
Apply trac_13447-sanitise_ring_refcount.patch
Description changed:
---
+++
@@ -4,11 +4,6 @@
**Apply**
-#13145, #715, #11521, and then
-
-* [attachment: trac_13447-consolidated_refcount.patch](https://github.com/sagemath/sage-prod/files/10656328/trac_13447-consolidated_refcount.patch.gz)
-* [attachment: trac_13447-modulus_fix.patch](https://github.com/sagemath/sage-prod/files/10656329/trac_13447-modulus_fix.patch.gz)
-* [attachment: trac_13447-rely_on_singular_refcount.patch](https://github.com/sagemath/sage-prod/files/10656330/trac_13447-rely_on_singular_refcount.patch.gz)
-* [attachment: trac_13447-refcount_cleanup.patch](https://github.com/sagemath/sage-prod/files/10656331/trac_13447-refcount_cleanup.patch.gz)
+#13145, #715, #11521, and then [attachment:trac_13447-sanitise_ring_refcount.patch
**Merge together with** #715, #11521
make ptest
succeeded on bsd.math! To be on the safe side, I am now testing make ptestlong
, but if you (Nils) is happy with my changes to currRingHdl
then please finish the review!
And make ptestlong
succeeded as well. Looks good, I think.
I'm not so sure that
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 1)
currRingHdl.data.uring.ref += 1
is entirely safe. In 'libsing-test2.cc' (in the Singular package, the following similar code is used:
// prepare the arguments
// create a ring Q[x,y]
// the variable names
char **n=(char**)omAlloc(2*sizeof(char*));
n[0]=(char*)"x";
n[1]=(char*)"y";
// create the ring
ring R=rDefault(0,2,n);
// n is not needed any more:
omFreeSize(n,2*sizeof(char*));
// make it the default ring, also for the interpeter
idhdl newRingHdl=enterid("R" /* ring name*/,
0, /*nesting level, 0=global*/
RING_CMD,
&IDROOT,
FALSE);
IDRING(newRingHdl)=R;
rSetHdl(newRingHdl);
I'd assume IDRING(newRingHdl)
is newRingHdl.data.uring
, so the assignment above does not assume anything about that field being initialized. So who knows where newRingHdl.data.uring.ref
points!
I get the impression that indeed, currRing
is set to something valid nearly always, so your assumption that it is during SingularFunction.__init__
(that's earlier than calling!) might be OK. The singular code does contain an awful lot of currRing!=NULL
checks though, so I'm not so sure that's really guaranteed.
OK, a little searching of the Singular source shows that enterid
(which takes 6 arguments but we call it with 5 and so does the above example) apparently calls idrec::set
, in our case with init=1
and the example above with init=0
. The init=1
causes a call idrecDataInit
and for a ring command that does omAlloc0Bin(sip_sring_bin)
. The resulting pointer gets indeed assigned to what I think is ...data.uring
(although not in a very typerespecting way, but I guess that's par for C code).
So my guess is that by calling with init=1
we indeed get a block that we can delete. Since that's all we do with it I think the cleaner way for us would be
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 0)
currRingHdl.data.uring = NULL
(or whatever is the appropriate value for uring), as per the libsing example above)
if currRingHdl.data.uring!= currRing:
if currRingHdl.data.uring != NULL:
singular_ring_delete(currRingHdl.data.uring)
currRingHdl.data.uring = currRing
singular_ring_reference(currRing)
(or indeed just leave init=1 in place so that currRingHdl.data.uring
is never NULL. In any case, this happens at most once in any sage session, so the leaked memory (if any) would be minuscule.
Concerning refcounts: I found the following in dyn_modules/python/ring_wrap.h
#include <boost/intrusive_ptr.hpp>
using namespace boost;
//typedef intrusive_ptr<ip_sring> Ring;
// inline void intrusive_ptr_add_ref(ring r){
// r->ref++;
// }
// inline void intrusive_ptr_release(ring r){
// r->ref--;
// if (r->ref<=0) rDelete(r);
// }
which is in step with our use of the ref field. But the code is commented out! There is such a thing as an intrusive_ptr
in Boost and I think it needs exactly those two functions declared to function. If this is what is intended then we're OK. If this is commented out because now a different scheme is used, we may be in trouble.
EDIT: kernel/Number.h
contains uncommented definitions:
using namespace boost;
inline void intrusive_ptr_add_ref(ring r){
r->ref++;
//Print("ref count after add: %d", r->ref);
}
inline void intrusive_ptr_release(ring r){
if (r->ref<=0) rDelete(r);
else {
r->ref--;
}
//Print("ref count after release: %d", r->ref);
}
I find the ptr_release
a little worrisome: Do the singular people believe that r->ref ==0
means that there is still a reference? Do they really want to count references "-1"-based? I guess one can count it as "additional" references. I'm afraid they might. From
Singular/ipshell.cc
:
void rKill(ring r)
{
if ((r->ref<=0)&&(r->order!=NULL))
{
...
rDelete(r);
return;
}
r->ref--;
}
void rKill(idhdl h)
{
ring r = IDRING(h);
int ref=0;
if (r!=NULL)
{
ref=r->ref;
rKill(r);
}
if (h==currRingHdl)
{
if (ref<=0) { currRing=NULL; currRingHdl=NULL;}
else
{
currRingHdl=rFindHdl(r,currRingHdl,NULL);
}
}
}
this code all indicates that special action is required only if ref<=0
'before' decreasing. Can you check with singular people if that's the correct usage of the ref
field? If we use it in a different way I image nasty bugs could arise later (our use would prevent singular from ever deleting a ring we've had our hands on (if it would ever do that) and if we get a ring that singular initially constructed, we could delete it prematurely, due to an off-by-one.
In that case we should probably ensure that singular_ring_ref
and singular_delete_ring
are somehow aliased to boost::intrusive_ptr_add_ref
and intrusive_ptr_release
from kernel/Number.h
.
libSingular experts: Does ring->ref == 0
mean there is exactly one reference to the ring still active?
Replying to @nbruin:
I'm not so sure that
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 1) currRingHdl.data.uring.ref += 1
is entirely safe.
I am sure that it is not safe: One can't even call singular_ring_delete on it. That's why I removed it, unless currRing==NULL
.
EDIT:
kernel/Number.h
contains uncommented definitions:using namespace boost; inline void intrusive_ptr_add_ref(ring r){ r->ref++; //Print("ref count after add: %d", r->ref); } inline void intrusive_ptr_release(ring r){ if (r->ref<=0) rDelete(r); else { r->ref--; } //Print("ref count after release: %d", r->ref); }
I find the
ptr_release
a little worrisome: Do the singular people believe thatr->ref ==0
means that there is still a reference?
According to the code snipped, the ring is deleted if r->ref==0
. So, apparently r->ref==0
means that there is no reference.
this code all indicates that special action is required only if
ref<=0
'before' decreasing.
If ref<=0
then no reference is left, hence, the ring is deleted. Otherwise, the reference counter is decremented. I really don't understand the problem.
In that case we should probably ensure that
singular_ring_ref
andsingular_delete_ring
are somehow aliased toboost::intrusive_ptr_add_ref
andintrusive_ptr_release
fromkernel/Number.h
.
Singular tests whether ref<=0
; if it is, then the ring is deleted, but if it isn't then the counter is decremented.
We first decrement the counter, and then test whether ref<0
; if it is, then the ring is deleted, but if it isn't then no further action is taken.
Isn't that logically the same?
Replying to @simon-king-jena:
According to the code snipped, the ring is deleted if
r->ref==0
. So, apparentlyr->ref==0
means that there is no reference.
Indeed, but this is tested before decrement. So for this to work, you'd need
R=create_ring_with_appropriate_refcount
// here R.ref should be 0
//for the following to cause deletion.
intrusive_ptr_release(R)
We initialize R.ref=1
in singular_ring_new
.
If
ref<=0
then no reference is left, hence, the ring is deleted. Otherwise, the reference counter is decremented. I really don't understand the problem.
If ref<=0
BEFORE the reference count is decreased. The difference is that intrusive_ptr_release
happily leaves a ring in memory that AFTER the call has R.ref==0. Our singular_ring_delete
will only abstain from calling rDelete(doomed)
if after the call we have doomed.ref > 0
.
401 doomed.ref = doomed.ref-1
402 if doomed.ref > 0:
403 return
The two behaviours are off-by-one.
Singular tests whether
ref<=0
; if it is, then the ring is deleted, but if it isn't then the counter is decremented.We first decrement the counter, and then test whether
ref<0
; if it is, then the ring is deleted, but if it isn't then no further action is taken.
We don't. We test whether ref<=0
(or rather whether ref>0
to see if we should do an early exit).
As our doctests show, apparently we're never calling singular_ring_delete
on rings we haven't initialized the ref
field on ourselves and that singular expects to to survive after we do, since that would probably lead to an error (Singular would try to access a deallocated ring).
A mismatch in the other direction would be milder: Singular takes a reference on a ring we initialized the ref field on, we lose our last reference via a singular_ring_delete
(no rDelete gets called because Singular correctly increased the refcount). However, Singular will never delete the ring because we originally initialized the ref field to 1 whereas Singular apparently expects it to be initialized to 0.
Replying to @simon-king-jena:
I am sure that it is not safe: One can't even call singular_ring_delete on it. That's why I removed it, unless
currRing==NULL
.
I think that's because singular_ring_delete
does a changeCurrentRing
on it. Calling rDelete
seems to work just fine. I think that thanks to the init=1
the memory for the ring is allocated, but initialized to 0 thanks to the omAlloc0
. My failure to locate a ref = 1
anywhere in Singular's code also makes me believe that they're happy giving back a ring where ref = 0
.
I'm not so sure the rChangeCurrRing
dance is necessary in singular_ring_delete
. The code in intrusive_ptr_release
doesn't need it. I suspect Volker put it in as an extra precaution while he was trying to debug other issues. A cursory reading of the code of rDelete
doesn't seem to indicate its operation is affected in any way by whether the current ring is equal to the one being deleted.
OK, I asked Hans Schönemann. I hope he'll answer soon.
If r->ref==0
really means in Singular that there is exactly one reference left, then we should act accordingly.
Changed work issues from Input from a libsingular expert to none
Hans Schönemann gave me a couple of answers - many thanks! I hope I am translating and summarising correctly:
r->ref
counts the number of interpreter variables referencing a ring minus one. Hence:
ring r=.....; // ref is 0
def rr=r; // ref is 1
kill r; // ref is 0
kill rr; // ref is 0 when calling rKill -> delete it.
There are locally generated rings (in std, for example), unbeknownst to the interpreter - they have ref zero.
Singular uses r->ref
only in rKill
- it makes me wonder why we use rDelete
and not rKill
, by the way. In fact, rKill
deletes local data, but rDelete
doesn't.
Generally, one should have currRingHdl.ring == currRing
, or currRing==NULL
and currRingHdl==NULL
. I just notice that he wrote currRingHdl.ring
, not currRingHdl.data.uring
- is that a difference?
Concerning debugging: If one builds Singular so that OM_NDEBUG
is not defined, then a debug version of omalloc is used. In that way, we would have more easily detected the original problem with omStrDup
.
Conclusions
I am not sure if you agree with my conclusions, but here we go:
Since r->ref
does not play a rôle in libsingular, we are free to use r->ref
to count the number of pointers to r (not "number of pointers minus one"). However, when calling singular interpreter functions, we must make sure that r->ref>0
. With our patch, we already do so - hence, that's fine.
Since we apply singular_ring_delete
to non-commutative (quotient) rings, and we do not call rKill
but only rDelete
, we currently have a memory leak for non-commutative rings. rKill
will first test that r->ref==0
, then kills local data, then kills the ring, and sets currRing
and friends to NULL if the to-be-deleted ring is currRing
. Hence, we should use rKill
in singular_ring_delete
, but probably without the rChangeCurrRing
dance.
If currRing==NULL
, we should not create currRingHdl
. If currRing!=NULL
then we should let uring point to it. the latter is already done in my patch, the former should be done.
So far I forgot to ask Hans about enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 1)
.
Work Issues: Replace rDelete by rKill. Only create currRingHdl if currRing is not null
Replying to @simon-king-jena:
Since we apply
singular_ring_delete
to non-commutative (quotient) rings, and we do not callrKill
but onlyrDelete
, we currently have a memory leak for non-commutative rings.
Or perhaps we do not have a leak. Namely, the internal data are also referenced via python, and thus will be deleted when the ring is deleted.
Well, let's see if rKill
results in segfaults (due to the attempt to deallocate the internal data twice)...
I have updated attachment: trac_13447-sanitise_ring_refcount.patch.
Changes:
currRingHdl
is only created if currRing!=NULL
, and currRingHdl.data.uring
is subsequently set to currRing
.singular_ring_delete
is now using rKill
in lieu of rDelete
, so that local data are guaranteed to be deallocated.In my previous comment, I was a bit sceptical about the second point. However, all doctests pass on bsd.math, hence, using rKill
does not result in a segfault. In other words, the local data are not double-freed, und thus rKill
is better than rDelete
.
Apply trac_13447-sanitise_ring_refcount.patch
Changed work issues from Replace rDelete by rKill. Only create currRingHdl if currRing is not null to none
Here is one comment of Hans on the following lines of the patch:
global currRingHdl
if currRingHdl == NULL and currRing!=NULL:
# Create an invalid mock ring - it would not be possible
# to make that ring currRing!
# The only aim is to create currRingHdl
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 1)
# Now we assign proper data to currRingHdl
rDelete(currRingHdl.data.uring)
currRingHdl.data.uring = singular_ring_reference(currRing)
He says that it is ok like that. However, it would be better to do
global currRingHdl
if currRingHdl == NULL and currRing!=NULL:
# Create an invalid mock ring - it would not be possible
# to make that ring currRing!
# The only aim is to create currRingHdl
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 0)
# Now we assign proper data to currRingHdl
currRingHdl.data.uring = singular_ring_reference(currRing)
Note the difference in the last argument of enterid
.
I'll try it...
I have updated the patch again. Tests pass on bsd.math.
The difference between enterid(...,0) and enterid(...,1) is that the former does not allocate memory for currRingHdl.data.uring
, while the latter does. Hence, using enterid(..., 0)
, one avoids the useless call to rDelete
.
Apply trac_13447-sanitise_ring_refcount.patch
I'm afraid you (at least theoretically) have recreated the possibility for our original error to occur. We essentially have:
SingularFunction.__init__:
if currRingHdl == NULL and currRing!=NULL:
currRingHdl = <create ring handle>
currRingHdl.data.uring = currRing
SingularFunction.__call__:
if currRingHdl.data.uring!= currRing:
singular_ring_delete(currRingHdl.data.uring)
currRingHdl.data.uring = singular_ring_reference(currRing)
If we init with currRing == NULL
we may be leaving currRingHdl == NULL
. If a subsequent invocation of call happens after currRing
has a new value and currRingHdl
is not updated (if that would never happen we wouldn't need this code) we'd be calling
if NULL.data.uring!= currRing:
singular_ring_delete(NULL.data.uring)
NULL.data.uring = singular_ring_reference(currRing)
which will segfault (OK, this error is much better than the original).
From what you describe, rKill
can set currRingHdl=NULL
as well, in which case we definitely get an error in the above code. Thus it seems to me it's theoretically possible that __call__
happens when currRingHdl=NULL
and currRing!=NULL
. In that case you should recreate a handle:
if currRingHdl != NULL:
if currRing != NULL:
if currRingHdl.data.uring != currRing:
if currRingHdl.data.uring != NULL:
singular_ring_delete(currRingHdl.data.uring)
currRingHdl.data.uring = currRing #I don't think this should up refcounts
else: #currRing == NULL and currRingHdl != NULL
rKill(currRingHdl) #delete the handle as well. You could also save it for reuse
#I think this looks at the refcount
currRingHdl = NULL
else:
if currRing != NULL:
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 0) #note the 0
currRingHdl.data.uring = currRing
# NOW the following invariant holds:
assert (currRingHdl == NULL and currRing == NUL) or currRingHdl.data.uring == currRing
I think that without extra assumptions you must be prepared to deal with all those conditions. Also note that you might be creating a new ringhandle an arbitrary number of times, so knowing whether you leak becomes important. The sequence I use above is what i think the libsingular examples that are shipped with singular do.
You can already see here that the use of rKill (which does look at reference counts) forces us to follow Singular's conventions of what the ref
field. At this point we're assuming that if currRingHdl holds the last reference to a ring, we should probably delete it.
I think the convention is that currRingHdl (if it's relevant) is pointing to the same ring as currRing, so they are treated as one reference: currRingHdl.data.uring = currRing
does not increase a refcount. That obviously leads to problems when currRing
diverges from currRingHdl
.
Either you attach memory management ONLY to currRingHdl
and trust that when currRing
is changed, someone else will still have a reference to the original value, who can throw away the ring (so, currRing
is only changed by straight assignments -- no increfs or decrefs. It's always a borrowed reference) or you always ensure that currRingHdl
gets updated (or at least cleared) when currRing
changes. My impression is that refcounting in singular is only an interpreter thing, so that the former is the model used by Singular itself.
Double free implies segfault: Have you verified this is true with omAlloc?
Refcounting base: I agree that as long as our created rings and rings created internally live separate lives, it doesn't matter what semantics we attach to the ref
field. I just think it's dangerous to implicitly assume these rings do live separate lives. They do now, but who's to guarantee that they do in the future? And even then it may take a long time for a visible error to occur, which by then might be difficult to track down. I think we should adjust to the semantics Singular uses in other places (and document the slightly unusual refcounting semantics of Singular in our code!) to make the libsingular interface as robust as possible. You know how extensions/modifications are going to happen: people will just copy/paste what's already there.
Replying to @simon-king-jena:
There are locally generated rings (in std, for example), unbeknownst to the interpreter - they have ref zero.
Singular uses
r->ref
only inrKill
- it makes me wonder why we userDelete
and notrKill
, by the way. In fact,rKill
deletes local data, butrDelete
doesn't.
If we do that, we should ensure that rKill
has an accurate view of the number of references. That means adjusting to Singular's conventions. If we rKill with our current code, we'd leak because we'd call with a positive refcount. I guess you could manually decref before calling rKill but that's silly and misleading to future maintainers of the code.
Generally, one should have
currRingHdl.ring == currRing
, orcurrRing==NULL
andcurrRingHdl==NULL
. I just notice that he wrotecurrRingHdl.ring
, notcurrRingHdl.data.uring
- is that a difference?
I don't think the former actually exists. A macro is used in the singular code base, so Hans would not usually spell it out in full.
Concerning debugging: If one builds Singular so that
OM_NDEBUG
is not defined, then a debug version of omalloc is used. In that way, we would have more easily detected the original problem withomStrDup
.
Didn't work for me but I might have just failed to properly (de)activate those flags. Anyway, I doubt an access-after-free would be immediately detected. That really involves unmapping or protecting memory in order to force a segfault. I didn't see evidence that omAlloc reaches that deep into the system. It may be that omAlloc in debugging mode does consistency checks on every use. In that case we might have found the corruption a little earlier. I doubt we would have had gdb point exactly at the offending instruction. I think it would be nice for singular to update the omAlloc headers with my changes so that (at least on linux and OSX) they can build it with the malloc underneath. I found it very useful for this particular issue.
Conclusions Since
r->ref
does not play a rôle in libsingular, we are free to user->ref
to count the number of pointers to r (not "number of pointers minus one"). However, when calling singular interpreter functions, we must make sure thatr->ref>0
. With our patch, we already do so - hence, that's fine.
In principle yes, but not if we use rKill and I think it's a bad idea in general because it might cause unforeseen problems in the future. Better to stick with one interpretation of what the ref
field means.
If
currRing==NULL
, we should not createcurrRingHdl
. IfcurrRing!=NULL
then we should let uring point to it. the latter is already done in my patch, the former should be done.
As you indicate, currRingHdl can be NULLed again by rKill. We need to correct that if we need currRingHdl later.
In fact, without extra infrastructure to clean up after it, it seems that rKill
leaks ringhandles. Is it the case that ring handles are registered in some global data structure that allows them to be deleted? This loop might be the candidate:
while (r->idroot!=NULL)
{
killhdl2(r->idroot,&(r->idroot),r);
}
that might do this. In any case, rKill just does a straight currRingHdl=NULL
if (r==currRing)
, so currRingHdl
is definitely gone.
Is it really necessary to use rKill
? It seems it is more part of the interpreter layer and that the libsingular interface itself is largely one level lower. Have we initialized enough of the interpreter to let rKill
function as advertised?
It is by now clear that up to now, the Sage-libsingular interface was largely based on empirically verified guesses. Now that Simon has good contact with Hans, perhaps we can ensure the interface follows official protocol?
Replying to @nbruin:
Replying to @simon-king-jena:
Singular uses
r->ref
only inrKill
- it makes me wonder why we userDelete
and notrKill
, by the way. In fact,rKill
deletes local data, butrDelete
doesn't.If we do that, we should ensure that
rKill
has an accurate view of the number of references. That means adjusting to Singular's conventions. If we rKill with our current code, we'd leak because we'd call with a positive refcount.
No, we don't. With the current patch, rKill
is only called if ref==0.
Conclusions Since
r->ref
does not play a rôle in libsingular, we are free to user->ref
to count the number of pointers to r (not "number of pointers minus one"). However, when calling singular interpreter functions, we must make sure thatr->ref>0
. With our patch, we already do so - hence, that's fine.In principle yes, but not if we use rKill
I disagree, since we only call rKill with r->ref==0
.
and I think it's a bad idea in general because it might cause unforeseen problems in the future. Better to stick with one interpretation of what the
ref
field means.
In a way, we do. Namely, one could argue that a Sage polynomial ring MPolynomialRing_libsingular
is an interpreter variable (roughly, it is an object that does not live in the Singular kernel but points to a ring in the Singular kernel). Hence, when creating a polynomial ring, it agrees with Singular's conventions to increase the refcount.
As you indicate, currRingHdl can be NULLed again by rKill. We need to correct that if we need currRingHdl later.
currRingHdl
is only NULLed if currRing
is deleted. So, that should be fine.
In any case, rKill just does a straight
currRingHdl=NULL
if(r==currRing)
, socurrRingHdl
is definitely gone.
You mean, it doesn't free it before assigning NULL
? Yes, that would seem to be a leak.
Is it really necessary to use
rKill
? It seems it is more part of the interpreter layer and that the libsingular interface itself is largely one level lower.
I'll try rKill
versus rDelete
in singular_ring_delete
and will try to see if there is a leak when creating and deleting super commutative rings (SCA).
Too bad. Apperently an SCA can not be garbage collected, in spite of all the weak caches from #715 and #11521. So, singular_ring_delete will never be called on an SCA.
With #12313 and its dependencies, SCA can be collected, but its underlying ring is not deleted. Wrong refcount, I guess.
The wrong refcount appears to be in sage/libs/singular/function.pyx, where in the current patch we have
# We need to incref the to-be-converted data,
# since apparently removing to_convert.data would
# decref it.
return new_RingWrap( singular_ring_reference(<ring*> to_convert.data) )
There was some example where the refcount was wrong without calling singular_ring_reference
, but for SCA it is wrong with that call.
So, that is something that we need to understand, and needs work.
Work Issues: Understand why sometimes new_RingWrap
needs an incref and sometimes not
Some suggestions:
The singular function rKill
seems to do what we want in singular_ring_delete
. Hence, to keep it simple, singular_ring_delete
should call rKill
, and nothing more (unless we want a sanity test, such as printing a warning if singular_ring_delete
is called on NULL).
rKill
only takes action if r->ref
is zero when it is called. Indeed, according to Hans, r->ref
is the number of interpreter variables pointing to a ring, minus one. In order to avoid confusion, we should follow the same idea.
So, what means "interpreter variables" for us? I think this should be "any Sage object whose deallocation needs the libsingular ring being alive". This is every polynomial ring, every Gröbner strategy, and any element of a polynomial ring.
Since ref is -1 based, when a Sage ring R is created then ref shall be zero, so that the underlying libsingular ring can be properly deleted if R is deallocated. And if any element or Gröbner strategy on that ring is created, ref needs to be incremented. The deallocation of an element or Gröbner strategy must involve a call to singular_ring_delete
.
A complication arises when calling a singular_function. When we call a Singular interpreter function, then we should have ref>0, again according to Hans. Hence, at the begin of a call, ref must be incremented, and decremented when leaving it.
Special care needs to be taken on currRingHdl
. According to Hans, rKill
can make it NULL, and it should be NULL if and only if currRing
is NULL. Currently, we create currRingHdl
when we create a Singular function, but we only use it when we call a Singular function. That's asking for trouble. Hence, we should move the creation of currRingHdl to SingularFunction.__call__
, where we also set the correct currRing
. SingularFunction.__init__
is not the right place.
I think one potential problem with our current use of currRingHdl
in combination with rKill
is that we would redefine my_awesome_sage_ring
, hence, we would see the following message on stderr:
// ** redefining my_awesome_sage_ring **
I'll ask Hans how one can create currRingHdl
without creating a mock ring.
Here is one complication for singular_function: If you do singular_function("std")
, then a SingularKernelFunction
is created. In its __init__
, it creates a call handler, and for that purpose it needs to find out the arity of the Singular kernel function.
But apparently the arity of std can not be correctly determined, if currRing==0
: One obtains arity 0, not 1, for std.
Therefore, when I move the creation of currRingHdl
from SingularFunction.__init__
to where the function is called, an error occurs.
A question arises: Is there a singular function that changes its arity depending on currRing
? If it is the case, then we must determine SingularKernelFunction.call_handler
each time the function is called, rather than during initialisation.
Replying to @simon-king-jena:
I think one potential problem with our current use of
currRingHdl
in combination withrKill
is that we would redefinemy_awesome_sage_ring
, hence, we would see the following message on stderr:// ** redefining my_awesome_sage_ring **
I suspect the records do get cleared, because otherwise Singular would leave a dangling pointer around and one would have what is essentially a singular ring variable with NULL pointer to a ring or a pointer to unallocated memory .
If those variables do get properly removed, by the time we have to redefine the thing there should be no trace of the original and hence it would not be a redefinition. If the skeleton somehow does remain (with a nulled out data.uring
field then, I suppose?), we could keep it around:
saved_skeleton=enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 0)
....
if currRingHdl == NULL:
currRingHdl = saved_skeleton
currRingHdl.data.uring = singular_ring_reference(currRing)
Incidentally, doesn't currRingHdl
count as a ring reference? The code in SingularFunction
that started this whole thing seems to indicate this. In that case the only singular_ring_delete(R)
that can trigger nulling of currRingHdl
and currRing
is when the explicit intention is
singular_ring_delete(currRingHdl.data.uring)
. When decreffing any other pointer that just happens to be pointing to the same ring structure, there would still be a refcount left, so no actual deletion would take place.
Note that after singular_ring_delete(currRingHdl.data.uring)
we should absolutely NULL currRingHdl.data.uring
(probably safer/better to do currRingHdl=NULL
) even if deallocation didn't occur, because we should remove the reference. Therefore, I find it surprising that rKill
does do a currRingHdl=NULL
conditionally internally. With correct usage, rKill
should never execute that code usefully, because it needs to happen unconditionally outside of rKill
if it ever happens. Is it there as a safety net against sloppy programming?
Note that doing singular_ring_delete(currRing)
is nearly always wrong, because currRing
is a borrowed reference as far as I understand.
Are we the only ones ever setting currRingHdl
or can one write/call singular functions that internally will change currRingHdl
?
If the former holds, the code becomes a little easier to reason about and then we can make our own provisions and ensure they are followed properly (such as saving a skeleton).
In the latter case we really have to play by Singular's rules, because Singular might be doing the deallocating outside of our control.
Replying to @nbruin:
Replying to @simon-king-jena:
I think one potential problem with our current use of
currRingHdl
in combination withrKill
is that we would redefinemy_awesome_sage_ring
, hence, we would see the following message on stderr:// ** redefining my_awesome_sage_ring **
I suspect the records do get cleared, because otherwise Singular would leave a dangling pointer around and one would have what is essentially a singular ring variable with NULL pointer to a ring or a pointer to unallocated memory . If those variables do get properly removed, by the time we have to redefine the thing there should be no trace of the original and hence it would not be a redefinition.
I am afraid Singular knows that the mock ring has been defined. OK, that's relative to #12313. But it happened when I did
sage: from sage.rings.polynomial.plural import SCA
sage: E = SCA(QQ, ['x', 'y', 'z'], [0, 1], order = 'degrevlex')
sage: import gc
sage: del E
sage: _ = gc.collect()
sage: E = SCA(QQ, ['x', 'y', 'z'], [0, 1], order = 'degrevlex')
Hans didn't answer, but I found out myself: currRingHdl=rFindHdl(currRing,NULL,NULL)
, which could still be NULL. If it is NULL, then we may create the mock ring.
Are we the only ones ever setting
currRingHdl
or can one write/call singular functions that internally will changecurrRingHdl
?
No idea.
Combined patch
Attachment: trac_13447-sanitise_ring_refcount.patch.gz
I have attached a new version of the patch. Here are a few comments:
For easier debugging, I created methods returning the refcount for RingWrap
and for polynomial rings, and for g-algebras.
__Concerning singular_function__
currRingHdl
is now taken care of when the function is called, and not when it is created. In that way, one prevents nastinesses like currRingHdl
being NULL because of a previous deallocation. When creating it currRing
is guaranteed to exist. I call rFindHdl(currRing,NULL,NULL)
, which may find a previously defined thingy. If it can not be found, a mock ring is created by
currRingHdl = enterid("my_awesome_sage_ring", 0, RING_CMD, &IDROOT, 0)
Since the last argument is 0, we can then do
currRingHdl.data.uring = currRing
without the need to call rDelete(currRingHdl.data.uring)
. Problem: See below.
Since currRing
is now not necessarily available when initialising SingularKernelFunction
, it is impossible to determine its arity during initialisation, but fortunately it is still possible to determine whether the function exists in the Singular kernel.
Therefore, during __init__
, I merely test for existence, while the creation of self.call_handler
is postponed to a point were currRing
is guaranteed to be around.
Since Hans said that we need to make sure that currRing->ref
is positive when calling an interpreter function, I temporarily increment the counter during the call.
Concerning ref counting
I think we should simply rely on Singular's rKill
for deletion of rings, so that internal data are taken care of.
currRingHdl.data.uring
and currRing
do not constitute a reference in the sense of "interpreter variable". Therefor I am not incrementing or decrementing the refcount on them (unless I missed something).
Because of ref counting based at -1, a naked MPolynomialRing_libsingular
and NCPolynomialRing_plural
should have self._ring.ref == 0
, so that self._ring
can be properly deleted. All other ring-related data, such as a Gröbner strategy or elements of the ring, constitute a new reference.
The same should hold for a RingWrap
: If one has a naked RingWrap
then the underlying ring should be deletable when deleting RingWrap
. A complication arises if both a RingWrap
and a polynomial ring refer to the same underlying libsingular ring.
This occurs in NCPolynomialRing_plural
, which is created from a ring wrap. I do
cdef RingWrap rw = ncalgebra(self._c, self._d, ring = P)
# rw._output()
self._ring = singular_ring_reference(rw._ring)
hence incref, because at the end of the initialisation of self, rw
will be deleted, so that the refcount for self._ring
will then be fine. The same holds for new_CRing
and new_NRing
in sage.rings.polynomial.plural.
Some experiments with the new patch
Worst of all: I did not run the tests yet.
Here is a test whether there is a memleak when creating a super-commutative algebra repeatedly. Note that one needs
$ hg qa
trac_715_combined.patch
trac_715_local_refcache.patch
trac_715_safer.patch
trac_715_specification.patch
trac_11521_homset_weakcache_combined.patch
trac_11521_callback.patch
13145.patch
trac12215_weak_cached_function-sk.patch
trac12215_segfault_fixes.patch
trac_12313-mono_dict-combined-random-sk.patch
trac_12313_quit_sage.patch
trac13370_deprecate_is_field.patch
trac_13447-sanitise_ring_refcount.patch
because otherwise super commutative algebras could not be garbage collected.
sage: from sage.rings.polynomial.plural import SCA
sage: import gc
sage: while 1:
....: E = SCA(QQ, ['x', 'y', 'z'], [0, 1], order = 'degrevlex')
....: print get_memory_usage(),id(E)
....: del E
....: _ = gc.collect()
....:
223.98828125 4520110400
223.98828125 4520102992
223.98828125 4520145584
223.98828125 4520100800
223.98828125 4520190656
223.98828125 4520116128
223.98828125 4516130448
223.98828125 4520183424
223.98828125 4338291168
223.98828125 4520148816
223.98828125 4520173408
223.98828125 4510376384
223.98828125 4520106672
...
Conclusion: There is no leak, even though it really is a new ring each time.
However, not all is good. Namely, we interrupt the computation above, and continue as follows:
sage: from sage.libs.singular.function import singular_function
sage: P.<x,y,z> = PolynomialRing(QQ)
sage: ringlist = singular_function("ringlist")
sage: l = ringlist(P)
// ** redefining my_awesome_sage_ring **
So, here you see the problem with creating the mock ring. I can only hope that Hans will tell us how one can create a currRingHdl
for a given currRing
, rather than only finding a previously created currRingHdl
.
Because of that last problem, I think it doesn't make much sense to test. However, comments are welcome.
Unfortunately one gets a segfault in the tests of sage/rings/polynomial/multi_polynomial_ideal.py, apparently related with a singular_function call (namely in the method variety
).
Too bad. I really thought that the new approach towards refcounting is safe.
However, not all is good. Namely, we interrupt the computation above, and continue as follows:
sage: from sage.libs.singular.function import singular_function sage: P.<x,y,z> = PolynomialRing(QQ) sage: ringlist = singular_function("ringlist") sage: l = ringlist(P) // ** redefining my_awesome_sage_ring **
I think what your code does, is look up if there is a Singular variable that points to currRing
. If so, you use that one and you let the Singular variable my_awesome_sage_ring
dangling in the system. If you don't find a Singular variable pointing to currRing
then you (re)create the variable my_awesome_ring
to refer to currRing
. But it could be that that variable is alive and well, happily pointing at some other ring (that once was currRing
). I think the system probably has good reason to complain.
What you should probably do is create my_awesome_sage_ring
, keep a pointer to it, and refcount rings that you let it point to. Now you're simply holding a singular variable reference to a ring. When you need currRingHdl
to point somewhere, you can do that via my_awesome_sage_ring
. If you need to NULL currRingHdl
, you can do so without consequence. You still keep your own pointer to my_awesome_sage_ring
.
It means that the ring pointed to by my_awesome_sage_ring
is protected against garbage collection, but that's correct: We're holding a reference to it! Such rings will become eligible for deletion once my_awesome_sage_ring
is made to point elsewhere. If you don't like rings that were currRing
to survive until currRingHdl
is actually needed to point elsewhere, you could NULL the data.uring
field of my_awesome_sage_ring
as soon as you get a chance, but I wouldn't bother.
I haven't traced through your precise usage, but grabbing and dropping Singular variables without adapting reference counts sounds fishy to me and liable to lead to segfault.
I'm pretty sure currRingHdl
is just a Singular Interpreter level entry to currRing
and probably in the singular interpreter, the current ring is always only a ring that actually has a name in the interpreter, so currRingHdl
is simply a borrowed reference to that variable. That's why currRingHdl
manipulations in singular do not involve refcounting (I haven't checked, but it seems to me you claim they don't): The rings themselves are refcounted, but that has already happened when the handle struct that currRingHdl
is pointing at was instantiated. That struct is surving independent of currRingHdl
because it represents a variable binding in the singular interpreter and therefore is probably indexed in some tree, linked list or hash table (i.e., the index that rFindHdl
goes digging in).
I guess the recreation of my_awesome_sage_ring can be avoided, because Singular certainly has methods to return a previously defined ring when you know the name of that ring. So, no need that Sage keeps a pointer to it, because Singular already does.
The segfault in sage/rings/polynomial/multi_polynomial_ideal.pyx can be avoided by setting ref=1
rather than ref=0
in the function singular_ring_new
. However, I wouldn't call it a "fix", because I think this would mean the ring could never be rKill
ed.
So, I suspect that the reference count goes wrong in a different location.
Replying to @simon-king-jena:
I guess the recreation of my_awesome_sage_ring can be avoided, because Singular certainly has methods to return a previously defined ring when you know the name of that ring.
It is ggetid
.
Replying to @simon-king-jena:
It is
ggetid
.
Using it does solve the problem with the warning message in comment:74.
In addition, it makes the segfault in sage -t devel/sage/sage/rings/polynomial/multi_polynomial_ideal.pyx
disappear - which is a bad news, because instead the test hangs during some call to a singular_function...
Attachment: trac_13447-attempted_improvement.patch.gz
Experiments towards fixing some problems
I have posted a new patch attachment: trac_13447-attempted_improvement.patch. According to Hans, it may happen that Singular functions return NULL, which means that there has been an error. Therefore, I suggest to actually raise a RuntimeError
if the return value is NULL. Apart from that, I postpone another ggetid
call to a location where currRing
is guaranteed to exist; however, according to Hans, that call to ggetid
(namely in the case of a library functions) should be fine also without currRing
.
Here is the problem:
sage -t --verbose devel/sage/sage/rings/polynomial/multi_polynomial_ideal.py
...
Trying:
R = QQ['a, b']; (a, b,) = R._first_ngens(2); I = R.ideal(a**Integer(2)+b**Integer(2)-Integer(1))###line 3581:_sage_ >>> R.<a,b> = QQ[]; I = R.ideal(a^2+b^2-1)
Expecting nothing
ok
Trying:
Q = QuotientRing(R,I); K = Frac(Q)###line 3582:_sage_ >>> Q = QuotientRing(R,I); K = Frac(Q)
Expecting nothing
// ** char_series returns 0 x 0 matrix from 2 input polys (0)
I[1,1]=b2+a2-1*** *** Error: TIMED OUT! PROCESS KILLED! *** ***
[360.3 s]
I have absolutely no idea why that happens.
The error occurs in the Singular library function primdecSY
. I did check that currRing
is definitely not NULL when calling the function, currRingHdl.data.uring==currRing
, and there are plenty of references to currRing
. Moreover, the error does not occur if one copies the whole example (which is from the groebner_basis method) into an interactive session.
primdecSY
hangs, there is no return value (not even NULL).
Since there is no segfault, I guess gdb would not help here. Since currRing is referenced several times, I guess that it has not been double-freed - but who knows? Can it be that a ring was double-freed, then a new ring was created in the same location, and then an error occurs when working with that new ring?
For now:
Apply trac_13447-sanitise_ring_refcount.patch trac_13447-attempted_improvement.patch
By the way, has anyone preserved an old version of attachment: trac_13447-sanitise_ring_refcount.patch? It used to work well, although Nils had some objections.
Here is the result of the test suite:
sage -t -force_lib devel/sage/sage/rings/polynomial/multi_polynomial_ideal.py # Time out
sage -t -force_lib devel/sage/doc/de/tutorial/tour_advanced.rst # Time out
sage -t -force_lib devel/sage/doc/en/tutorial/tour_advanced.rst # Time out
sage -t -force_lib devel/sage/doc/fr/tutorial/tour_advanced.rst # Time out
sage -t -force_lib devel/sage/doc/ru/tutorial/tour_advanced.rst # Time out
sage -t -force_lib devel/sage/sage/libs/singular/function.pyx # 5 doctests failed
sage -t -force_lib devel/sage/sage/schemes/generic/algebraic_scheme.py # Time out
Looks like a lot of work. Fortunately, the errors in function.pyx seem to be a bit more "expressive" than the timeouts.
I found something fishy with the refcount during sage.libs.singular.function.call_function
.
I changed the code as follows:
# In the Singular interpreter, we must ensure that currRing->ref > 0.
si_ring.ref += 1
cdef int orig_ref = si_ring.ref # CHANGE: store the reference count
try:
with opt_ctx: # we are preserving the global options state here
if signal_handler:
sig_on()
_res = self.call_handler.handle_call(argument_list, si_ring)
sig_off()
else:
_res = self.call_handler.handle_call(argument_list, si_ring)
if myynest:
myynest = 0
if currentVoice:
currentVoice = NULL
if errorreported:
errorreported = 0
raise RuntimeError("Error in Singular function call '%s':\n %s"%
(self._name, "\n ".join(error_messages)))
if si_ring.ref!=orig_ref: # CHANGE: Test if Singular has changed the refcount
ff = file("/Users/SimonKing/blubber","a")
ff.write( "Here is something fishy!\n")
ff.close()
res = argument_list.to_python(_res)
...
When running the failing sage -t devel/sage/sage/rings/polynomial/multi_polynomial_ideal.py
, the refcount changes exactly once before the test hangs.
It is definitely not supposed to happen that Singular changes the refcount of currRing
. Changes in the ref counter should only happen in the next line, when the results are converted to python (any polynomial will increase the refcount by one).
I don't know yet what function is causing the trouble - but I hope I'll find out soon...
Presently, #715 + #11521 help not permanently keeping parent in memory. In the process we uncovered a hard-but-consistently triggerable problem with the collection of
MPolynomialRing_libsingular
. We have only observed the problem onbsd.math.washington.edu
, MacOSX 10.6 on x86_64.The present work-around is to permanently store references to these upon creation, thus preventing collection. It would be nice if we could properly solve the problem (or at least establish that the problem is specific to
bsd.math
)Depends on #11521
CC: @simon-king-jena @malb @vbraun @gagern @robertwb @sagetrac-ylchapuy @jpflori @burcin
Component: memleak
Author: Simon King
Branch/Commit: public/make_libsingular_multivariate_polynomial_rings_collectable @
b4df239
Reviewer: Travis Scrimshaw
Issue created by migration from https://trac.sagemath.org/ticket/13447