polarbearzoo / maidsafe-dht

Automatically exported from code.google.com/p/maidsafe-dht

Invalid CUDT instance used (crash) #10

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Set a breakpoint (to force the problem) in the KnodeImpl::node_id()
function (knodeimpl.h line 283, SVN revision 389)
2. Start Kaddemo as a client to connect to a vault (as specified in the Wiki)
3. Wait a few seconds (around 10)
4. Continue the application execution
5. Sometimes (around 10% of the time here) it will crash

What is the expected output? What do you see instead?

It should be stable; instead, the app crashes.

What version of the product are you using? On what operating system?

OS: Windows XP
Version: SVN revision 389

Please provide any additional information below.

It seems that inside the CRcvQueue::init() function a CUDT instance is invalid
(I see an Access violation reading location 0x8913005f, and the CUDT*
pointer is 0x89130002).

Some screenshots are attached. You can see the active thread/thread list,
the call stack, the CUDT object contents and the line in the source code
where it crashes.

Original issue reported on code.google.com by bakt...@gmail.com on 26 Apr 2010 at 7:00


GoogleCodeExporter commented 9 years ago
We should try and get this problem into the UDT forum. Gu's bound to have a much
better idea of what is going on with this.

Original comment by dan.schm...@gmail.com on 27 Apr 2010 at 3:48

GoogleCodeExporter commented 9 years ago
I agree absolutely. Is there any way we can make this appear using only UDT
code in a test somehow? Gu would respond quickly to that.

Brilliant, man. I was worried we might not get to this so quickly.

Any takers for a test piece of code to send to Gu (UDT maintainer)?

Original comment by irvine.d...@gmail.com on 27 Apr 2010 at 4:25

GoogleCodeExporter commented 9 years ago

Original comment by dan.schm...@gmail.com on 28 Apr 2010 at 9:09

GoogleCodeExporter commented 9 years ago
hey,

I have been able to isolate this problem.

The bug is related to the garbage collector and CRcvQueue.

UDT::connect calls m_pRcvQueue->setNewEntry(this);

and m_pRcvQueue holds a pointer to the CUDT. Later this pointer is processed
in another thread, and sometimes it has already been freed by the garbage
collector by then.

As a quick hack, I added a new bool to CUDT: when setNewEntry is called I set
it to true, and inside void* CRcvQueue::worker(void* param), after
self->m_pRcvUList->insert(ne); I set it back to false.

In checkBrokenSockets() I check that bool, and if it is true I don't remove
the socket.
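
A minimal sketch of the flag-based guard described above, for illustration
only. It does not use the real UDT classes: Socket, RcvQueue, pending,
drainNewEntries and collectBroken are hypothetical stand-ins for CUDT,
CRcvQueue, the added bool, the insert step in the worker, and
checkBrokenSockets(). The attached patch itself may differ.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>

// Hypothetical stand-in for CUDT; only the fields needed for the guard.
struct Socket {
    int id;
    bool broken = false;               // socket is eligible for collection
    std::atomic<bool> pending{false};  // queued for the receive worker, not yet inserted
    explicit Socket(int i) : id(i) {}
};

// Hypothetical stand-in for CRcvQueue.
class RcvQueue {
public:
    // Analogue of setNewEntry(): raise the flag before storing the raw pointer.
    void setNewEntry(Socket* s) {
        assert(s != nullptr);
        s->pending = true;
        std::lock_guard<std::mutex> lock(mutex_);
        newEntries_.push_back(s);
    }

    // Analogue of the insert step in worker(): move queued sockets to the
    // active list, then clear the flag. Returns how many were moved.
    std::size_t drainNewEntries() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::size_t moved = 0;
        for (Socket* s : newEntries_) {
            active_.push_back(s);
            s->pending = false;
            ++moved;
        }
        newEntries_.clear();
        return moved;
    }

    // Drop a socket from the active list before it is destroyed.
    void remove(Socket* s) {
        std::lock_guard<std::mutex> lock(mutex_);
        active_.remove(s);
    }

private:
    std::mutex mutex_;
    std::list<Socket*> newEntries_;
    std::list<Socket*> active_;
};

// Analogue of checkBrokenSockets(): free broken sockets, but skip any that
// the receive queue still holds as a pending entry. Returns how many were freed.
std::size_t collectBroken(std::list<std::unique_ptr<Socket>>& sockets,
                          RcvQueue& queue) {
    std::size_t removed = 0;
    for (auto it = sockets.begin(); it != sockets.end();) {
        if ((*it)->broken && !(*it)->pending) {
            queue.remove(it->get());
            it = sockets.erase(it);
            ++removed;
        } else {
            ++it;
        }
    }
    return removed;
}
```

With this guard, a socket marked broken between setNewEntry and the worker's
insert survives one collection pass and is only freed on a later pass, after
the flag has been cleared.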

Original comment by vt.o...@gmail.com on 11 May 2010 at 2:50

GoogleCodeExporter commented 9 years ago
Hello,

Thank you very much for the work you have been doing.

We have indeed been having problems with the garbage collection for some time
now. There are several places where these containers of sockets are traversed
and, for some reason, in some cases one of the sockets is null. We have
reported this in the UDT forum, but those reports raised the question of
whether it is our particular use of UDT that triggers this unexpected
behaviour in the library. For one, we make loads more connections than other
projects using it.

We would very much like to try your patch if you'd like to pass it on, and to
talk to the UDT developer, or back you up, in trying to fix this issue with
UDT.

Original comment by dan.schm...@gmail.com on 11 May 2010 at 3:27

GoogleCodeExporter commented 9 years ago
I've just posted your findings and quick fix to
https://sourceforge.net/projects/udt/forums/forum/393036

Original comment by dan.schm...@gmail.com on 11 May 2010 at 3:36

GoogleCodeExporter commented 9 years ago
Hello,

I've attached the patch.

Let me know if it fixes all those garbage-collection-related problems.

Original comment by vt.o...@gmail.com on 11 May 2010 at 3:44


GoogleCodeExporter commented 9 years ago
Thanks a lot. We'll give it a go tomorrow and let u know about the results

Original comment by dan.schm...@gmail.com on 11 May 2010 at 4:45

GoogleCodeExporter commented 9 years ago
Just to keep you guys updated: I've looked at the problem with these changes
and reported it directly to the UDT forum. The issue that vt.onan's patch
addresses had already been noticed by the admins over there, and the latest
code in their CVS fixes it. However, after running our tests with the new
code, we came across a hanging problem with the garbage collector: there
always seem to be some sockets that can't be removed, which prevents the UDT
cleanup from completing. Yunhong, over at UDT, has kindly agreed to look
further into the problem.

However, if you guys are itching to try something new out, you can just run a
"cvs update" on the udt folder in maidsafe_dht to update the code, and give it
a go. For reference, the tests that make the cleanup hang the most are:

TEST_F(TestRefresh, FUNC_KAD_NewNodeinKClosest)
TEST_F(TestRefreshSignedValues, FUNC_KAD_NewRSANodeinKClosest)

in the src/maidsafe/tests/kademlia/testrefresh.cc file and the TESTkademlia
target.

Cheers

Original comment by dan.schm...@gmail.com on 17 May 2010 at 4:29

GoogleCodeExporter commented 9 years ago
Hi there,

Good news. Working together with Yunhong from the UDT project, we have been
able to find a potential bug. Our UDT code has now been updated to the latest
code in the UDT repository.

Please try your tests again to see if the issues have been resolved.

Cheers

Original comment by dan.schm...@gmail.com on 2 Jun 2010 at 7:42