ryzom / ryzomcore

Ryzom Core is the open-source project related to the Ryzom game. This community repository is synchronized with the Ryzom Forge repository, based on the Core branch.
https://wiki.ryzom.dev
GNU Affero General Public License v3.0

(linux) crash when using 'return to character selection' from ingame #252

Closed ryzom-pipeline closed 8 years ago

ryzom-pipeline commented 8 years ago

Original report by Meelis Mägi (Bitbucket: [Meelis Mägi](https://bitbucket.org/Meelis Mägi)).


#0  0x00007fffe7c35d48 in ?? () from /usr/lib/x86_64-linux-gnu/dri/i965_dri.so
#1  0x00007ffff6ecb11e in ?? () from /usr/lib/x86_64-linux-gnu/mesa/libGL.so.1
#2  0x0000000002be6640 in NL3D::NLDRIVERGL::CDriverGL::swapBuffers (this=0x5e05d20)
    at /home/max/src/ryzom.hg/code/nel/src/3d/driver/opengl/driver_opengl.cpp:956
#3  0x00000000020f67e3 in NL3D::CDriverUser::swapBuffers (this=0x5b37250) at /home/max/src/ryzom.hg/code/nel/src/3d/driver_user.cpp:1349
#4  0x00000000017a05a3 in CProgress::internalProgress (this=0x38232e0 <ProgressBar>, value=0.5)
    at /home/max/src/ryzom.hg/code/ryzom/client/src/progress.cpp:454
#5  0x000000000179e2d2 in CProgress::newMessage (this=0x38232e0 <ProgressBar>, message=...)
    at /home/max/src/ryzom.hg/code/ryzom/client/src/progress.cpp:80
#6  0x0000000001816d56 in CFarTP::disconnectFromPreviousShard (this=0x38148c0 <FarTP>) at /home/max/src/ryzom.hg/code/ryzom/client/src/far_tp.cpp:1087
#7  0x0000000001814eb0 in CLoginStateMachine::run (this=0x3826000 <LoginSM>) at /home/max/src/ryzom.hg/code/ryzom/client/src/far_tp.cpp:572
#8  0x0000000001f0b17f in NLMISC::TCoTaskData::run (this=0x395db20) at /home/max/src/ryzom.hg/code/nel/src/misc/co_task.cpp:529
#9  0x0000000001f3ec7b in NLMISC::ProxyFunc (arg=0x6b6f680) at /home/max/src/ryzom.hg/code/nel/src/misc/p_thread.cpp:92
#10 0x00007ffff5000182 in start_thread (arg=0x7fffe4f15700) at pthread_create.c:312
#11 0x00007ffff450a47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111

That crash can be 'fixed' by disabling the thread-based CCoTask, i.e. by editing NL_USE_THREAD_COTASK, which is defined in code/nel/src/misc/co_task.cpp.

It still has odd crashes after that, though ;-(

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


That's strange, I had another crash, but this time related to CLuaManager :( And it crashes under Windows too :p

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


Do you think it could be related to the CPU affinity using all cores by default?

ryzom-pipeline commented 8 years ago

Original comment by Meelis Mägi (Bitbucket: [Meelis Mägi](https://bitbucket.org/Meelis Mägi)).


Sadly, that does not seem to be the case. I forced a single CPU and it still crashes ;-(

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


Ah ok, thanks for the confirmation :)

ryzom-pipeline commented 8 years ago

Original comment by Meelis Mägi (Bitbucket: [Meelis Mägi](https://bitbucket.org/Meelis Mägi)).


Probably found the reason: stack corruption.

nel/src/misc/co_task.cpp: line 284:

_PImpl->_Stack = new uint8[NL_TASK_STACK_SIZE*1024];

So, an 8MB stack works (*2 still crashed, so I tried slightly higher :-)

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


Haha great job :) So NL_TASK_STACK_SIZE is too small ? Do you know if we have a way to detect the correct size ?

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


I found this for Windows : "The default stack reservation size used by the linker is 1 MB" but NeL still uses NL_TASK_STACK_SIZE in CreateFiber call :(

ryzom-pipeline commented 8 years ago

Original comment by Meelis Mägi (Bitbucket: [Meelis Mägi](https://bitbucket.org/Meelis Mägi)).


In fact, just allocating a larger buffer should not have worked, because later on it still uses the old value:

_PImpl->_Ctx.uc_stack.ss_sp = _PImpl->_Stack;
_PImpl->_Ctx.uc_stack.ss_size = NL_TASK_STACK_SIZE;

This points to another possible bug: stackSize is supposed to be a parameter of the CCoTask constructor (default=NL_TASK_STACK_SIZE), and I take it that ss_size should be the size of _Stack.
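
To illustrate the point about ss_size, here is a minimal sketch (Linux/glibc ucontext only) in which the allocated buffer and ss_size come from the same value. DemoTask, demoEntry and runDemoTask are hypothetical names made up for this example; this is not the NeL CCoTask code, just the general pattern.

```cpp
#include <ucontext.h>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a coroutine task; NOT the NeL CCoTask class.
struct DemoTask
{
    std::vector<std::uint8_t> stack; // owns the coroutine stack buffer
    ucontext_t ctx;                  // context of the coroutine
    ucontext_t caller;               // context to resume when the coroutine ends
    bool ran = false;

    explicit DemoTask(std::size_t stackSize) : stack(stackSize) {}
};

// makecontext() entry points take no pointer argument portably,
// so this sketch passes the task through a global.
static DemoTask *g_current = nullptr;

static void demoEntry()
{
    g_current->ran = true; // returning from here resumes uc_link
}

// Runs demoEntry on the task's own stack; returns true if it executed.
bool runDemoTask(DemoTask &task)
{
    if (getcontext(&task.ctx) != 0)
        return false;

    // The size given to the context is taken from the buffer itself,
    // so the two values cannot drift apart.
    task.ctx.uc_stack.ss_sp = task.stack.data();
    task.ctx.uc_stack.ss_size = task.stack.size();
    task.ctx.uc_link = &task.caller;

    g_current = &task;
    makecontext(&task.ctx, demoEntry, 0);

    if (swapcontext(&task.caller, &task.ctx) != 0)
        return false;
    return task.ran;
}
```

With this shape, changing the size passed to the constructor changes both the allocation and ss_size in one place.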

ryzom-pipeline commented 8 years ago

Original comment by Meelis Mägi (Bitbucket: [Meelis Mägi](https://bitbucket.org/Meelis Mägi)).


I filled the allocated buffer with 0xAA and looked for the first non-0xAA byte starting from index 0 (the stack should start from the last index).

With a 40960-byte buffer, the first non-0xAA byte was found at 10872, so the used size was 30088. It worked fine the few times I tried it.

With a 32768-byte buffer, the first was at 2680, giving the same size, but that time it crashed after selecting a char.

ryzom-pipeline commented 8 years ago

Original comment by Meelis Mägi (Bitbucket: [Meelis Mägi](https://bitbucket.org/Meelis Mägi)).


Interesting... this small change makes the threaded CCoTask work on Linux. Even the comment there agrees with the change ;-)

--- a/code/ryzom/client/src/far_tp.cpp
+++ b/code/ryzom/client/src/far_tp.cpp
@@ -569,7 +569,7 @@
                        break;
                case st_disconnect:
                        // Far TP part 2: disconnect from the FS and unload shard-specific data (called from farTPmainLoop())
-                       FarTP.disconnectFromPreviousShard();
+                       //FarTP.disconnectFromPreviousShard();

                        SM_BEGIN_EVENT_TABLE
                                SM_EVENT(ev_connect, st_reconnect_fs);
@@ -1406,6 +1406,10 @@
 {
        ConnectionReadySent = false;
        LoginSM.pushEvent(CLoginStateMachine::ev_far_tp_main_loop_entered);
+
+       disconnectFromPreviousShard();
+
        uint nbRecoSelectCharReceived = 0;

        bool welcomeWindow = true;

Stack experiments with the non-threaded version show 210KB of the 1MB "used". Normal usage is around 5KB, but just before the char select screen, when resources get released and then reloaded, it jumps. Makes no sense.

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


Cool, I'll try this change under Windows too :) Thanks !

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


Fixed: Crash under Linux (patch by Nimetu), fixes #252

ryzom-pipeline commented 8 years ago

Original comment by Cédric Ochs (Bitbucket: [Cédric OCHS](https://bitbucket.org/Cédric OCHS)).


In fact, it works and I think loading time is even better :)