krakjoe / parallel

A succinct parallel concurrency API for PHP8
Other
1.46k stars 96 forks source link

Unusual thread death with no messages #76

Closed feamsr00 closed 5 months ago

feamsr00 commented 5 years ago

Hello,

Ok, so I've seen some odd error states show up when running some recent thread tests. Under certain conditions we were faced with a script that died in the middle of execution with no indication of what killed it, or for.

<?php

namespace JobTest;

use parallel\Runtime as Thread_Runtime;
use parallel\Channel as Thread_Channel;

use Closure;

const DEFAULT_CHANNEL_CAPACITY = 1024;
const THREAD_ENVIRONMENT_STUB = __DIR__.DIRECTORY_SEPARATOR.'ThreadBooter.php'; //empty files

class Job
{
    public function setChannel(Thread_Channel $oData)
    {
        $this->oData = $oData;
    }

    public function run()
    {
        return "Hello World";
    }
}

class Agent
{
    protected $hThreadRuntime = False;

    public function __construct()
    {
        if (!$this->hThreadRuntime)
        {
            $this->hThreadRuntime=new Thread_Runtime(THREAD_ENVIRONMENT_STUB);
        }
        $this->oData=new Thread_Channel(DEFAULT_CHANNEL_CAPACITY);
        $this->cExecute = function (object $oJob, $oData)
        {
            error_reporting(E_ALL);
            ini_set('error_log', __DIR__.DIRECTORY_SEPARATOR.'thread-engine.log');
            error_log("init");
            $oJob->setChannel($oData);
            return $oJob->run();
        };
    }

    public function run(Job $oJob)
    {
        $oResult = $this->hThreadRuntime->run($this->cExecute, [$oJob, $this->oData]);
        return $oResult;
    }
}

$oJob=new Job;
$oAgent = new Agent;
$hThreadFuture = $oAgent->run($oJob);
var_dump($hThreadFuture->value());
print "Not reachable";

This seems to fall over right inside ::run(). This blows the whole process away, not just the threads. Is Parallel swallowing engine errors? Is it possible for Parallel to full emit any engine error encountered? Is this related to #72 ?

PS Crazy thought, this isn't related to the object typehint for cExecute... is it?

dktapps commented 5 years ago

Killing the whole process would likely indicate a segfault or some other similar issue.

feamsr00 commented 5 years ago

Seems like you're right! The code above produced an Process exited. Exit code - 0xc0000005 [an Access Violation, but no dump] and the code below...

<<<main test.php>>>
<?php

namespace JobTest;

use parallel\Runtime as Thread_Runtime;
use parallel\Channel as Thread_Channel;

use Closure;

const DEFAULT_CHANNEL_CAPACITY = 1024;
const THREAD_ENVIRONMENT_STUB = __DIR__.DIRECTORY_SEPARATOR.'ThreadBooter.php';

require_once __DIR__.DIRECTORY_SEPARATOR.'ReciprocalPropertyTestClasses.php';

class Agent
{
    protected $hThreadRuntime = False;

    public function __construct()
    {
        if (!$this->hThreadRuntime)
        {
            $this->hThreadRuntime=new Thread_Runtime(THREAD_ENVIRONMENT_STUB);
        }
        $this->cExecute = function ($oObj)
        {
            error_reporting(E_ALL);
            ini_set('error_log', __DIR__.DIRECTORY_SEPARATOR.'thread-engine.log');
            error_log("init");
        };
    }

    public function run($oObj)
    {
        $oResult = $this->hThreadRuntime->run($this->cExecute, [$oObj]);
        return $oResult;
    }
}

$oObj1 = new \Obj1;
$oObj2 = new \Obj2;

$oObj1->set($oObj2);

$oAgent = new Agent;
print "Run";
$hThreadFuture = $oAgent->run($oObj1);
var_dump($hThreadFuture->value());
print "Not reachable";
<<<ReciprocalPropertyTestClasses.php>>>
<?php

class Obj1
{
    public function set(Obj2 $obj)
    {
        $obj->set($this);
        $this->obj = $obj;
    }
}

class Obj2
{
    public function set(Obj1 $obj)
    {
        $this->obj = $obj;
    }
}

produced a stack overflow exception:

***********************
*  EXCEPTION DETAILS  *
***********************

DetailID = 1
    Count:    1
    Exception #:  0XC00000FD
    Stack:        

Thread 0 - System ID 15484

Entry point   php!sapi_cli_single_write+a000 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.531 
Time spent in kernel mode   0 Days 00:00:00.359 
        php_parallel!get_module+0x26d0
        php_parallel!get_module+0x2765
        php_parallel!get_module+0x2765
        php_parallel!get_module+0x2765
        php_parallel!get_module+0x2765
        php_parallel!get_module+0x2765
<///snip///>
Thread 1 - System ID 16084

Entry point   ntdll!TppWorkerThread 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.015 
Time spent in kernel mode   0 Days 00:00:00.046 

Instruction Address

[0x7ff80b4430d4] ntdll!NtWaitForWorkViaWorkerFactory+14 
[0x7ff80b3f01f4] ntdll!TppWorkerThread+274 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 

Thread 2 - System ID 14464

Entry point   ntdll!TppWorkerThread 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.015 
Time spent in kernel mode   0 Days 00:00:00.000 

Instruction Address

[0x7ff80b4430d4] ntdll!NtWaitForWorkViaWorkerFactory+14 
[0x7ff80b3f01f4] ntdll!TppWorkerThread+274 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 

Thread 3 - System ID 14112

Entry point   ntdll!TppWorkerThread 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.000 
Time spent in kernel mode   0 Days 00:00:00.000 

Instruction Address

[0x7ff80b4430d4] ntdll!NtWaitForWorkViaWorkerFactory+14 
[0x7ff80b3f01f4] ntdll!TppWorkerThread+274 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 

Thread 4 - System ID 9396

Entry point   ucrtbase!thread_start 
Create time   7/20/2019 1:20:49 AM 
Time spent in user mode   0 Days 00:00:00.000 
Time spent in kernel mode   0 Days 00:00:00.015 

This thread is not fully resolved and may or may not be a problem. Further analysis of these threads may be required.

Instruction Address

[0x7ff80b4402b4] ntdll!NtWaitForMultipleObjects+14 
[0x7ff8080dcc7e] KERNELBASE!WaitForMultipleObjectsEx+fe 
[0x7ff8080dcb6e] KERNELBASE!WaitForMultipleObjects+e 
[0x7fffeb286757] pthreadVC2!ptw32_cancelable_wait+67 
[0x7fffeb288cf4] pthreadVC2!sem_timedwait+f4 
[0x7fffeb286b78] pthreadVC2!ptw32_cond_timedwait+208 
[0x7fffe59396cd] php_parallel!get_module+85bd 
[0x7fffe593b3e6] php_parallel!get_module+a2d6 
[0x7fffeb2883cc] pthreadVC2!ptw32_threadStart+17c 
[0x7ff8076b1ffa] ucrtbase!thread_start<unsigned int (__cdecl*)(void * __ptr64)>+3a 
[0x7ff80a147974] kernel32!BaseThreadInitThunk+14 
[0x7ff80b40a271] ntdll!RtlUserThreadStart+21 

Let me know if I should open a second issue.

I've not needed to collect stacks from php before so I've followed (with limited success) the steps here: https://bugs.php.net/bugs-generating-backtrace-win32.php. Whats the next step?

feamsr00 commented 5 years ago

I tried the preview version of windbg and I think I was able to tease out the trace for the access violation crash....

[0x0]   php7ts!zend_hash_find + 0xa   
[0x1]   php_parallel!get_module + 0x2079   
[0x2]   php_parallel!get_module + 0x9873   
[0x3]   php_parallel!get_module + 0x955f   
[0x4]   php_xdebug_2_7_2_7_3_vc15_x86_64 + 0x71f2   
[0x5]   php7ts!destroy_op_array + 0x7da   
[0x6]   php7ts!execute_ex + 0x5f   
[0x7]   php_pthreads!get_module + 0x42   
[0x8]   php_xdebug_2_7_2_7_3_vc15_x86_64 + 0x6a20   
[0x9]   php7ts!libiconv_set_relocation_prefix + 0x3136   
[0xa]   php7ts!execute_ex + 0x5f   
[0xb]   php_pthreads!get_module + 0x42   
[0xc]   php_xdebug_2_7_2_7_3_vc15_x86_64 + 0x6a20   
[0xd]   php7ts!zend_execute + 0x1a8   
[0xe]   php7ts!zend_execute_scripts + 0xb9   
[0xf]   php7ts!php_execute_script + 0x261   
[0x10]   php!sapi_cli_single_write + 0x1320   
[0x11]   php!sapi_cli_single_write + 0x2298   
[0x12]   php!sapi_cli_single_write + 0x9f88   
[0x13]   KERNEL32!BaseThreadInitThunk + 0x14   
[0x14]   ntdll!RtlUserThreadStart + 0x21   

I think I got a disassembly of faulting instruction... (x64aa)

   php7ts!zend_hash_find:
00007ffb`46b364a0 48896c2418         mov     qword ptr [rsp+18h], rbp
00007ffb`46b364a5 57                 push    rdi
00007ffb`46b364a6 4883ec20           sub     rsp, 20h
00007ffb`46b364aa 4c8b4208           mov     r8, qword ptr [rdx+8] ds:00000000`00000018=????????????????
00007ffb`46b364ae 488bfa             mov     rdi, rdx
00007ffb`46b364b1 4c8bc9             mov     r9, rcx
00007ffb`46b364b4 4d85c0             test    r8, r8
00007ffb`46b364b7 0f85cb000000       jne     php7ts!zend_hash_find+0xe8 (00007ffb`46b36588)

Is this helpful at all?

Tuckbros commented 5 years ago

According to the documentation, it may not be possible to pass objects as argument https://www.php.net/manual/en/parallel-runtime.run.php

krakjoe commented 4 years ago

I need a minimal, complete, simple, reproducing test case to investigate.

tux-rampage commented 4 years ago

I've got the same issue with amphp's parallel when using the parallel extension: Sudden SIGSEGV on parallel\Runtime::run(). I'm trying to build something to reproduce.

tux-rampage commented 4 years ago

This seems to be some complex edge case, maybe in combination with a specific extension and current state. I'm not quite sure how to build something to reproduce this behavior. I'll let you know if I can dig up something when I got some spare time. If someone finds a way in the meantime it would be great 👍

feamsr00 commented 4 years ago

I'll reinvestigate soon, I had to re-engineer my project to allow process models. In the meantime, has anyone with parallel installed managed to copy & paste the code I left in the 3rd comment at the top of this issue? It goes in 2 files and depends only on parallel...

Run that with the debug version of php and I see no reason why you shouldn't get the same trace (unless of course it is an environmental issue but I tried that on 2 workstations and a server and they all dumped...)

dotfry commented 4 years ago

Try extension with #139 PR, at current moment can't test your scripts. Also, didnt test it on windows

awderh commented 4 years ago

EDIT: Seems like the author already knew about this, see #64

I'll reinvestigate soon, I had to re-engineer my project to allow process models. In the meantime, has anyone with parallel installed managed to copy & paste the code I left in the 3rd comment at the top of this issue? It goes in 2 files and depends only on parallel...

Run that with the debug version of php and I see no reason why you shouldn't get the same trace (unless of course it is an environmental issue but I tried that on 2 workstations and a server and they all dumped...)

I like this test case a lot. I haven't investigated thoroughly, but I highly suspect it has to do with how objects are copied over. Pure speculation for now...

obj1 points to obj2 which has a pointer to obj1

i.e.,

obj1['obj']->obj2 and obj2['obj'] -> obj1

So when a pass-by-value is attempted, you hit a stack overflow if you naively clone objects. E.g., var_dump on obj1 and you'll see PHP knows what it's getting itself into.

I think what needs to happen is to use a reference-table/mapping when doing the copy.

E.g.,

int *Obj1Ptr = &Obj1;
Obj1New = (shallow clone) Obj1
InternalMap[Obj1Ptr] = &Obj1New;
Obj2New = (shallow clone) Obj2
if (InternalMap[&Obj2->Obj]) {
    Obj2New->Obj = InternalMap[&Obj2->Obj];
}

I know user land doesn't get to work with this stuff, but I'm not sure where there is handled in the extension.

I'm probably way over my head here, but I also don't see how #139 resolves this

krakjoe commented 4 years ago

This causes overflow, see #64

cmb69 commented 2 years ago

[0x1] php_parallel!get_module + 0x2079 […] [0x7] php_pthreads!get_module + 0x42

Running parallel and pthreads simultaneously might not be the best idea.

dotfry commented 2 years ago

[0x1] php_parallel!get_module + 0x2079 […] [0x7] php_pthreads!get_module + 0x42

Running parallel and pthreads simultaneously might not be the best idea.

they didnt confilict in internals if you dont start using them both.

krakjoe commented 5 months ago

Fixed in 67413b193ecda80a32335a76eb25454a4d54bb32