krakjoe / parallel

A succinct parallel concurrency API for PHP8
Other
1.47k stars 96 forks source link

Segfault when using references in thread #306

Closed realFlowControl closed 2 weeks ago

realFlowControl commented 4 months ago

Running code that uses a reference to an element in an array in a thread segfaults with the following backtrace:

    frame #0: 0x000000018f386a60 libsystem_kernel.dylib`__pthread_kill + 8
    frame #1: 0x000000018f3bec20 libsystem_pthread.dylib`pthread_kill + 288
    frame #2: 0x000000018f2cba30 libsystem_c.dylib`abort + 180
    frame #3: 0x000000010036c32c php`zend_mm_panic + 52
    frame #4: 0x000000010036e930 php`_efree_32 + 120
    frame #5: 0x00000001003ea2f0 php`ZEND_ASSIGN_REF_SPEC_CV_VAR_HANDLER + 288
    frame #6: 0x00000001003cdc38 php`execute_ex + 172
    frame #7: 0x000000010104d03c parallel.so`php_parallel_scheduler_run(runtime=<unavailable>, frame=0x0000000102413020) at scheduler.c:331:13 [opt]
    frame #8: 0x000000010104c62c parallel.so`php_parallel_thread(arg=0x000000010125f3c0) at scheduler.c:501:9 [opt]
    frame #9: 0x000000018f3bef94 libsystem_pthread.dylib`_pthread_start + 136

You can reproduce using https://github.com/realFlowControl/1brc/blob/main/calculateAverage.php, the opline points to line 71 (where the array is accessed). If I comment the entire if-block, it segfaults in another line.

It started with https://github.com/krakjoe/parallel/commit/32cd9c3776bcab820000fa6e0ebc75cae6205810 and the change that leads to this is this line:

https://github.com/krakjoe/parallel/blob/d8a01139227b41115e337c11f58901b77104ab60/src/copy.c#L191

Setting this back to 2 will fix the segfault, but lead to ASAN tests failing ;-)

I sadly have no idea what is going on :-(

Edit (2024-10-02): if OPcache is enabled, this works like a charm, without any issues. As soon as I disable OPcache, it crashes

hschimpf commented 3 months ago

Hi @realFlowControl!

Does this happen on any PHP version?

realFlowControl commented 3 months ago

So far this issue is in the develop branch and the changes that lead to that are not yet released (that's why i'm withholding new minor releases). So it should not be related to https://github.com/hschimpf/parallel-sdk/issues/26

realFlowControl commented 1 month ago

Interestingly another way to workaround in case no OPcache is around is to declare the $stations array static in

https://github.com/realFlowControl/1brc/blob/57cc8737e9f2593c907ce6e845e682fadc1ccf31/calculateAverage.php#L59

arnaud-lb commented 2 weeks ago

Smaller reproducer:

<?php

$run = function() {
    $a = [];
    for ($i = 0; $i < 100; $i++) {
        $b = &$a[$i];
    }
    return 0;
};

for ($i = 0; $i < 10; $i++) {
    $futures[$i] = \parallel\run($run, []);
}

for ($i = 0; $i < 10; $i++) {
    var_dump($futures[$i]->value());
}

What is happening is that an array literal is copied to $a in $a = [];, but its refcount is not incremented because the IS_TYPE_REFCOUNTED flag is not set. Multiple threads will have the same array literal in $a, with a refcount of 1, and will modify it, which breaks semantics and also causes corruption and crash.

A refcount of 2 prevented this by forcing the VM to separate the array before modifying it.

Reverting two lines here and here would fix the issue, but this is inconsistent with other code in copy.c.

Changing this line to Z_TYPE_FLAGS_P(slot) &= ~IS_TYPE_COLLECTABLE; (so the IS_TYPE_REFCOUNTED is not removed) would also fix the problem, but this may be unsafe as ADDREF/DELREF are not thread safe.

Edit: When OPcache is enabled, parallel relies on it to manage literals. In this case, literals have a refcount of 2.