mongodb / mongo-php-driver

The Official MongoDB PHP driver
https://pecl.php.net/package/mongodb
Apache License 2.0
887 stars 202 forks source link

Crash on Windows with PHP7 (&7.1) #456

Closed bozhinov closed 4 years ago

bozhinov commented 7 years ago

Description

Apache crashed while I was migrating my existing code to PHP7 Dums here: http://bojinov.info/momchil/php_mongo/

Not using the glue from https://github.com/mongodb/mongo-php-library/commits/master The script is doing simple find to populate dataTables over Ajax

Environment

Apache 2.4.23 x64 VC14 from https://www.apachelounge.com/download/ PHP 7.0.13 x64 VC14 from main site (Initially tried with 7,1 RC6 - same result)

$ php -i | grep -E 'mongodb|libmongoc|libbson' mongodb version 1.1.9 mongodb stability stable libmongoc version 1.1.2 libbson version 1.1.3

Test Script

Can't isolate it for now. Will keep trying

Expected and Actual Behavior

https://bugs.php.net/bugs-generating-backtrace.php

Debug Log

http://php.net/manual/en/mongodb.configuration.php#ini.mongodb.debug

bozhinov commented 7 years ago

Since these first crashes, I managed to get it to crash several more times. The 1.2 alpha crashes right from the start

I have been waiting for months for stable versions of both PHP and the MongoDB ext.

jmikola commented 7 years ago

Apache crashed while I was migrating my existing code to PHP7. Dumps here: http://bojinov.info/momchil/php_mongo/

Those dumps appear to be binary files. What application are you able to use to view these? Are you able to supply a backtrace in text format?

Apache 2.4.23 x64 VC14 from https://www.apachelounge.com/download/ PHP 7.0.13 x64 VC14 from main site

I assume you downloaded PHP from http://windows.php.net/download/. Since there are both thread-safe and non-thread-safe DLLs for x64 (see: 1.1.9), did you verify that you're loading the correct version?

Have you attempted loading the DLL with php.exe to test if the extension can be loaded at all? With the extension loaded in your php.ini file, php.exe -m should report it as loaded, as well as phpinfo(). You can also test a script that executes phpversion('mongodb') or instantiates a MongoDB\Driver\Manager instance.

(Initially tried with 7,1 RC6 - same result)

PECL is not yet creating builds for PHP 7.1, so I'm not sure how you would have tested that unless you manually compiled the driver. I don't believe the PHP 7.0 DLLs are compatible.

bozhinov commented 7 years ago

Hi Jeremy

Using TS versions from php.net Extensions work well, but at some point they start crashing. I was in few hours of flawless work and then Bam! Connection reset!

It only started crashing when I started testing the GridFS So, when I said - Not using the glue - I lied. Sorry

Loaded the dumps in the WinDBG, but it failed to recognize the PDB file although I made sure I used the correct one for the ext I m using. (The one that comes in the pecl zip) With no symbols for php_mongodb.dll, the info is not going to help much. I would have attached it otherwise. The dumps should do fine though.

There are builds for 7.1 in the pecl snaps

Momchil

jmikola commented 7 years ago

Can you share a standalone PHP file (or a slice of an application with a composer.json file if multiple files are needed) that can be used to reproduce the crash? If this depends on existing data, a MongoDB dump file for the relevant collection(s) or a script to insert that data would help as well.

I can then use something like ab to issue a series of HTTP requests against the application in an attempt to trigger the crash. If that proves fruitful, I can then look into compiling the extension with debug symbols for local testing.

bozhinov commented 7 years ago

That's the thing, crashes are inconsistent. The same code will run for hours before it crashes Should clarify: That's a production server of a sort that has multiple instances of apache, running both php 5 & 7. No crashes on any of the other instances.

bozhinov commented 7 years ago

Few things I noticed. The more exceptions and code errors the faster the crashes occur.

example: trying to serialize some bson object (not on purpose) - first few times I get an error and then it will just start crashing. Even after I restart the apache the same script will crash again upon reload. Disabling the opcache fixes that and I start getting an error again

jmikola commented 7 years ago

@bozhinov: Are you able to create a backtrace using the second method ("without compiler, on Win32") outlined on https://bugs.php.net/bugs-generating-backtrace-win32.php? Even without symbols for the MongoDB extension, we should see its frames and anything in the PHP engine below it.

example: trying to serialize some bson object (not on purpose) - first few times I get an error and then it will just start crashing.

Can you be more descriptive about "an error" here? If it's an exception, can you dump its message and backtrace? Since "error" is vague, let's stick with more specific terms like "exception", "fatal error" (in the PHP sense), and "crash".

Even after I restart the apache the same script will crash again upon reload. Disabling the opcache fixes that and I start getting an error again

It's not clear to me what "opcache fixes that" is referring to. Without opcache, does the crash revert to a PHP exception?

Should clarify: That's a production server of a sort that has multiple instances of apache, running both php 5 & 7. No crashes on any of the other instances.

How is Apache interfacing with PHP? Are you using mod_proxy_fcgi and pointing to php-fpm servers for PHP 5 and 7? Is it possible that both versions of PHP are inadvertently sharing some state or cache on the host?

bozhinov commented 7 years ago

@jmikola Dumps generated as you requested. The details on how to collect it are in your e-mail box

Error = all the above. Stuff that occurs during the development cycle

How the opcache is related in all this is beyond my level of understanding. the crash reverts to regular php warnining saying that I can't use bison objects as an array

php in all instances is loaded as a module. I know I m using latest versions of both 5 & 7 where the bug with opcache not working if running both 5 & 7 is resolved

bozhinov commented 7 years ago

Just tested 1.2 - Same thing as with the Alpha versions - crash on first use but I guess that is issue #428

sergiorighi commented 7 years ago

On final 7.1.0 I have this error:

[02-Dec-2016 10:44:39 UTC] PHP Warning:  PHP Startup: mongodb: Unable to initialize module
Module compiled with module API=20151012
PHP    compiled with module API=20160303
These options need to match
 in Unknown on line 0
jmikola commented 7 years ago

@sergiorighi: Your comment is not related to @bozhinov's issue. Based on your error, you're attempting to use a DLL compiled for PHP 7.0 with 7.1, which will not work. PHP 7.1 DLLs should start appearing on the PECL website in the coming days, according to this tweet. The snapshots Anatol alluded to may be found here. Note that this those builds are all handled by PHP.net and we (the MongoDB team) do not create the Windows builds.

jmikola commented 7 years ago

85MB and 169MB dump files

@bozhinov: I was able to get the dump files opened in my Windows VM. Of the various dump files you provided, the three 85MB files do not have any reference to PHP, despite being httpd.exe crashes:

The other three dump files (169MB) appear to all have crashed due to a stack overflow serializing a PHP value to BSON:

Their abridged stack trace is:

php_mongodb!phongo_bson_typemap_to_state+19f2 
php_mongodb!phongo_zval_to_bson+421 
php_mongodb!phongo_bson_typemap_to_state+17d3 
php_mongodb!phongo_bson_typemap_to_state+1a05 
php_mongodb!phongo_zval_to_bson+421 
...
php_mongodb!phongo_bson_typemap_to_state+17d3 
php_mongodb!phongo_bson_typemap_to_state+1a05 
php_mongodb!phongo_zval_to_bson+421 
php_mongodb!phongo_bson_typemap_to_state+6408 
php7ts!php_hash_environment+1cb 
php7ts!execute_ex+44 
...
mod_http2+1931b 
ucrtbase!crt_at_quick_exit+7d 
kernel32!BaseThreadInitThunk+d 
ntdll!RtlUserThreadStart+1d 

It's possible that this is related to PHPC-848, which fixed two issues. The relevant fix for your crash would be fixing recursion protection for objects and associative arrays that contain circular references.

If that bug was the cause, it should be fixed for you in 1.1.10. Although the fix was also released in 1.2.1, the 1.2 series currently has incompatibility with threaded web servers that needs to be resolved before you can test that (the issue is PHPC-854, which I allude to later in this comment).


Crashes due to assert() failures

These crashes were due to assert() failures in libmongoc. Based on the PHP error log you attached, it appears to be due to an empty collection name:

ext\mongodb\src\libmongoc\src\mongoc\mongoc-cursor.c:688 _mongoc_cursor_collection(): precondition failed: *collection_len > 0

If this is reproducible, I'd be interested in seeing the PHP code that triggered the error. Ideally, the PHP driver should be throwing exceptions before we pass an invalid value on to libmongoc.


Other crashes you provided

The dump files in the "Crash on EDIT CASE" directory do mention the MongoDB driver and a stack overflow, but the dumps are also much smaller at 6MB and don't have any stack trace information.

I did not find any reference to the MongoDB driver in the dumps within the "crash for no reason" and "alfa3 driver crash" directories; however, if those dumps were created from any driver version since 1.2.0alpha2 (as at least one directory name would suggest), it's possible that you were hitting PHPC-854.

bozhinov commented 7 years ago

Thank you Jeremy

I will look into that empty collection name thing. I m also not sure how that would be possible, but hey, I don't mind testing it few hundred more times before releasing it to production

As for the crash in the mongo driver, I think I will wait for PHPC-854 to be resolved before I attempt it again Running 1.1.10 since yesterday and looking good, but then again. I barely touch the grid with the current version (I had to)

Thanks again, Momchil

bozhinov commented 7 years ago

It crashed again with 1.2.2

I will collect the dump as you have previuosly requested once the rest of the guys clear the office

One thing I noticed is that restarting the Apache did not help I had to restart the In-Memory instance of MongoDb. Which is where I store my sessions

bozhinov commented 7 years ago

Details on the newly generated dumps sent over e-mail

jmikola commented 7 years ago

There appear to be two unique crashes among the new dump files:

The larger 405MB dumps are all segfaults within the extension (attepting to ready a null pointer in bson_destroy()):

In httpdPID2252Date12_18_2016__Time_09_55_27AM600First chance exception 0XC0000005.dmp the assembly instruction at php_mongodb!bson_destroy+5b in C:\mongodb\php\ext\php_mongodb.dll from The PHP Group has caused an access violation exception (0xC0000005) when trying to read from memory location 0x00000000 on thread 84

php_mongodb!bson_destroy+5b 
php_mongodb!mongoc_cursor_new_from_command_reply+1025 
php_mongodb!mongoc_cursor_destroy+a3 
php_mongodb+355d 
php_mongodb+1e6f 
php_mongodb!get_module+10444 
php7ts!php_hash_environment+1cb 
php7ts!execute_ex+44
...

I assume the crash originates from bson_destroy() accessing some struct field. The second stack frame is mongoc_cursor_new_from_command_reply(). I'll see if I can get the driver's debug symbols working, which may reveal the arguments passed to each function; however, I'm not optimistic that will pinpoint the root cause of the crash.

Have you had no luck reproducing these crashes on demand with PHP scripts?

Also, my analysis is done by opening the files in Microsoft's DebugDiag tool. It would likely save time if you were to use that yourself and extract the backtraces for future crashes. That would at least let you rule out the crashes that are unrelated to the extension or PHP.


The smaller 80MB dumps are don't relate to the extension. I don't have any input to provide here, but the exception and backtrace is below:

In httpdPID4944Date12_18_2016__Time_09_55_36AM414First chance exception 0XC0000008.dmp the assembly instruction at ntdll!KiRaiseUserExceptionDispatcher+3a in C:\Windows\System32\ntdll.dll from Microsoft Corporation has caused an unknown exception (0xc0000008) on thread 0

ntdll!KiRaiseUserExceptionDispatcher+3a 
KERNELBASE!CloseHandle+13 
kernel32!CloseHandleImplementation+3d 
libapr_1!apr_shm_size_get+27d 
libapr_1!apr_shm_detach+12 
mod_socache_shmcb+162c 
mod_ssl!ssl_run_proxy_post_handshake+8db2 
mod_ssl!ssl_run_pre_handshake+3db7 
libapr_1!apr_pool_clear+6e 
httpd!OPENSSL_Applink+d0c 
httpd!OPENSSL_Applink+1ff9 
kernel32!BaseThreadInitThunk+d 
ntdll!RtlUserThreadStart+1d 
bozhinov commented 7 years ago

Hi,

I believe it is a threshold I m hitting All i do to generate it is click around and it will eventually crash regarding of the page I m on It will crash on a page and then load it OK the next time I try it

I have both DebugDiag and WinDbg and I have the source code I need those symbols working to be of any use though

I can filter out the non mongo related crashes to save you some time

jmikola commented 7 years ago

I need those symbols working to be of any use though

Debug symbols for the extension are included in the PECL downloads (i.e. php_mongodb.pdb). As for PHP, those should be available in the debug packs on http://windows.php.net/download/. When I opened these dumps, DebugDiag said it did not find symbols for everything and I didn't see any UI element to allow me to load symbols into the analysis. I'm not sure if it was picking up my symbols for PHP and the extension at all, which might be why I was only seeing function names in the backtrace (just as I do for the unrelated dumps due to Apache crashes, where I certainly don't have symbols).

bozhinov commented 7 years ago

My bad. I did copy all of the PHP debug pack to the symbol folder, but I forgot the one for the mongo ext I spent some time looking at the assembly around the crash and I came across this one php_mongodb!mongoc_apm_set_server_heartbeat_failed_cb Once it got there it started destroying objects No documentation on that one though. I would assume it is part of the core lib that this ext is based on

Anyway, let me know how do I help

jmikola commented 7 years ago

Do you have a full backtrace that includes mongoc_apm_set_server_heartbeat_failed_cb()? I can't imagine how that would be called, as there are no references to it in the extension. It is part of libmongoc's API, but is also not called internally and only exposed as a setter for users (either directly or wrapping drivers such as ourselves). The actual feature to integrate Application Performance Monitoring (APM) is PHPC-349.

To recap, there are two outstanding crashes on my radar in this ticket:

I'll reiterate that we're unlikely to get anywhere unless these issues can be reproduced in some way. At the very least, some script or snapshot of an application where we can slam it with requests via ab and trigger one crash among X requests would be worthwhile.

bozhinov commented 7 years ago

I will try and create some sample application

bozhinov commented 7 years ago

I have managed to reproduce is in an isolated matter. Test script will arrive over e-mail All you need to do is refresh the page about 100 times and it will crash

jmikola commented 7 years ago

The current crash appears to be a stack overflow while serializing a PHP value passed as an argument to MongoDB\Driver\BulkWrite::update(). Here is a snippet of the backtrace from the point before it enters the MongoDB driver:

php_mongodb.dll!object_to_bson(_zval_struct * object=0x00000000084ffca0, php_phongo_bson_flags_t flags=PHONGO_BSON_NONE, const char * key=0x0000000003045680, long key_len=4, _bson_t * bson=0x00000000003cdbe0) Line 1188
php_mongodb.dll!phongo_bson_append(_bson_t * bson=0x00000000003cdbe0, php_phongo_bson_flags_t flags=PHONGO_BSON_NONE, const char * key=0x0000000003045680, long key_len=4, _zval_struct * entry=0x00000000084ffca0) Line 1257
php_mongodb.dll!phongo_zval_to_bson(_zval_struct * data=0x0000000008414260, php_phongo_bson_flags_t flags=PHONGO_BSON_NONE, _bson_t * bson=0x00000000003cdbe0, _bson_t * * bson_out=0x0000000000000000) Line 1418
php_mongodb.dll!zim_BulkWrite_update(_zend_execute_data * execute_data, _zval_struct * return_value=0x0000000000000000) Line 263
php7ts.dll!ZEND_DO_FCALL_SPEC_RETVAL_UNUSED_HANDLER(_zend_execute_data * execute_data=0x0000000008414150) Line 984
php7ts.dll!execute_ex(_zend_execute_data * ex) Line 432
php7ts.dll!zend_call_function(_zend_fcall_info * fci=0x0000000007f0e610, _zend_fcall_info_cache * fci_cache=0x0000000007f0e530) Line 829
php7ts.dll!_call_user_function_ex(_zval_struct * object, _zval_struct * function_name, _zval_struct * retval_ptr, unsigned int param_count, _zval_struct * params=0x0000000007f0e6a0, int no_separation=1) Line 673
php7ts.dll!ps_write_user(void * * mod_data=0x0000000008006558, _zend_string * key, _zend_string * val, __int64 maxlifetime=133228512) Line 167
php7ts.dll!php_session_save_current_state(int write) Line 470
php7ts.dll!zif_session_write_close(_zend_execute_data * execute_data=0x00000000029ed5b0, _zval_struct * return_value=0x0000000007f0e7d9) Line 2282
php7ts.dll!zend_call_function(_zend_fcall_info * fci=0x0000000007f0e860, _zend_fcall_info_cache * fci_cache=0x0000000007f0e780) Line 846
php7ts.dll!_call_user_function_ex(_zval_struct * object, _zval_struct * function_name, _zval_struct * retval_ptr, unsigned int param_count, _zval_struct * params=0x00000000084670a0, int no_separation=1) Line 673
php7ts.dll!user_shutdown_function_call(_zval_struct * zv) Line 4978
php7ts.dll!zend_hash_apply(_zend_array * ht=0x000000000845a850, int(*)(_zval_struct *) apply_func=0x000007fee53e6310) Line 1508
php7ts.dll!php_call_shutdown_functions() Line 5060
php7ts.dll!php_request_shutdown(void * dummy=0x00000000003f88a0) Line 1820

Based on the backtrace, this is related to one of the updates issued in your session handler (perhaps MongoSession::write()). The line numbers in the trace appear to be off by a few, but I was able to cross-reference with the following lines in the 1.2.3 source:

Inspecting memory with the debugger, the BSON data passed through contains a $set key, so it does appear to be the second argument of MongoDB\Driver\BulkWrite::update(). Looking further along the call stack, the BSON keys always appear to be "0" (based on the key and key_len arguments passed to object_to_bson()). I see no evidence of other keys that we'd expect to find within the $set value (e.g. data, active, expiry), but it's possible that we enter the rabbit hole before getting to those fields. Also of note is that php_phongo_is_array_or_document() reports that the argument to phongo_bson_append() is an object instead of an array (see: here). This hints that the array has a mix of numeric and string keys, which suggests that the following code in MongoSession.php may be related:

// check for existing session for merge
if (!empty($this->_session)) {
    $new_obj = array_merge((array)$this->_session, $new_obj);
}

We can correlate this with the read() function, which appears to be the only thing that assigns a value to $this->_session:

$result = $this->_mongo->findOne(['session_id'  => $id, 'expiry' => ['$gte' => time()], 'active' => 1]);

if (isset($result[0]['data'])) {
    $this->_session = $result;
    return $result[0]['data'];
}

findOne() returns a single document, so I don't understand why $result[0]['data'] would be accessed unless your document actually contains a top-level "0" field. In any event, this doesn't agree with the array_merge() in your write() method, as you're merging what looks like a value within $result[0] into the top-level $result document.

If you can debug the PHP value of $new_obj in write() before you call MongoDB\Collection::updateOne(), that might shed some light on this.


I'm not sure if this is related, but I also spotted the following being printed intermittently in the Apache logs:

Tue Feb 7 11:33:00 2017 (2972): Fatal Error Base address marks unusable memory region. Please setup opcache.file_cache and opcache.file_cache_fallback directives for more convenient Opcache usage

bozhinov commented 7 years ago

Weirdly enough the session code has been in production for few months now and I have not observed any crashes It is only when I start using the grid (RainTpl) when the crashes start Having said that, removing the session code helps with the crashes

Plus it is doing the same thing 100+ times and then it starts crashing. I mean, how could it be my code ?

As for the FindOne, I figured it was my implementation of it – look at class.core.mongo.php I don’t mind it. I use it in several other places and it works ok. It generates 0 errors, warnings or notices

ooglek commented 7 years ago

Having a similar issue -- not sure what the deal is, or if it is related. Had the same problem on 1.1.9 as 1.2.5 (which I'm now using and getting this core dump).

#0  0x000000080965c558 in bson_destroy (bson=0x80d07b990) at /tmp/pear/mongodb/src/libbson/src/bson/bson.c:2181
2181    /tmp/pear/mongodb/src/libbson/src/bson/bson.c: No such file or directory.
        in /tmp/pear/mongodb/src/libbson/src/bson/bson.c
[New Thread 802806400 (LWP 100131/<unknown>)]
(gdb) bt
#0  0x000000080965c558 in bson_destroy (bson=0x80d07b990) at /tmp/pear/mongodb/src/libbson/src/bson/bson.c:2181
#1  0x0000000809684145 in _mongoc_cursor_destroy (cursor=0x80d07b900) at /tmp/pear/mongodb/src/libmongoc/src/mongoc/mongoc-cursor.c:522
#2  0x0000000809683fbd in mongoc_cursor_destroy (cursor=<value optimized out>) at /tmp/pear/mongodb/src/libmongoc/src/mongoc/mongoc-cursor.c:479
#3  0x00000008096afda5 in phongo_execute_query (manager=<value optimized out>, namespace=<value optimized out>, zquery=<value optimized out>, zreadPreference=<value optimized out>, server_id=<value optimized out>, return_value=<value optimized out>, return_value_used=<value optimized out>)
    at /tmp/pear/mongodb/php_phongo.c:565
#4  0x00000008096c0ede in zim_Server_executeQuery (ht=<value optimized out>, return_value=<value optimized out>, return_value_ptr=<value optimized out>, this_ptr=<value optimized out>, return_value_used=<value optimized out>) at /tmp/pear/mongodb/src/MongoDB/Server.c:104
#5  0x00000000005f13b4 in zend_do_fcall_common_helper_SPEC ()
#6  0x00000000005b2538 in execute_ex ()
#7  0x000000000058a4d3 in zend_execute_scripts ()
#8  0x0000000000531ee3 in php_execute_script ()
#9  0x000000000060d8f3 in do_cli ()
#10 0x000000000060c74f in main ()
ooglek commented 7 years ago

See #536 Not sure if your issue is related. Core dump when passing a projection with an empty string as one of they keys.

khadgarmage commented 6 years ago

@jmikola I found that this problem has not been solved in 1.4.4. _mongoc_cursor_collection(): precondition failed

jmikola commented 6 years ago

@khadgarmage: Is there more to the assertion failure message? The OP's original error, which I copied in https://github.com/mongodb/mongo-php-driver/issues/456#issuecomment-266084561, included "*collection_len > 0".

The last information we received for this issue was not sufficient to indicate the root cause of the error. If you're able to generate backtraces for the related crash you're seeing, that would be helpful. I'd also request reporting a new issue and simply linking back to this issue in your description for added context.

jmikola commented 4 years ago

Closing due to inactivity. We reached a dead end attempting to investigate the original issue.