theLaborInVain / kdm-manager-api

The API used by https://kdm-manager.com and related Kingdom Death: Monster utilities.

Production outage: seg fault kills mongodb #13

Closed by toconnell 3 years ago

toconnell commented 4 years ago

Here's the fault:

2019-12-28T15:00:54.798-0600 I COMMAND  [conn12538] command kdm-manager.survivors command: find { find: "survivors", filter: { name: { $nin: [ "-", "test", "Test", "TEST", "unknown", "Unknown", "UNKNOWN", "Anonymous", "anonymous" ] }, dead: { $exists: false } }, sort: { created_on: -1 }, limit: 1, lsid: { id: UUID("9a851783-5eee-4bfc-bbb5-dd5af22b6e6c") }, $db: "kdm-manager", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN keysExamined:0 docsExamined:38444 hasSortStage:1 cursorExhausted:1 numYields:300 nreturned:1 reslen:1375 locks:{ Global: { acquireCount: { r: 602 } }, Database: { acquireCount: { r: 301 } }, Collection: { acquireCount: { r: 301 } } } protocol:op_msg 121ms
2019-12-28T15:01:01.155-0600 I COMMAND  [conn12538] command kdm-manager.survivors command: group { group: { key: { name: 1 }, ns: "survivors", $reduce: function(o, p){p.count++}, cond: { name: { $nin: [ "-", "test", "Test", "TEST", "unknown", "Unknown", "UNKNOWN", "Anonymous", "anonymous" ], $exists: true } }, initial: { count: 0 } }, lsid: { id: UUID("9a851783-5eee-4bfc-bbb5-dd5af22b6e6c") }, $db: "kdm-manager", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN keysExamined:0 docsExamined:38444 numYields:300 reslen:592237 locks:{ Global: { acquireCount: { r: 602 } }, Database: { acquireCount: { r: 301 } }, Collection: { acquireCount: { r: 301 } } } protocol:op_msg 1711ms
2019-12-28T15:01:02.516-0600 I COMMAND  [conn12538] command kdm-manager.settlements command: group { group: { key: { name: 1 }, ns: "settlements", $reduce: function(o, p){p.count++}, cond: { name: { $nin: [ "-", "test", "Test", "TEST", "unknown", "Unknown", "UNKNOWN", "Anonymous", "anonymous" ], $exists: true } }, initial: { count: 0 } }, lsid: { id: UUID("9a851783-5eee-4bfc-bbb5-dd5af22b6e6c") }, $db: "kdm-manager", $readPreference: { mode: "primaryPreferred" } } planSummary: COLLSCAN keysExamined:0 docsExamined:6512 numYields:50 reslen:151247 locks:{ Global: { acquireCount: { r: 102 } }, Database: { acquireCount: { r: 51 } }, Collection: { acquireCount: { r: 51 } } } protocol:op_msg 329ms
2019-12-28T15:01:03.545-0600 F -        [js] Invalid access at address: 0
2019-12-28T15:01:03.661-0600 F -        [js] Got signal: 11 (Segmentation fault).

 0x558846e2a4ca 0x558846e2978e 0x558846e29ddc 0x558845f9f5e6 0x7f2d010a8890 0x5588464749b0 0x55884645ffd6 0x55884647a07b 0x55884647a74d 0x55884647b390 0x5588462758e0 0x558846275e32 0x5588462761db 0x55884627641a 0x5588464612e5 0x558846468838 0x5588462c9468 0x55884643dc2c 0x55884644cffc 0x55884644d63c 0x558846228c3d 0x558846228d45 0x558846228e6a 0x558845ee30a6 0x558845f1116f 0x558845f1157f 0x558845f0d1fc 0x7f2d0192866f 0x7f2d0109d6db 0x7f2d00dc688f
----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"558844DEF000","o":"203B4CA","s":"_ZN5mongo15printStackTraceERSo"},{"b":"558844DEF000","o":"203A78E"},{"b":"558844DEF000","o":"203ADDC"},{"b":"558844DEF000","o":"11B05E6"},{"b":"7F2D01096000","o":"12890"},{"b":"558844DEF000","o":"16859B0","s":"_ZN2js14TenuringTracer8traverseI8JSObjectEEvPPT_"},{"b":"558844DEF000","o":"1670FD6","s":"_ZN2js8frontend9ObjectBox5traceEP8JSTracer"},{"b":"558844DEF000","o":"168B07B","s":"_ZN2JS12AutoGCRooter8traceAllEP8JSTracer"},{"b":"558844DEF000","o":"168B74D","s":"_ZN2js2gc9GCRuntime11markRuntimeEP8JSTracerNS1_18TraceOrMarkRuntimeE"},{"b":"558844DEF000","o":"168C390","s":"_ZN2js7Nursery7collectEP9JSRuntimeN2JS8gcreason6ReasonEPN7mozilla6VectorIPNS_11ObjectGroupELm0ENS_17SystemAllocPolicyEEE"},{"b":"558844DEF000","o":"14868E0","s":"_ZN2js2gc9GCRuntime7gcCycleEbRNS_11SliceBudgetEN2JS8gcreason6ReasonE"},{"b":"558844DEF000","o":"1486E32","s":"_ZN2js2gc9GCRuntime7collectEbNS_11SliceBudgetEN2JS8gcreason6ReasonE"},{"b":"558844DEF000","o":"14871DB","s":"_ZN2js2gc9GCRuntime7startGCE18JSGCInvocationKindN2JS8gcreason6ReasonEl"},{"b":"558844DEF000","o":"148741A","s":"_ZN2js2gc9GCRuntime13gcIfRequestedEP9JSContext"},{"b":"558844DEF000","o":"16722E5","s":"_ZN2js2gc9GCRuntime23gcIfNeededPerAllocationEP9JSContext"},{"b":"558844DEF000","o":"1679838","s":"_ZN2js8AllocateI8JSScriptLNS_7AllowGCE1EEEPT_PNS_16ExclusiveContextE"},{"b":"558844DEF000","o":"14DA468","s":"_ZN8JSScript6CreateEPN2js16ExclusiveContextEN2JS6HandleIP8JSObjectEEbRKNS3_22ReadOnlyCompileOptionsES7_jj"},{"b":"558844DEF000","o":"164EC2C","s":"_ZN16BytecodeCompiler12createScriptEN2JS6HandleIP8JSObjectEEb"},{"b":"558844DEF000","o":"165DFFC","s":"_ZN16BytecodeCompiler13compileScriptEN2JS6HandleIP8JSObjectEENS1_IP8JSScriptEE"},{"b":"558844DEF000","o":"165E63C","s":"_ZN2js8frontend13CompileScriptEPNS_16ExclusiveContextEPNS_9LifoAllocEN2JS6HandleIP8JSObjectEENS6_IPNS_11ScopeObjectEEENS6_IP8JSScriptEERKNS5_22ReadOnlyCompileOptionsERNS5_18SourceBufferHolderEP8JSStringPNS_21SourceCompressionTaskEPPNS_18ScriptSourceObjectE"},{"b":"558844DEF000","o":"1439C3D"},{"b":"558844DEF000","o":"1439D45"},{"b":"558844DEF000","o":"1439E6A"},{"b":"558844DEF000","o":"10F40A6"
,"s":"_ZN5mongo5mozjs14MozJSImplScope4execENS_10StringDataERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbbi"},{"b":"558844DEF000","o":"112216F"},{"b":"558844DEF000","o":"112257F","s":"_ZN5mongo5mozjs15MozJSProxyScope10implThreadEPv"},{"b":"558844DEF000","o":"111E1FC","s":"_ZN4nspr6Thread13ThreadRoutineEPv"},{"b":"7F2D0186B000","o":"BD66F"},{"b":"7F2D01096000","o":"76DB"},{"b":"7F2D00CA5000","o":"12188F","s":"clone"}],"processInfo":{ "mongodbVersion" : "3.6.3", "gitVersion" : "9586e557d54ef70f9ca4b43c26892cd55257e1a5", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.15.0-1044-gcp", "version" : "#70-Ubuntu SMP Mon Sep 16 12:38:02 UTC 2019", "machine" : "x86_64" }, "somap" : [ { "b" : "558844DEF000", "elfType" : 3, "buildId" : "40A22A63C3F04AF7F9D3983994C20023104C5804" }, { "b" : "7FFCF698B000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "D2FB617CBC4BD6896EF93491AC9BB9D0622A57A5" }, { "b" : "7F2D03B2B000", "path" : "/usr/lib/x86_64-linux-gnu/libstemmer.so.0d", "elfType" : 3, "buildId" : "278CA72E21C11FF2E15A86B0B2C13A8922951702" }, { "b" : "7F2D0390E000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "EF3E006DFE3132A41D4D4DC0E407D6EA658E11C4" }, { "b" : "7F2D03706000", "path" : "/usr/lib/x86_64-linux-gnu/libsnappy.so.1", "elfType" : 3, "buildId" : "55765D88D03CC928130D788F1C7E4BF8415AC7E3" }, { "b" : "7F2D0348C000", "path" : "/usr/lib/x86_64-linux-gnu/libyaml-cpp.so.0.5", "elfType" : 3, "buildId" : "BF65D47C8CD968E616F7D179F84A80CA71DB8249" }, { "b" : "7F2D03283000", "path" : "/usr/lib/x86_64-linux-gnu/libpcrecpp.so.0", "elfType" : 3, "buildId" : "089B8438CC1394E978E56C556C9CAE768BD2F18C" }, { "b" : "7F2D03002000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_program_options.so.1.65.1", "elfType" : 3, "buildId" : "9F69F11220BB1FAAB0B73A2B6F4B0E81D9B901CE" }, { "b" : "7F2D02DE8000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_filesystem.so.1.65.1", "elfType" : 3, "buildId" : 
"32B8421A0643426D9FB008005F5A86688065008B" }, { "b" : "7F2D02BE3000", "path" : "/usr/lib/x86_64-linux-gnu/libboost_system.so.1.65.1", "elfType" : 3, "buildId" : "4BA851D242F2DB710CB1817DE860CF97AE2F9714" }, { "b" : "7F2D02973000", "path" : "/usr/lib/x86_64-linux-gnu/libtcmalloc.so.4", "elfType" : 3, "buildId" : "572D5C17FBDA6B678DF653411F676819DE18CA6B" }, { "b" : "7F2D02758000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "390E9CC4C215314B6D8ADE6D6E28F8518418039C" }, { "b" : "7F2D024CB000", "path" : "/usr/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "439A262CC0127BA401707DEC7A28884D617550E0" }, { "b" : "7F2D02000000", "path" : "/usr/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, "buildId" : "CB6876717C83B0CC01C3C919B9B6E86D8554F546" }, { "b" : "7F2D01DF8000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "9826FBDF57ED7D6965131074CB3C08B1009C1CD8" }, { "b" : "7F2D01BF4000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "25AD56E902E23B490A9CCDB08A9744D89CB95BCC" }, { "b" : "7F2D0186B000", "path" : "/usr/lib/x86_64-linux-gnu/libstdc++.so.6", "elfType" : 3, "buildId" : "570BF32E8698FCE3BFACC4A8B010827F842D1DD6" }, { "b" : "7F2D014CD000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "A33761AB8FB485311B3C85BF4253099D7CABE653" }, { "b" : "7F2D012B5000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "41BDC55C07D5E5B1D8AB38E2C19B1F535855E084" }, { "b" : "7F2D01096000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "28C6AADE70B2D40D1F0F3D0A1A0CAD1AB816448F" }, { "b" : "7F2D00CA5000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "B417C0BA7CC5CF06D1D1BED6652CEDB9253C60D0" }, { "b" : "7F2D00A33000", "path" : "/lib/x86_64-linux-gnu/libpcre.so.3", "elfType" : 3, "buildId" : "5B3416BB188EAF3FA4B7530AAE6C1890B38B0372" }, { "b" : "7F2D00818000", "path" : 
"/usr/lib/x86_64-linux-gnu/libunwind.so.8", "elfType" : 3, "buildId" : "7995F03B59E1D6EB7968EEA5B8534910D4E8E8D6" }, { "b" : "7F2D03D7C000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "64DF1B961228382FE18684249ED800AB1DCEAAD4" }, { "b" : "7F2D005F2000", "path" : "/lib/x86_64-linux-gnu/liblzma.so.5", "elfType" : 3, "buildId" : "8FBCCA354D964860B9E6EB3736E9B7BC6177B417" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x3A) [0x558846e2a4ca]
 mongod(+0x203A78E) [0x558846e2978e]
 mongod(+0x203ADDC) [0x558846e29ddc]
 mongod(+0x11B05E6) [0x558845f9f5e6]
 libpthread.so.0(+0x12890) [0x7f2d010a8890]
 mongod(_ZN2js14TenuringTracer8traverseI8JSObjectEEvPPT_+0x0) [0x5588464749b0]
 mongod(_ZN2js8frontend9ObjectBox5traceEP8JSTracer+0x36) [0x55884645ffd6]
 mongod(_ZN2JS12AutoGCRooter8traceAllEP8JSTracer+0xDB) [0x55884647a07b]
 mongod(_ZN2js2gc9GCRuntime11markRuntimeEP8JSTracerNS1_18TraceOrMarkRuntimeE+0xAD) [0x55884647a74d]
 mongod(_ZN2js7Nursery7collectEP9JSRuntimeN2JS8gcreason6ReasonEPN7mozilla6VectorIPNS_11ObjectGroupELm0ENS_17SystemAllocPolicyEEE+0x290) [0x55884647b390]
 mongod(_ZN2js2gc9GCRuntime7gcCycleEbRNS_11SliceBudgetEN2JS8gcreason6ReasonE+0x90) [0x5588462758e0]
 mongod(_ZN2js2gc9GCRuntime7collectEbNS_11SliceBudgetEN2JS8gcreason6ReasonE+0x1D2) [0x558846275e32]
 mongod(_ZN2js2gc9GCRuntime7startGCE18JSGCInvocationKindN2JS8gcreason6ReasonEl+0x4B) [0x5588462761db]
 mongod(_ZN2js2gc9GCRuntime13gcIfRequestedEP9JSContext+0x8A) [0x55884627641a]
 mongod(_ZN2js2gc9GCRuntime23gcIfNeededPerAllocationEP9JSContext+0x145) [0x5588464612e5]
 mongod(_ZN2js8AllocateI8JSScriptLNS_7AllowGCE1EEEPT_PNS_16ExclusiveContextE+0x28) [0x558846468838]
 mongod(_ZN8JSScript6CreateEPN2js16ExclusiveContextEN2JS6HandleIP8JSObjectEEbRKNS3_22ReadOnlyCompileOptionsES7_jj+0x38) [0x5588462c9468]
 mongod(_ZN16BytecodeCompiler12createScriptEN2JS6HandleIP8JSObjectEEb+0x2C) [0x55884643dc2c]
 mongod(_ZN16BytecodeCompiler13compileScriptEN2JS6HandleIP8JSObjectEENS1_IP8JSScriptEE+0x9C) [0x55884644cffc]
 mongod(_ZN2js8frontend13CompileScriptEPNS_16ExclusiveContextEPNS_9LifoAllocEN2JS6HandleIP8JSObjectEENS6_IPNS_11ScopeObjectEEENS6_IP8JSScriptEERKNS5_22ReadOnlyCompileOptionsERNS5_18SourceBufferHolderEP8JSStringPNS_21SourceCompressionTaskEPPNS_18ScriptSourceObjectE+0x6C) [0x55884644d63c]
 mongod(+0x1439C3D) [0x558846228c3d]
 mongod(+0x1439D45) [0x558846228d45]
 mongod(+0x1439E6A) [0x558846228e6a]
 mongod(_ZN5mongo5mozjs14MozJSImplScope4execENS_10StringDataERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEEbbbi+0x116) [0x558845ee30a6]
 mongod(+0x112216F) [0x558845f1116f]
 mongod(_ZN5mongo5mozjs15MozJSProxyScope10implThreadEPv+0x14F) [0x558845f1157f]
 mongod(_ZN4nspr6Thread13ThreadRoutineEPv+0x1C) [0x558845f0d1fc]
 libstdc++.so.6(+0xBD66F) [0x7f2d0192866f]
 libpthread.so.0(+0x76DB) [0x7f2d0109d6db]
 libc.so.6(clone+0x3F) [0x7f2d00dc688f]
-----  END BACKTRACE  -----
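For anyone reading the JSON backtrace above: a minimal, standard-library-only sketch that pulls the frames out of the blob mongod prints between BEGIN and END BACKTRACE and keeps the ones whose mangled names start with `_ZN2js` or `_ZN2JS`, i.e. the `js`/`JS` C++ namespaces of SpiderMonkey, the JavaScript engine mongod embeds (note the `mongo::mozjs` frames and the `[js]` thread in the fault lines). The function and constant names are hypothetical. In this crash those frames read as a nursery garbage collection (`js::Nursery::collect`, `js::gc::GCRuntime`) triggered by an allocation inside `JSScript::Create` while a script was being compiled.

```python
import json

# Sketch: read the JSON that mongod prints between BEGIN/END BACKTRACE.
# Each frame has a load base ("b"), an offset ("o") and, when the symbol
# could be resolved, a mangled C++ name ("s").

# Mangled-name prefixes for the js:: and JS:: namespaces (SpiderMonkey).
SPIDERMONKEY_PREFIXES = ("_ZN2js", "_ZN2JS")


def parse_backtrace(blob):
    """Return a list of {"base", "offset", "symbol"} dicts, one per frame."""
    data = json.loads(blob)
    return [
        {"base": f["b"], "offset": f["o"], "symbol": f.get("s", "")}
        for f in data["backtrace"]
    ]


def spidermonkey_frames(frames):
    """Keep only the mangled names that live in SpiderMonkey's namespaces."""
    return [f["symbol"] for f in frames if f["symbol"].startswith(SPIDERMONKEY_PREFIXES)]


if __name__ == "__main__":
    # Three frames copied from the crash above, trimmed for brevity.
    sample = (
        '{"backtrace":['
        '{"b":"558844DEF000","o":"203B4CA","s":"_ZN5mongo15printStackTraceERSo"},'
        '{"b":"7F2D01096000","o":"12890"},'
        '{"b":"558844DEF000","o":"16859B0",'
        '"s":"_ZN2js14TenuringTracer8traverseI8JSObjectEEvPPT_"}]}'
    )
    for name in spidermonkey_frames(parse_backtrace(sample)):
        print(name)
```

The mangled names can be made readable with binutils' `c++filt`, which accepts them as arguments or on stdin.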

toconnell commented 4 years ago

This issue was opened by a user: https://github.com/toconnell/kdm-manager/issues/539

toconnell commented 4 years ago

This failure was obscured by the spam from the 404s, which the next release corrects. We've got to get that release out and into prod ASAFP.

toconnell commented 4 years ago

Added automatic emailing on database failure:

@@ -122,6 +130,11 @@ def general_exception(exception):

     API.logger.warn('Flask caught an unhandled exception!')

+    # in the criminal justice system, database failure is especially heinous
+    if isinstance(exception, pymongo.errors.ServerSelectionTimeoutError):
+        API.logger.error('The database is unavailable!')
+        utils.email_exception(exception)
+
     if socket.getfqdn() != API.settings.get('server', 'prod_fqdn'):
         err = "'%s' is not production! Raising exception..." % socket.getfqdn()
         API.logger.warn(err)
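`utils.email_exception()` itself isn't shown in this thread. A minimal sketch of what such a helper could look like with the standard library's `smtplib` and `email.message.EmailMessage`; `build_exception_email`, the SMTP host, and both addresses are hypothetical placeholders, not the project's actual code or configuration.

```python
import smtplib
import traceback
from email.message import EmailMessage

# Hypothetical settings: the real project would read these from its config.
SMTP_HOST = "localhost"
SENDER = "api@example.com"
RECIPIENT = "admin@example.com"


def build_exception_email(exception):
    """Wrap an exception and its traceback in a plain-text email."""
    msg = EmailMessage()
    msg["Subject"] = f"[kdm-manager-api] {type(exception).__name__}"
    msg["From"] = SENDER
    msg["To"] = RECIPIENT
    # format_exception includes the traceback when the exception was raised.
    body = "".join(
        traceback.format_exception(type(exception), exception, exception.__traceback__)
    )
    msg.set_content(body)
    return msg


def email_exception(exception):
    """Send the email through an SMTP relay (assumed to exist at SMTP_HOST)."""
    with smtplib.SMTP(SMTP_HOST) as smtp:
        smtp.send_message(build_exception_email(exception))
```

Keeping message construction separate from sending means the formatting can be checked without a mail server.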

toconnell commented 4 years ago

The query that killed the db was this:

find { find: "survivors", filter: { name: { $nin: [ "-", "test", "Test", "TEST", "unknown", "Unknown", "UNKNOWN", "Anonymous", "anonymous" ] }, dead: { $exists: false } }, sort: { created_on: -1 }, limit: 1

...it looks like the "most recent survivor" query, which has been running every few minutes for years.

Gotta confirm that though.
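All three commands in the log scan the whole collection (`planSummary: COLLSCAN`), and the `find` examined 38,444 survivors and sorted them in memory (`hasSortStage:1`) just to return one. As a sketch, here are the logged `find` and `group` expressed as pymongo-style arguments. Two assumptions, neither from this thread: a descending index on `created_on`, which should let the planner take the sort from the index, and swapping the `group` command (deprecated since MongoDB 3.4, removed in 4.2) and its JavaScript `$reduce` for an aggregation `$group` stage. All constant and function names are hypothetical; `survivors` and `collection` are assumed to be pymongo `Collection` objects.

```python
# Sketch only: the logged survivor queries as pymongo-style arguments.
# Names here are hypothetical; `survivors`/`collection` are assumed to be
# pymongo Collection objects, so no pymongo import is needed to inspect them.

EXCLUDED_NAMES = (
    "-", "test", "Test", "TEST", "unknown", "Unknown", "UNKNOWN",
    "Anonymous", "anonymous",
)

# find: newest survivor that is not dead and has a "real" name.
MOST_RECENT_FILTER = {
    "name": {"$nin": list(EXCLUDED_NAMES)},
    "dead": {"$exists": False},
}
MOST_RECENT_SORT = [("created_on", -1)]

# Assumed index: descending created_on, so the sort can come from the index
# instead of an in-memory sort stage over every document.
CREATED_ON_INDEX = [("created_on", -1)]


def most_recent_survivor(survivors):
    """Same as find(...).sort({created_on: -1}).limit(1); None if no match."""
    return survivors.find_one(MOST_RECENT_FILTER, sort=MOST_RECENT_SORT)


def name_count_pipeline(excluded=EXCLUDED_NAMES):
    """Aggregation replacement for the group command's per-name count."""
    return [
        {"$match": {"name": {"$nin": list(excluded), "$exists": True}}},
        # {"$sum": 1} does what the $reduce function's p.count++ did.
        {"$group": {"_id": "$name", "count": {"$sum": 1}}},
    ]


def name_counts(collection):
    """Map each name to the number of documents that use it."""
    return {
        row["_id"]: row["count"]
        for row in collection.aggregate(name_count_pipeline())
    }
```

With a live connection, the assumed index would be created once with `survivors.create_index(CREATED_ON_INDEX)`.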

toconnell commented 3 years ago

Haven't had one of these in almost a year. Closing it out.