fgalan closed this issue 3 years ago
New crash and core generation on the CI environment on March 10th at 0:10. Backtrace:
(gdb) bt
#0 0x00007f6ec9653495 in raise () from /lib64/libc.so.6
#1 0x00007f6ec9654c75 in abort () from /lib64/libc.so.6
#2 0x00007f6ec964c60e in __assert_fail_base () from /lib64/libc.so.6
#3 0x00007f6ec964c6d0 in __assert_fail () from /lib64/libc.so.6
#4 0x00000000006d7511 in boost::mutex::unlock() ()
#5 0x00000000006d4f31 in mongo::ReplicaSetMonitor::get(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, bool) ()
#6 0x00000000006b3e4f in mongo::DBClientReplicaSet::_getMonitor() const ()
#7 0x00000000006b5637 in mongo::DBClientReplicaSet::checkMaster() ()
#8 0x00000000006bb985 in mongo::DBClientReplicaSet::query(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::Query, int, int, mongo::BSONObj const*, int, int) ()
#9 0x00000000005fcf74 in collectionQuery(mongo::DBClientBase*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, mongo::BSONObj const&, std::auto_ptr<mongo::DBClientCursor>*, std::basic_string<char, std::char_traits<char>, std::allocator<char> >*) ()
#10 0x0000000000602afe in mongoSubCacheRefresh(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#11 0x00000000005a7633 in subCacheRefresh() ()
#12 0x00000000005a7c31 in subCacheSync() ()
#13 0x00000000005a88bc in ?? ()
#14 0x00007f6ecae40aa1 in start_thread () from /lib64/libpthread.so.0
#15 0x00007f6ec9709bdd in clone () from /lib64/libc.so.6
Steps to avoid crashing the machine when reviewing ContextBroker cores:
cp /var/cb_cores/CB_core_20190329_235935.tar.bz2 /tmp
cd /tmp
bunzip2 CB_core_20190329_235935.tar.bz2
tar xvf CB_core_20190329_235935.tar
gdb /usr/bin/contextBroker /tmp/core.4667
(gdb) bt
(gdb) quit
rm -f core.4667 contextBroker.log CB_core_20190329_235935.tar
Remember to delete the already annotated core from the /var/cb_cores directory.
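The review steps above can be sketched in Python as follows. This is a rough sketch, not a tool shipped with Orion: the function names are hypothetical, the scratch directory comes from `tempfile` (usually under /tmp) instead of /tmp itself, and `gdb --batch -ex bt` is used in place of the interactive `bt` and `quit` commands.

```python
import os
import shutil
import tarfile
import tempfile


def unpack_core_archive(archive_path):
    """Copy a .tar.bz2 core archive to a scratch directory and unpack it there.

    Returns (work_dir, member_names). The original archive is left untouched.
    """
    work_dir = tempfile.mkdtemp(prefix="cb_core_")
    local_copy = shutil.copy(archive_path, work_dir)
    with tarfile.open(local_copy, "r:bz2") as archive:
        member_names = archive.getnames()
        archive.extractall(work_dir)
    os.remove(local_copy)  # only the unpacked files are needed from here on
    return work_dir, member_names


def gdb_backtrace_command(binary, core_path):
    """Build a gdb command line that prints the backtrace and exits."""
    return ["gdb", "--batch", "-ex", "bt", binary, core_path]


def cleanup(work_dir):
    """Remove the scratch directory and everything unpacked into it."""
    shutil.rmtree(work_dir, ignore_errors=True)
```

A caller would pass the result of `gdb_backtrace_command("/usr/bin/contextBroker", core_path)` to `subprocess.run`, save the output, and then call `cleanup(work_dir)`. Deleting the annotated archive from /var/cb_cores remains a separate, manual step.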
New crash and core generation on the CI environment on March 30th at 00:01. Backtrace:
(gdb) bt
#0 0x00007f97d8520a9b in memcpy () from /lib64/libc.so.6
#1 0x00007f97d8d621e6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib64/libstdc++.so.6
#2 0x00007f97d8d6228c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#3 0x00000000005adf82 in getSubscribeContextCollectionName(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#4 0x0000000000604bdf in mongoSubCacheRefresh(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#5 0x00000000005a80f3 in subCacheRefresh() ()
#6 0x00000000005a86f1 in subCacheSync() ()
#7 0x00000000005a937c in ?? ()
#8 0x00007f97d9cb6aa1 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f97d857fbdd in clone () from /lib64/libc.so.6
New crash and core generation on the CI environment on Apr 12 at 23:57. Backtrace:
(gdb) bt
#0 0x00007f97965a6b13 in memcpy () from /lib64/libc.so.6
#1 0x00007f9796de81e6 in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::_Rep::_M_clone(std::allocator<char> const&, unsigned long) () from /usr/lib64/libstdc++.so.6
#2 0x00007f9796de828c in std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) () from /usr/lib64/libstdc++.so.6
#3 0x00000000005adf82 in getSubscribeContextCollectionName(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#4 0x0000000000604bdf in mongoSubCacheRefresh(std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
#5 0x00000000005a80f3 in subCacheRefresh() ()
#6 0x00000000005a86f1 in subCacheSync() ()
#7 0x00000000005a937c in ?? ()
#8 0x00007f9797d3caa1 in start_thread () from /lib64/libpthread.so.0
#9 0x00007f9796605c4d in clone () from /lib64/libc.so.6
On Integration environment at 2019-07-18 (release 2.2.0):
(gdb) bt
#0 0x00007fa1b41e30a8 in free_dfa_content () from /lib64/libc.so.6
#1 0x00007fa1b41e3141 in regfree () from /lib64/libc.so.6
#2 0x00000000005a51a9 in EntityInfo::release() ()
#3 0x00000000005a5319 in subCacheItemDestroy(CachedSubscription*) ()
#4 0x00000000005a73ae in subCacheDestroy() ()
#5 0x00000000005a75a7 in subCacheRefresh() ()
#6 0x00000000005a7c31 in subCacheSync() ()
#7 0x00000000005a88bc in ?? ()
#8 0x00007fa1b5947aa1 in start_thread () from /lib64/libpthread.so.0
#9 0x00007fa1b4210c4d in clone () from /lib64/libc.so.6
On Integrators environment at 2019-08-02 14:05:10 (release 2.2.0):
(gdb) bt
#0 0x00007f66100f4893 in pthread_setname_np () from /lib64/libpthread.so.0
#1 0x0000000000671a2f in ?? ()
#2 0x0000000000672072 in ?? ()
#3 0x00000000006725c6 in ?? ()
#4 0x0000000000673579 in ?? ()
#5 0x0000000000673e6d in ?? ()
#6 0x00007f66100ebaa1 in start_thread () from /lib64/libpthread.so.0
#7 0x00007f660e9b4c4d in clone () from /lib64/libc.so.6
It is not clear that the last one is related to the cache logic, as the backtrace shows nothing related to it, but it is fine to mention it in this issue.
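One way to make that check less manual is a small script that pulls the function names out of a pasted `bt` listing and looks for cache-related frames. The sketch below is only a heuristic: the helper names are hypothetical, and treating any function whose name contains "subCache" as cache logic is an assumption based on the frames seen in this issue (`subCacheSync`, `subCacheRefresh`, `mongoSubCacheRefresh`, `subCacheItemDestroy`). Frames resolved as `??` carry no information either way.

```python
import re

# Matches gdb frames such as "#5 0x00000000005a80f3 in subCacheRefresh() ()".
# The captured name is everything between "in " and the first "(".
FRAME_RE = re.compile(r"^#\d+\s+(?:0x[0-9a-fA-F]+\s+in\s+)?([^(]+?)\s*\(")


def frame_functions(backtrace):
    """Return the function name of every frame line, in stack order."""
    functions = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            functions.append(match.group(1))
    return functions


def touches_sub_cache(backtrace):
    """Heuristic: True if any frame name mentions the subscription cache."""
    return any("subcache" in name.lower() for name in frame_functions(backtrace))
```

Applied to the 2019-08-02 backtrace, `touches_sub_cache` returns False, which matches the doubt expressed above; it returns True for the earlier backtraces that go through `subCacheRefresh()`.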
Several crashes on the Production environment on 2019-08-06, from 15:30:00 to 15:32:00 (release 2.0.0):
(gdb) bt
#0 0x00007f8f0b7654f5 in raise () from /lib64/libc.so.6
#1 0x00007f8f0b766cd5 in abort () from /lib64/libc.so.6
#2 0x00007f8f0c01fa8d in __gnu_cxx::__verbose_terminate_handler() () from /usr/lib64/libstdc++.so.6
#3 0x00007f8f0c01dbe6 in ?? () from /usr/lib64/libstdc++.so.6
#4 0x00007f8f0c01dc13 in std::terminate() () from /usr/lib64/libstdc++.so.6
#5 0x00007f8f0c01dd32 in __cxa_throw () from /usr/lib64/libstdc++.so.6
#6 0x00000000007018f2 in mongo::uasserted(int, char const*) ()
#7 0x0000000000525d2a in mongo::BSONObjBuilder::append(mongo::StringData const&, mongo::BSONObj) ()
#8 0x0000000000628018 in ?? ()
#9 0x000000000062af2c in ?? ()
#10 0x000000000062d521 in ?? ()
#11 0x000000000062e40e in ?? ()
#12 0x000000000062f482 in ?? ()
#13 0x0000000000630b2d in ?? ()
#14 0x0000000000635f7c in processContextElement(Entity*, UpdateContextResponse*, ActionType, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ApiVersion, Ngsiv2Flavour) ()
#15 0x00000000005d6980 in mongoUpdateContext(UpdateContextRequest*, UpdateContextResponse*, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > const&, std::map<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::less<std::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::basic_string<char, std::char_traits<char>, std::allocator<char> > > > >&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, std::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, ApiVersion, Ngsiv2Flavour) ()
#16 0x00000000004b1bba in postUpdateContext(ConnectionInfo*, int, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, ParseData*, Ngsiv2Flavour) ()
#17 0x00000000004be650 in postIndividualContextEntity(ConnectionInfo*, int, std::vector<std::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::basic_string<char, std::char_traits<char>, std::allocator<char> > > >&, ParseData*) ()
#18 0x0000000000514575 in ?? ()
#19 0x00000000005153b4 in orion::requestServe(ConnectionInfo*) ()
#20 0x0000000000510ae3 in ?? ()
#21 0x000000000067a949 in ?? ()
#22 0x000000000067b6e0 in ?? ()
#23 0x000000000067fea8 in ?? ()
#24 0x00007f8f0cf52aa1 in start_thread () from /lib64/libpthread.so.0
#25 0x00007f8f0b81bc4d in clone () from /lib64/libc.so.6
Crash in the Preproduction environment, only on machine pre2-iot-core-fe-01, on Nov 27 around 13:00 (release 2.2.0)
(gdb) bt
The core file also stored a copy of contextBroker.log. The last lines of this stored file are:
time=2019-11-27T11:59:55.045Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[449]:collectionUpdate | msg=Database Operation Successful (update: <{ _id: ObjectId('5af2ff14b86cf1ab1c0f0c00'), $or: [ { lastSuccess: { $lt: 1526316152 } }, { lastSuccess: { $exists: false } } ] }, { $set: { lastSuccess: 1526316152, lastSuccessCode: -1 } }>)
time=2019-11-27T12:00:02.204Z | lvl=INFO | corr=7157a50a-110d-11ea-bf0d-fa163e0e6feb | trans=1570793672-629-00000884464 | from=81.45.17.97 | srv=demo | subsrv=/ | comp=Orion | op=logMsg.h[1844]:lmTransactionStart | msg=Starting transaction from 127.0.0.1:35254/v1/contextEntities/
time=2019-11-27T12:00:02.204Z | lvl=INFO | corr=7157a50a-110d-11ea-bf0d-fa163e0e6feb | trans=1570793672-629-00000884464 | from=81.45.17.97 | srv=demo | subsrv=/ | comp=Orion | op=rest.cpp[885]:servicePathSplit | msg=Service Path 0: '/'
time=2019-11-27T12:00:02.206Z | lvl=INFO | corr=7157a50a-110d-11ea-bf0d-fa163e0e6feb | trans=1570793672-629-00000884464 | from=81.45.17.97 | srv=demo | subsrv=/ | comp=Orion | op=connectionOperations.cpp[177]:collectionRangedQuery | msg=Database Operation Successful (query: { query: { $or: [ { _id.id: /./ } ], _id.servicePath: { $in: [ null, /^/$/ ] } }, orderby: { creDate: 1 } })
time=2019-11-27T12:00:02.208Z | lvl=INFO | corr=7157a50a-110d-11ea-bf0d-fa163e0e6feb | trans=1570793672-629-00000884464 | from=81.45.17.97 | srv=demo | subsrv=/ | comp=Orion | op=connectionOperations.cpp[177]:collectionRangedQuery | msg=Database Operation Successful (query: { query: { $or: [ { contextRegistration.entities.id: /./ }, { contextRegistration.entities: { $in: [] } }, { contextRegistration.entities.id: { $in: [] } } ], expiration: { $gt: 1574856002 }, servicePath: { $in: [ null, /^/$/ ] } }, orderby: { _id: 1 } })
time=2019-11-27T12:00:02.209Z | lvl=INFO | corr=7157a50a-110d-11ea-bf0d-fa163e0e6feb | trans=1570793672-629-00000884464 | from=81.45.17.97 | srv=demo | subsrv=/ | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2019-11-27T12:00:55.085Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[701]:runCollectionCommand | msg=Database Operation Successful (command: { listDatabases: 1 })
time=2019-11-27T12:00:55.085Z | lvl=ERROR | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=safeMongo.cpp[315]:getField | msg=Runtime Error (field 'databases' is missing in BSONObj <{ ok: 0.0, errmsg: "interrupted at shutdown", code: 11600, codeName: "InterruptedAtShutdown" }> from caller getOrionDatabases:478)
time=2019-11-27T12:00:59.814Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=contextBroker.cpp[904]:main | msg=Orion Context Broker is running
time=2019-11-27T12:00:59.828Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mongoConnectionPool.cpp[217]:mongoConnect | msg=Successful connection to database
time=2019-11-27T12:00:59.828Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[760]:setWriteConcern | msg=Database Operation Successful (setWriteConcern: 0)
time=2019-11-27T12:00:59.828Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[807]:getWriteConcern | msg=Database Operation Successful (getWriteConcern)
The last one should be solved by PR https://github.com/telefonicaid/fiware-orion/pull/3580.
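To pick the non-INFO entries out of a contextBroker log excerpt like the one above, something along these lines may help. It is a minimal sketch with hypothetical function names, and it assumes what the quoted lines suggest rather than anything documented: each entry starts with `time=` followed by an ISO timestamp, fields are separated by ` | `, and `msg=` is the last field.

```python
import re

# Zero-width match just before each "time=YYYY-MM-DDT", i.e. the start of an entry.
ENTRY_START = re.compile(r"(?=time=\d{4}-\d{2}-\d{2}T)")


def split_entries(text):
    """Split pasted log text into entries, even if several share one line."""
    flat = " ".join(text.split())  # undo line breaks that fall inside an entry
    return [chunk.strip() for chunk in ENTRY_START.split(flat) if chunk.strip()]


def parse_entry(entry):
    """Turn one 'key=value | key=value | msg=...' entry into a dict."""
    head, sep, msg = entry.partition(" | msg=")
    fields = {}
    for part in head.split(" | "):
        key, _, value = part.partition("=")
        fields[key] = value
    if sep:
        fields["msg"] = msg  # kept whole, so separators inside msg survive
    return fields


def non_info_entries(text):
    """Return the parsed entries whose level is anything other than INFO."""
    parsed = [parse_entry(entry) for entry in split_entries(text)]
    return [fields for fields in parsed if fields.get("lvl") != "INFO"]
```

On the excerpt above this would leave only the `lvl=ERROR` entry from `safeMongo.cpp[315]:getField`, logged about five seconds before the "Orion Context Broker is running" line.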
Crash in the Integration environment, machine ext2-iot-core-fe-01, on Jan 14 around 13:28 CET (release 2.3.0)
Context broker last logs:
time=2020-01-14T12:19:26.306Z | lvl=INFO | corr=e961b79a-36c7-11ea-8b94-fa163ec74401 | trans=1577705464-717-00015861269 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:26.306Z | lvl=INFO | corr=f95f21fa-36c7-11ea-9e75-fa163ec74401 | trans=1577705464-717-00015861996 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:26.315Z | lvl=INFO | corr=fb84cb60-36c7-11ea-8a99-fa163ec74401 | trans=1577705464-717-00015862133 | from=10.0.0.35 | srv=sc_vlci | subsrv=/medioambiente_emt | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:26.315Z | lvl=INFO | corr=f95f47a2-36c7-11ea-b0e7-fa163ec74401 | trans=1577705464-717-00015861997 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:26.338Z | lvl=INFO | corr=e9669fa8-36c7-11ea-8798-fa163ec74401 | trans=1577705464-717-00015861275 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:26.350Z | lvl=INFO | corr=e9613c70-36c7-11ea-bb4b-fa163ec74401 | trans=1577705464-717-00015861268 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:26.372Z | lvl=INFO | corr=fb8a97ca-36c7-11ea-ad74-fa163ec74401 | trans=1577705464-717-00015862140 | from=10.0.0.36 | srv=sc_vlci | subsrv=/medioambiente_emt | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
Core analysis:
Core was generated by `/usr/bin/contextBroker -port 1026 -logDir /var/log/contextBroker -logLevel DEBU'.
Program terminated with signal 6, Aborted.
Missing separate debuginfos, use: debuginfo-install contextBroker-2.3.0_20191212172652-1.x86_64
(gdb) bt
Crash in the Integration environment, machine ext2-iot-core-fe-02, on Jan 14 around 13:21 CET (release 2.3.0):
Context broker last logs:
time=2020-01-14T12:19:48.469Z | lvl=INFO | corr=fe3e5984-36c7-11ea-87d8-fa163e0622ea | trans=1577705434-253-00015849687 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:48.474Z | lvl=INFO | corr=c39156d8-36c7-11ea-8055-fa163e0622ea | trans=1577705434-253-00015847232 | from=10.0.0.35 | srv=sc_vlci | subsrv=/medioambiente_emt | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:48.474Z | lvl=INFO | corr=15800b10-36c8-11ea-b864-fa163e0622ea | trans=1577705434-253-00015850495 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:48.476Z | lvl=INFO | corr=4e26362a-36c7-11ea-99a5-fa163e0622ea | trans=1577705434-253-00015843816 | from=10.0.0.35 | srv=sc_vlci | subsrv=/medioambiente_emt | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:48.477Z | lvl=INFO | corr=157f6ec6-36c8-11ea-9020-fa163e0622ea | trans=1577705434-253-00015850492 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1874]:lmTransactionEnd | msg=Transaction ended
time=2020-01-14T12:19:48.484Z | lvl=INFO | corr=283f9ff4-36c8-11ea-a4bd-fa163e0622ea | trans=1577705434-253-00015851387 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=logMsg.h[1844]:lmTransactionStart | msg=Starting transaction from 10.0.0.11:44074/v1/updateContext
time=2020-01-14T12:19:48.485Z | lvl=INFO | corr=283f9ff4-36c8-11ea-a4bd-fa163e0622ea | trans=1577705434-253-00015851387 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=rest.cpp[883]:servicePathSplit | msg=Service Path 0: '/distritotelefonica'
time=2020-01-14T12:19:48.485Z | lvl=WARN | corr=283f9ff4-36c8-11ea-a4bd-fa163e0622ea | trans=1577705434-253-00015851387 | from=10.0.0.24 | srv=urbo | subsrv=/distritotelefonica | comp=Orion | op=AlarmManager.cpp[405]:badInput | msg=Raising alarm BadInput 10.0.0.11: service '/v1/updateContext' not found
Core analysis:
Program terminated with signal 11, Segmentation fault.
Missing separate debuginfos, use: debuginfo-install contextBroker-2.3.0_20191212172652-1.x86_64
(gdb) bt
Last report on this issue was more than a year ago (14 Jan 2020). Thus, we understand this bug (if any) no longer exists in recent Orion versions.
If we are wrong, a fresh issue with a new core backtrace report will be opened.
Version:
On CI environment crash and core generation at March 8th, 0:20. Backtrace:
Possibly related code references: