simlaudato / asterixdb

Automatically exported from code.google.com/p/asterixdb

IllegalStateException in startRecovery #933

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
$ cat /scratch/yingyib/data/asterix_log8/test4_sensorium-33.log 
Aug 11, 2015 5:48:50 PM edu.uci.ics.hyracks.control.nc.NCDriver main
SEVERE: Setting uncaught exception handler edu.uci.ics.hyracks.api.lifecycle.LifeCycleComponentManager@17da89a0
Aug 11, 2015 5:48:50 PM edu.uci.ics.hyracks.control.nc.NodeControllerService start
INFO: Starting NodeControllerService
Aug 11, 2015 5:48:50 PM edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint start
INFO: Starting Asterix node controller: test4_sensorium-33
java.lang.IllegalStateException
    at edu.uci.ics.asterix.transaction.management.service.logging.LogReader.next(LogReader.java:78)
    at edu.uci.ics.asterix.transaction.management.service.recovery.RecoveryManager.startRecovery(RecoveryManager.java:222)
    at edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint.start(NCApplicationEntryPoint.java:118)
    at edu.uci.ics.hyracks.control.nc.NodeControllerService.startApplication(NodeControllerService.java:318)
    at edu.uci.ics.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:255)
    at edu.uci.ics.hyracks.control.nc.NCDriver.main(NCDriver.java:44)

With this exception in one NC, my instance can never become usable.

Original issue reported on code.google.com by buyingyi@gmail.com on 12 Aug 2015 at 12:59

GoogleCodeExporter commented 8 years ago

Original comment by buyingyi@gmail.com on 12 Aug 2015 at 12:59

GoogleCodeExporter commented 8 years ago
The NC JVM exists on the machine after running into this exception.

Original comment by buyingyi@gmail.com on 12 Aug 2015 at 1:02

GoogleCodeExporter commented 8 years ago

Original comment by buyingyi@gmail.com on 12 Aug 2015 at 1:06

GoogleCodeExporter commented 8 years ago
Got a similar issue -- my entire cluster is unusable now because one NC cannot start properly:

Aug 12, 2015 7:46:33 PM edu.uci.ics.hyracks.control.nc.NCDriver main
SEVERE: Setting uncaught exception handler edu.uci.ics.hyracks.api.lifecycle.LifeCycleComponentManager@c81739c
Aug 12, 2015 7:46:33 PM edu.uci.ics.hyracks.control.nc.NodeControllerService start
INFO: Starting NodeControllerService
Aug 12, 2015 7:46:33 PM edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint start
INFO: Starting Asterix node controller: test_sensorium-27
java.lang.IllegalStateException: Failed to redo
    at edu.uci.ics.asterix.transaction.management.service.recovery.RecoveryManager.redo(RecoveryManager.java:738)
    at edu.uci.ics.asterix.transaction.management.service.recovery.RecoveryManager.startRecovery(RecoveryManager.java:318)
    at edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint.start(NCApplicationEntryPoint.java:118)
    at edu.uci.ics.hyracks.control.nc.NodeControllerService.startApplication(NodeControllerService.java:318)
    at edu.uci.ics.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:255)
    at edu.uci.ics.hyracks.control.nc.NCDriver.main(NCDriver.java:44)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 2
    at edu.uci.ics.hyracks.storage.am.common.tuples.TypeAwareTupleWriter.getFieldSlotsBytes(TypeAwareTupleWriter.java:122)
    at edu.uci.ics.hyracks.storage.am.common.tuples.TypeAwareTupleWriter.bytesRequired(TypeAwareTupleWriter.java:36)
    at edu.uci.ics.hyracks.storage.am.lsm.btree.tuples.LSMBTreeTupleWriter.bytesRequired(LSMBTreeTupleWriter.java:39)
    at edu.uci.ics.hyracks.storage.am.btree.frames.BTreeNSMInteriorFrame.getBytesRequriedToWriteTuple(BTreeNSMInteriorFrame.java:59)
    at edu.uci.ics.hyracks.storage.am.btree.impls.BTree.upsert(BTree.java:327)
    at edu.uci.ics.hyracks.storage.am.btree.impls.BTree.access$500(BTree.java:70)
    at edu.uci.ics.hyracks.storage.am.btree.impls.BTree$BTreeAccessor.upsertIfConditionElseInsert(BTree.java:897)
    at edu.uci.ics.hyracks.storage.am.btree.impls.BTree$BTreeAccessor.upsert(BTree.java:890)
    at edu.uci.ics.hyracks.storage.am.lsm.btree.impls.LSMBTree.modify(LSMBTree.java:365)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMHarness.modify(LSMHarness.java:331)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMHarness.forceModify(LSMHarness.java:314)
    at edu.uci.ics.hyracks.storage.am.lsm.common.impls.LSMTreeIndexAccessor.forceDelete(LSMTreeIndexAccessor.java:159)
    at edu.uci.ics.asterix.transaction.management.service.recovery.RecoveryManager.redo(RecoveryManager.java:733)
    ... 5 more

Original comment by buyingyi@gmail.com on 13 Aug 2015 at 2:56

GoogleCodeExporter commented 8 years ago
@Yingyi,

Both of these issues are related to corrupted log records.

The problem was reported in issue 902:
https://code.google.com/p/asterixdb/issues/detail?id=902&q=logs&colspec=ID%20Type%20Status%20Priority%20Milestone%20Owner%20Summary%20ETA%20Severity

Ian has already started a fix for this.

P.S. For the unusable cluster issue, it looks like we need data replication support to overcome the problem :-)

Original comment by hubail...@gmail.com on 13 Aug 2015 at 8:36

GoogleCodeExporter commented 8 years ago
Thanks, Murtadha!
IMO, even without replication, a bit of data loss in one dataset sounds MUCH better than losing the entire cluster...

Original comment by buyingyi@gmail.com on 13 Aug 2015 at 8:52

GoogleCodeExporter commented 8 years ago
Yes, Ian’s fix is going to do that. When a corrupted log is encountered, the 
rest of the logs will be skipped but the node will continue starting up.

Original comment by hubail...@gmail.com on 13 Aug 2015 at 8:56
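
For readers following the thread, here is a minimal sketch of the behaviour described above: replay log records until the first corrupted one, skip everything after it, and let startup continue. This is not Ian's fix or the AsterixDB code. The class name, the record layout (length, payload, CRC32) and the helper methods are all assumptions made up for the illustration; the real LogReader format is not shown in this issue.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical illustration only; not the AsterixDB RecoveryManager/LogReader.
public class RecoverySketch {

    // Assumed record layout: [int payloadLength][payload bytes][long crc32(payload)]
    static byte[] encode(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        ByteBuffer buf = ByteBuffer.allocate(4 + payload.length + 8);
        buf.putInt(payload.length).put(payload).putLong(crc.getValue());
        return buf.array();
    }

    // Returns the payloads of all records up to (not including) the first
    // truncated or corrupted record. Nothing is thrown, so the caller can
    // redo the valid prefix and carry on starting the node.
    static List<byte[]> readValidPrefix(ByteBuffer log) {
        List<byte[]> records = new ArrayList<>();
        while (log.remaining() >= 4) {
            int len = log.getInt();
            if (len < 0 || log.remaining() < (long) len + 8) {
                break; // nonsensical length or truncated record: stop here
            }
            byte[] payload = new byte[len];
            log.get(payload);
            long storedCrc = log.getLong();
            CRC32 crc = new CRC32();
            crc.update(payload);
            if (crc.getValue() != storedCrc) {
                break; // checksum mismatch: skip the rest of the log
            }
            records.add(payload);
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] a = encode("r1".getBytes(StandardCharsets.UTF_8));
        byte[] b = encode("r2".getBytes(StandardCharsets.UTF_8));
        byte[] c = encode("r3".getBytes(StandardCharsets.UTF_8));
        byte[] log = ByteBuffer.allocate(a.length + b.length + c.length)
                .put(a).put(b).put(c).array();

        // Corrupt the first payload byte of the second record.
        log[a.length + 4] ^= (byte) 0xFF;

        List<byte[]> valid = readValidPrefix(ByteBuffer.wrap(log));
        // Prints: replayed 1 of 3 records
        System.out.println("replayed " + valid.size() + " of 3 records");
    }
}
```

The trade-off is the one discussed in this thread: updates logged after the corrupted record are lost on that node, but the NC comes up instead of failing in startRecovery.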

GoogleCodeExporter commented 8 years ago
That's great!  Thanks!

Original comment by buyingyi@gmail.com on 13 Aug 2015 at 9:02