#. Observed incorrect behavior from Till
This is the complete NC log that I get:
Jul 15, 2014 11:29:21 AM edu.uci.ics.hyracks.control.nc.NCDriver main
SEVERE: Setting uncaught exception handler edu.uci.ics.hyracks.api.lifecycle.LifeCycleComponentManager@60ee8f91
Jul 15, 2014 11:29:21 AM edu.uci.ics.hyracks.control.nc.NodeControllerService start
INFO: Starting NodeControllerService
Jul 15, 2014 11:29:21 AM edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint start
INFO: Starting Asterix node controller TAKE NOTE: asterix_node1
Jul 15, 2014 11:29:21 AM edu.uci.ics.asterix.transaction.management.service.logging.LogManager initializeLogAnchor
INFO: log file Id: 1, offset: 0
Jul 15, 2014 11:29:21 AM edu.uci.ics.asterix.transaction.management.service.logging.LogManager initializeLogManager
INFO: LogManager starts logging in LSN: 2147483648
Jul 15, 2014 11:29:21 AM edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint start
INFO: System is in a state: HEALTHY
Jul 15, 2014 11:29:21 AM edu.uci.ics.asterix.transaction.management.resource.PersistentLocalResourceRepository initialize
INFO: Initializing local resource repository ...
edu.uci.ics.hyracks.api.exceptions.HyracksDataException: java.io.FileNotFoundException: /Users/tillw/code/asterix/asterixdb2/asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/working_dir/asterix_root_metadata/asterix_node1_iodevice0/.asterix_root_metadata (No such file or directory)
at edu.uci.ics.asterix.transaction.management.resource.PersistentLocalResourceRepository.readLocalResource(PersistentLocalResourceRepository.java:305)
at edu.uci.ics.asterix.transaction.management.resource.PersistentLocalResourceRepository.initialize(PersistentLocalResourceRepository.java:135)
at edu.uci.ics.asterix.hyracks.bootstrap.NCApplicationEntryPoint.start(NCApplicationEntryPoint.java:87)
at edu.uci.ics.hyracks.control.nc.NodeControllerService.startApplication(NodeControllerService.java:314)
at edu.uci.ics.hyracks.control.nc.NodeControllerService.start(NodeControllerService.java:257)
at edu.uci.ics.hyracks.control.nc.NCDriver.main(NCDriver.java:44)
Caused by: java.io.FileNotFoundException: /Users/tillw/code/asterix/asterixdb2/asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/working_dir/asterix_root_metadata/asterix_node1_iodevice0/.asterix_root_metadata (No such file or directory)
at java.io.FileInputStream.open(Native Method)
at java.io.FileInputStream.<init>(FileInputStream.java:146)
at edu.uci.ics.asterix.transaction.management.resource.PersistentLocalResourceRepository.readLocalResource(PersistentLocalResourceRepository.java:300)
... 5 more
I think that the problem arises in NCApplicationEntryPoint.start(...). There,
the recovery manager reports that the system state is not NEW_UNIVERSE, so we
initialize the localResourceRepository as if this were not a new universe.
However, the files that this initialization expects are not there. So it seems
that the system state (NEW_UNIVERSE or not) actually guarantees less than we
expect.
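To make the failure mode concrete, here is a minimal sketch in plain Java (not AsterixDB code). It opens the root metadata file the same way the stack trace shows (via FileInputStream) and turns the missing-file case into an explicit outcome instead of an exception. The class name RootMetadataGuard, the Outcome enum, and openExisting are hypothetical names introduced only for illustration.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative sketch only; none of these names exist in AsterixDB.
class RootMetadataGuard {
    /** Hypothetical outcome labels for the "not a new universe" path. */
    enum Outcome { LOADED, MISSING_ROOT_METADATA }

    /**
     * Tries to open the root metadata file as the non-NEW_UNIVERSE path would.
     * A missing file is reported as an outcome rather than propagated.
     */
    static Outcome openExisting(Path rootMetadataFile) throws IOException {
        try (FileInputStream in = new FileInputStream(rootMetadataFile.toFile())) {
            return Outcome.LOADED;
        } catch (FileNotFoundException e) {
            // This is the condition that surfaces in the log above.
            return Outcome.MISSING_ROOT_METADATA;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("nc-root-metadata");
        Path root = dir.resolve(".asterix_root_metadata");
        System.out.println(openExisting(root)); // file not created yet
        Files.createFile(root);
        System.out.println(openExisting(root)); // file now present
    }
}
```

Whether the right reaction to MISSING_ROOT_METADATA is to treat the node as a new universe again or to fail with a clearer message is a design decision this sketch does not answer.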
#. How does this situation occur?
The following explains how the situation can happen.
------------------------------
When an asterix instance starts for the first time (meaning the system state is
NEW_UNIVERSE), the following steps (pertaining to recovery, checkpointing, and
the persistent local resource repository) are executed in the
NodeControllerService.start() method.
1. RecoveryManager checks whether the system state is NEW_UNIVERSE or not.
Since it is the first bootstrapping, no checkpoint file has been created yet,
so the state is considered NEW_UNIVERSE. (The NEW_UNIVERSE state is determined
solely by whether a checkpoint file exists.)
2. Since the system state is NEW_UNIVERSE, the recovery manager creates the
first checkpoint.
(Steps 1 and 2 are executed in the NCApplicationEntryPoint.start() method.)
3. The node where the recovery manager created the first checkpoint file is
registered with the CC.
4. The persistent local resource repository is initialized. (This is where the
“.asterix_root_metadata” file is created.)
5. The metadata bootstrapping (i.e., creating the metadata dataverse, since the
system state is NEW_UNIVERSE) is executed if the node is the metadata node.
6. MetadataBootStrap.startDDLRecovery() is called. This method takes care of any
incomplete DDL operations.
(Steps 4, 5, and 6 are executed in
NCApplicationEntryPoint.notifyStartupComplete().)
It is possible for the system to crash after step 2 and before step 4 has
completed. If that happens, the checkpoint file exists but the
“.asterix_root_metadata” file does not, so the next start is not treated as
NEW_UNIVERSE and produces the log that you showed.
(Once the first bootstrapping has succeeded, this situation will not occur.)
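The crash window described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual startup code: the class FirstBootCrashSketch, the method names, the file name "checkpoint", and the returned status strings are all hypothetical. Only the ordering of the steps and the ".asterix_root_metadata" file name come from the description above. Steps 3, 5, and 6 are omitted because they do not affect the outcome.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative simulation of steps 1, 2, and 4; not AsterixDB code.
class FirstBootCrashSketch {
    static final String CHECKPOINT = "checkpoint";               // hypothetical file name
    static final String ROOT_METADATA = ".asterix_root_metadata";

    // Step 1: the state is NEW_UNIVERSE exactly when no checkpoint file exists.
    static boolean isNewUniverse(Path dir) {
        return !Files.exists(dir.resolve(CHECKPOINT));
    }

    /**
     * Simulates one start of a node. If crashAfterCheckpoint is true and this
     * is a first boot, the simulated crash happens right after step 2.
     */
    static String boot(Path dir, boolean crashAfterCheckpoint) throws IOException {
        boolean newUniverse = isNewUniverse(dir);                 // step 1
        if (newUniverse) {
            Files.createFile(dir.resolve(CHECKPOINT));            // step 2
            if (crashAfterCheckpoint) {
                return "CRASHED";                                 // before step 4
            }
        }
        Path root = dir.resolve(ROOT_METADATA);
        if (newUniverse) {
            Files.createFile(root);                               // step 4, first boot
            return "INITIALIZED";
        }
        // Step 4 on a later boot: the file is expected to exist already.
        return Files.exists(root) ? "LOADED" : "MISSING_ROOT_METADATA";
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("first-boot");
        System.out.println(boot(dir, true));   // prints CRASHED
        System.out.println(boot(dir, false));  // prints MISSING_ROOT_METADATA
    }
}
```

The second call corresponds to the restart in the reported log: the checkpoint left behind by the interrupted first boot makes the state look established, while the file written in step 4 was never created.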
Original issue reported on code.google.com by kiss...@gmail.com on 23 Jul 2014 at 5:40