namhnguyen / asterixdb

Automatically exported from code.google.com/p/asterixdb

Querying Metadata returns null [ ] #830

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Hi All,

I ran a query to get results for Metadata datasets from Web UI

for $l in dataset Metadata.Dataset
return $l

The above query returned null on the Web UI

null []

Steps that I followed

cd uci/asterixdb/
export MANAGIX_HOME="/home/khurram/uci/asterixdb/asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly"
export PATH=$PATH:$MANAGIX_HOME/bin
export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
export PATH=$JAVA_HOME/bin:$PATH
find . -name "local.xml"
vi ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/local.xml
./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix create -n inst1 -c ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/local.xml
./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix configure
./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix validate
./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix create -n inst1 -c ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/local.xml

Content from my local.xml file

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cluster xmlns="cluster">
    <java_home>/usr/lib/jvm/java-7-openjdk-amd64/jre</java_home>
    <log_dir>/home/khurram/asterixdb_logs/logs</log_dir>
    <txn_log_dir>/home/khurram/asterixdb_logs/txnLogs</txn_log_dir>
    <store>storage</store>
    <iodevices>/home/khurram/asterix_storage</iodevices>
    <working_dir>
        <dir>/home/khurram/uci/asterixdb/asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/working_dir</dir>
        <NFS>true</NFS>
    </working_dir>
    <master_node>
        <id>master</id>
        <client_ip>127.0.0.1</client_ip>
        <cluster_ip>127.0.0.1</cluster_ip>
        <client_port>1098</client_port>
        <cluster_port>1099</cluster_port>
        <http_port>8888</http_port>
    </master_node>
    <node>
        <id>node1</id>
        <cluster_ip>127.0.0.1</cluster_ip>
    </node>
</cluster>

From the cc.log

Nov 25, 2014 6:20:39 PM edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer optimize
INFO: Optimized Plan:
distribute result [%0->$$0]
-- DISTRIBUTE_RESULT  |PARTITIONED|
  exchange
  -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
    project ([$$0])
    -- STREAM_PROJECT  |PARTITIONED|
      exchange
      -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
        data-scan []<-[$$2, $$3, $$0] <- Metadata:Dataset
        -- DATASOURCE_SCAN  |PARTITIONED|
          exchange
          -- ONE_TO_ONE_EXCHANGE  |PARTITIONED|
            empty-tuple-source
            -- EMPTY_TUPLE_SOURCE  |PARTITIONED|

Nov 25, 2014 6:20:39 PM edu.uci.ics.asterix.om.util.AsterixClusterProperties getIODevices
WARNING: Configuration parameters for nodeId inst1_node1 not found. The node has not joined yet or has left.
Nov 25, 2014 6:20:39 PM edu.uci.ics.asterix.om.util.AsterixClusterProperties getIODevices
WARNING: Configuration parameters for nodeId inst1_node1 not found. The node has not joined yet or has left.
edu.uci.ics.hyracks.algebricks.common.exceptions.AlgebricksException: java.lang.NullPointerException
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.buildBtreeRuntime(AqlMetadataProvider.java:666)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.buildInternalDatasetScan(AqlMetadataProvider.java:415)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.getScannerRuntime(AqlMetadataProvider.java:307)
        at edu.uci.ics.hyracks.algebricks.core.algebra.operators.physical.DataSourceScanPOperator.contributeRuntimeOperator(DataSourceScanPOperator.java:83)
        at edu.uci.ics.hyracks.algebricks.core.algebra.operators.logical.AbstractLogicalOperator.contributeRuntimeOperator(AbstractLogicalOperator.java:158)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:94)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compilePlan(PlanCompiler.java:57)
        at edu.uci.ics.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.createJob(HeuristicCompilerFactoryBuilder.java:100)
        at edu.uci.ics.asterix.api.common.APIFramework.compileQuery(APIFramework.java:376)
        at edu.uci.ics.asterix.aql.translator.AqlTranslator.rewriteCompileQuery(AqlTranslator.java:1722)
        at edu.uci.ics.asterix.aql.translator.AqlTranslator.handleQuery(AqlTranslator.java:2034)
        at edu.uci.ics.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:315)
        at edu.uci.ics.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:97)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
        at org.eclipse.jetty.server.Server.handle(Server.java:347)
        at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
        at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.splitsForDataset(AqlMetadataProvider.java:2058)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.splitProviderAndPartitionConstraintsForDataset(AqlMetadataProvider.java:1986)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.buildBtreeRuntime(AqlMetadataProvider.java:663)
        ... 37 more
Nov 25, 2014 6:20:39 PM edu.uci.ics.asterix.api.http.servlet.APIServlet doPost
SEVERE: java.lang.NullPointerException
edu.uci.ics.hyracks.algebricks.common.exceptions.AlgebricksException: java.lang.NullPointerException
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.buildBtreeRuntime(AqlMetadataProvider.java:666)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.buildInternalDatasetScan(AqlMetadataProvider.java:415)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.getScannerRuntime(AqlMetadataProvider.java:307)
        at edu.uci.ics.hyracks.algebricks.core.algebra.operators.physical.DataSourceScanPOperator.contributeRuntimeOperator(DataSourceScanPOperator.java:83)
        at edu.uci.ics.hyracks.algebricks.core.algebra.operators.logical.AbstractLogicalOperator.contributeRuntimeOperator(AbstractLogicalOperator.java:158)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:94)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compileOpRef(PlanCompiler.java:81)
        at edu.uci.ics.hyracks.algebricks.core.jobgen.impl.PlanCompiler.compilePlan(PlanCompiler.java:57)
        at edu.uci.ics.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.createJob(HeuristicCompilerFactoryBuilder.java:100)
        at edu.uci.ics.asterix.api.common.APIFramework.compileQuery(APIFramework.java:376)
        at edu.uci.ics.asterix.aql.translator.AqlTranslator.rewriteCompileQuery(AqlTranslator.java:1722)
        at edu.uci.ics.asterix.aql.translator.AqlTranslator.handleQuery(AqlTranslator.java:2034)
        at edu.uci.ics.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:315)
        at edu.uci.ics.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:97)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
        at org.eclipse.jetty.server.Server.handle(Server.java:347)
        at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
        at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:924)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
        at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:545)
        at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:43)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.NullPointerException
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.splitsForDataset(AqlMetadataProvider.java:2058)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.splitProviderAndPartitionConstraintsForDataset(AqlMetadataProvider.java:1986)
        at edu.uci.ics.asterix.metadata.declared.AqlMetadataProvider.buildBtreeRuntime(AqlMetadataProvider.java:663)
        ... 37 more
...
Nov 25, 2014 6:21:13 PM edu.uci.ics.hyracks.control.common.dataset.ResultStateSweeper sweep
INFO: Result state cleanup instance successfully completed.

I see that the Metadata datasets are created in this directory
/home/khurram/asterix_storage/storage/Metadata

khurram@khurram:~/asterix_storage/storage/Metadata$ ls
CompactionPolicy_idx_CompactionPolicy
Dataset_idx_Dataset
Dataset_idx_DatatypeName
Dataset_idx_GroupName
DatasourceAdapter_idx_DatasourceAdapter
Datatype_idx_Datatype
Datatype_idx_DatatypeName
Dataverse_idx_Dataverse
ExternalFile_idx_ExternalFile
FeedActivity_idx_FeedActivity
Feed_idx_Feed
FeedPolicy_idx_FeedPolicy
Function_idx_Function
Index_idx_Index
Library_idx_Library
Nodegroup_idx_Nodegroup
Node_idx_Node

This other directory was empty though
/home/khurram/asterix_storage/asterix_root_metadata/inst1_node1_iodevice0

khurram@khurram:~/uci/asterixdb$ git log
commit 2fd7fa66487acb4d6e13b755839a5d8d687777e7
Author: buyingyi <buyingyi@gmail.com>
Date:   Mon Nov 24 19:31:55 2014 -0800

    Add NoSQL grammar:
    Please make FROM a synonym for FOR, SELECT a synonym for RETURN, and WITH a synonym for LET.  No semantic changes here - just some keyword synonyms.

    Change-Id: Iffba1c25c611fc420b6e223bcdde75a9244035e4
    Reviewed-on: http://fulliautomatix.ics.uci.edu:8443/181
    Tested-by: Jenkins <jenkins@fulliautomatix.ics.uci.edu>
    Reviewed-by: Till Westmann <westmann@gmail.com>

Also,
1+1
I see it works (returns 2) as the result on the Web UI.

khurram@khurram:~/uci/asterixdb$ ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix describe
INFO: Name:inst1
Created:Tue Nov 25 18:18:58 PST 2014
Web-Url:http://127.0.0.1:19001
State:ACTIVE

khurram@khurram:~$ mvn -version
Apache Maven 3.0.5
Maven home: /usr/share/maven
Java version: 1.7.0_65, vendor: Oracle Corporation
Java home: /usr/lib/jvm/java-7-openjdk-amd64/jre
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "3.16.0-18-generic", arch: "amd64", family: "unix"
khurram@khurram:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.2) (7u65-2.5.2-4)
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)
khurram@khurram:~$ uname -a
Linux khurram 3.16.0-18-generic #25-Ubuntu SMP Fri Sep 26 02:44:15 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

CC and NC logs are attached as a ZIP with this issue

Thanks,
Khurram

Original issue reported on code.google.com by khfaraaz82 on 26 Nov 2014 at 3:39


GoogleCodeExporter commented 9 years ago
Some interesting observations from the log:
a) CC.log: "Address already in use"
    This indicates that a port is already occupied.
b) CC.log: The log starts with the exception listed in (a). The startup message
   "INFO: Starting ClusterControllerService:" is not found.
c) NC.log: "java.lang.Exception: Node with this name already registered."
    This indicates that an earlier attempt to register the node succeeded.

Khurram, a few questions:
(1) Is this reproducible (I assume this is sporadic)?
(2) When you created the new instance, was there already another MANAGIX_HOME
set on your machine that had been used to launch an asterix instance earlier,
and was that instance perhaps not shut down properly? I am looking for
information that helps explain "Address already in use".

Original comment by RamanGro...@gmail.com on 26 Nov 2014 at 4:35

GoogleCodeExporter commented 9 years ago
If you take a closer look at the steps I listed to repro the problem, you will
see that there are two attempts to create an instance with the same name
(managix create -n inst1). The instance creation in my first attempt did not
succeed; my second attempt, using the same name as before, succeeded. Note that
I ran configure and validate between the two create commands.

Here is the output from jps, which tells us there is only one CC and one NC 
running. I did not kill any other CC/NC process before I tried to create the 
instance.

khurram@khurram:~$ jps
3792 CCDriver
8147 Jps
3156 QuorumPeerMain
3908 NCDriver

To answer your question about it being reproducible, I am trying the same steps 
now, we will know if the same problem shows up.

Original comment by khfaraaz82 on 26 Nov 2014 at 7:46

GoogleCodeExporter commented 9 years ago
Khurram, I did notice that you issued two create instance commands.
However, you did not share the output from the first attempt (that is, the
attempt prior to you configuring Managix). Please share that output with us.

"Here is the output from jps, which tells us there is only one CC and one NC 
running. I did not kill any other CC/NC process before I tried to create the 
instance."

Did you verify if there were any processes lurking around prior to your second 
attempt? 

"To answer your question about it being reproducible, I am trying the same 
steps now, we will know if the same problem shows up."
By same steps, do you mean making two attempts to create an instance?

Original comment by RamanGro...@gmail.com on 26 Nov 2014 at 7:51

GoogleCodeExporter commented 9 years ago
I stopped and deleted the instance, and then tried to recreate it with the same
name as before. After stop+delete, the second create command gives the correct
message: "Asterix instance by name inst1 already exists." So what could have
caused the NPE that we saw in my first attempt? (As you said, this appears to
be sporadic, but is there a way to handle this scenario so that we avoid seeing
the NPE?)

khurram@khurram:~/uci/asterixdb$ export MANAGIX_HOME="/home/khurram/uci/asterixdb/asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly"
khurram@khurram:~/uci/asterixdb$ export PATH=$PATH:$MANAGIX_HOME/bin
khurram@khurram:~/uci/asterixdb$ export JAVA_HOME="/usr/lib/jvm/java-7-openjdk-amd64"
khurram@khurram:~/uci/asterixdb$ export PATH=$JAVA_HOME/bin:$PATH
khurram@khurram:~/uci/asterixdb$ find . -name "local.xml"
./asterix-installer/src/main/resources/clusters/local/local.xml
./asterix-installer/target/classes/clusters/local/local.xml
./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/local.xml
khurram@khurram:~/uci/asterixdb$ ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix create -n inst1 -c ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/local.xml

INFO: Name:inst1
Created:Tue Nov 25 23:55:48 PST 2014
Web-Url:http://127.0.0.1:19001
State:ACTIVE

khurram@khurram:~/uci/asterixdb$ ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix configure
khurram@khurram:~/uci/asterixdb$ ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix validate
INFO: Environment [OK]
INFO: Managix Configuration [OK]
khurram@khurram:~/uci/asterixdb$ ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/bin/managix create -n inst1 -c ./asterix-installer/target/asterix-installer-0.8.7-SNAPSHOT-binary-assembly/clusters/local/local.xml
ERROR: Asterix instance by name inst1 already exists.

Original comment by khfaraaz82 on 26 Nov 2014 at 8:00

GoogleCodeExporter commented 9 years ago
In the case when you faced NPE, you had two attempts at creating the instance.

Attempt 1: managix create ....
           (Since the output you shared did not contain any response to this command, I assume that this command was killed by a CTRL-C: please confirm)

Next you configured Managix and made another attempt.

Attempt 2: managix create ....
           This time you get an ACTIVE instance, but the Metadata query fails.

Please confirm if I am interpreting  your actions (from the logs) correctly.

Original comment by RamanGro...@gmail.com on 26 Nov 2014 at 8:14

GoogleCodeExporter commented 9 years ago
Yes, in my first attempt to create instance I remember having used CTRL-C 
(because I thought I was running the wrong command, which wasn't the case). 
Your assumption is right.

Original comment by khfaraaz82 on 26 Nov 2014 at 8:18

GoogleCodeExporter commented 9 years ago
Thanks for confirming that my interpretation was correct.

Here is exactly what happened... (I was able to deterministically reproduce
the scenario in consecutive attempts, which confirmed the hypothesis.)

Step 1: Submit the create command:
                managix create -n <name of instance> 
                time ticking....

Step 2: CTRL-C
          Abruptly end the Managix create command while it is midway through launching the processes (CC and NC).
          At the time of hitting CTRL-C, the processes have already been launched, are occupying ports, and are writing logs.
          NOTE: Before the abrupt termination the processes were up, but because Managix was terminated it could not finish executing the create statement and record the fact that the instance had been created and was up and running.
          Final State: Processes are running; the instance is up! But Managix doesn't know, as it was killed (CTRL-C) abruptly.

Step 3: Submit another create command.
          Managix checks whether the instance already exists. An uninterrupted create command puts all information in Managix's Zookeeper, but since the create command itself was killed abruptly, this critical step (of recording the information) could not complete. So Managix goes ahead and attempts to create another instance.

Now, in the second attempt, the CC is launched (again), finds the port in use
(from the previous attempt), and dies. The NC starts up, tries to register with
the CC, and complains "already registered" (remember that the previous NC did
that too); this shows up as the exception in the NC log. Note that the second
NC is writing to the same log file as the previous NC.
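
The port clash described above can be reproduced in isolation. The following is
a minimal sketch, not AsterixDB code: the class name PortInUseDemo and the
method secondBindFails are made up for illustration, and an OS-assigned
loopback port stands in for the CC's cluster port. It binds one listening
socket and then tries to bind a second one to the same address and port, which
is expected to fail in the same way the second CC did.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.ServerSocket;

// Illustrative only: stands in for two CC processes competing for one port.
public class PortInUseDemo {

    /** Returns true if a second bind to an already-bound port is rejected. */
    static boolean secondBindFails() {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 asks the OS for any free port; this plays the role of the
        // first (orphaned) CC that is still holding its port.
        try (ServerSocket first = new ServerSocket(0, 50, loopback)) {
            int port = first.getLocalPort();
            // The "second CC": same address, same port, first one still alive.
            try (ServerSocket second = new ServerSocket(port, 50, loopback)) {
                return false; // both sockets bound; no clash was observed
            } catch (BindException e) {
                return true; // typically reported as "Address already in use"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("second bind rejected: " + secondBindFails());
    }
}
```

A check of this kind before launching the CC could, in principle, let a create
command detect a leftover process from an interrupted attempt; whether Managix
should do that is a separate design question.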

I was able to reproduce the scenario.
Here are the exceptions... (which are exactly the same as reported in the 
original description of the issue)

INFO: Registered Runtime Functions
Nov 26, 2014 1:59:05 PM edu.uci.ics.hyracks.control.cc.ClusterControllerService start
INFO: Started ClusterControllerService
java.net.BindException: Address already in use <========= Look here!
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:344)
    at sun.nio.ch.Net.bind(Net.java:336)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:199)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
    at edu.uci.ics.hyracks.ipc.impl.IPCConnectionManager.<init>(IPCConnectionManager.java:67)
    at edu.uci.ics.hyracks.ipc.impl.IPCSystem.<init>(IPCSystem.java:40)

=======

Nov 26, 2014 2:04:40 PM edu.uci.ics.asterix.transaction.management.resource.PersistentLocalResourceRepository initialize
INFO: The resource id factory is intialized with the value: 16
Nov 26, 2014 2:04:40 PM edu.uci.ics.asterix.transaction.management.resource.PersistentLocalResourceRepository initialize
INFO: Completed the initialization of the local resource repository
java.lang.Exception: Node with this name already registered.  <========== Look here!
    at edu.uci.ics.hyracks.control.cc.work.RegisterNodeWork.doRun(RegisterNodeWork.java:58)
    at edu.uci.ics.hyracks.control.common.work.SynchronizableWork.run(SynchronizableWork.java:32)
    at edu.uci.ics.hyracks.control.common.work.WorkQueue$WorkerThread.run(WorkQueue.java:122)
REMOVED transaction context for JOB ID JID:3 at time 1416990888033

ALL THIS IS FINE, but WHY NPE?

To understand the reason for NPE, let us go back to the exception:

INFO: Completed the initialization of the local resource repository
java.lang.Exception: Node with this name already registered.

When the NC from the second attempt correctly hits the above exception (node
already registered), the registration code still goes ahead and incorrectly
notifies the CC of the node's presence, passing along its configuration.
Look inside RegisterNodeWork.java, at the method with the signature
protected void doRun() throws Exception.
The pseudocode there is shown below.
try {
  // try to register the NC
  // an exception is thrown here
  ...
  ...
  initialize NC configuration
} catch {
  // take note of the exception
}
ccs.getApplicationContext().notifyNodeJoin(id, ncConfiguration);

In the above line of code, the parameter ncConfiguration is NULL, as it could
not be set because of the exception.

The CC is thus told of a node join attempt, and it overwrites the NC
configuration with a NULL.
When the end-user submits a query, the configuration parameters for the
Metadata Node (the only node in this case) are returned as NULL, and boom!
This is consistent with the "Configuration parameters for nodeId inst1_node1
not found" warnings that precede the NPE in the cc.log.

FIX: 
RegisterNodeWork should not notify CC of a node join if the attempt has failed. 
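
To make the proposed fix concrete, here is a minimal, self-contained sketch of
the two control flows. It is a model under stated assumptions, not the actual
Hyracks code: RegisterNodeSketch, ClusterState, registerBuggy and registerFixed
are hypothetical names, and a plain map stands in for the CC's per-node
configuration. Only the call name notifyNodeJoin and the exception message are
taken from the discussion above.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the register-then-notify flow; not Hyracks code.
public class RegisterNodeSketch {

    /** Stand-in for the CC's registry and its per-node configuration. */
    static class ClusterState {
        final Map<String, Map<String, String>> nodeConfigs = new HashMap<>();

        void register(String nodeId) throws Exception {
            if (nodeConfigs.containsKey(nodeId)) {
                throw new Exception("Node with this name already registered.");
            }
        }

        void notifyNodeJoin(String nodeId, Map<String, String> config) {
            nodeConfigs.put(nodeId, config); // replaces any earlier entry
        }
    }

    /** Buggy shape: notifies even when registration failed, with a null config. */
    static void registerBuggy(ClusterState cc, String id, Map<String, String> reported) {
        Map<String, String> ncConfiguration = null;
        try {
            cc.register(id);            // throws for a duplicate node
            ncConfiguration = reported; // never reached for a duplicate
        } catch (Exception e) {
            // the exception is noted, but execution continues
        }
        cc.notifyNodeJoin(id, ncConfiguration);
    }

    /** Proposed shape: notify only after registration has succeeded. */
    static void registerFixed(ClusterState cc, String id, Map<String, String> reported) {
        try {
            cc.register(id);
            cc.notifyNodeJoin(id, reported);
        } catch (Exception e) {
            // registration failed: leave the existing configuration untouched
        }
    }

    public static void main(String[] args) {
        Map<String, String> cfg = Collections.singletonMap("iodevices", "/tmp/example");
        ClusterState cc = new ClusterState();
        registerFixed(cc, "inst1_node1", cfg); // first NC joins normally
        registerBuggy(cc, "inst1_node1", cfg); // duplicate NC wipes the config
        System.out.println("after buggy duplicate: " + cc.nodeConfigs.get("inst1_node1"));
    }
}
```

In this model, a duplicate registration through registerBuggy leaves a null
configuration behind for the node, while the same duplicate through
registerFixed leaves the first NC's configuration in place.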

But there is a deeper issue here.
What if the user had used a different asterix-configuration.xml in the second
attempt? The asterix instance that is finally obtained is from the first
attempt, which has the old configuration. Should we handle all scenarios where
an end-user hits CTRL-C and kills Managix abruptly?

Original comment by RamanGro...@gmail.com on 26 Nov 2014 at 9:44

GoogleCodeExporter commented 9 years ago
The answer to your question, "Should we handle all scenarios where an end-user 
hits CTRL-C and kills Managix abruptly? ", is Yes we should.

Anytime the CC/NC's go down due to what ever reasons (which could be, due to 
power failure, CTRL-C from user or any other reasons), such scenarios should be 
handled.

Do we have any crash recovery tests, where we bring down the CC/NC, bring them
back up, and then query the Metadata datasets or any other internal datasets
after the system has recovered from the crash?

Original comment by khfaraaz82 on 27 Nov 2014 at 1:50

GoogleCodeExporter commented 9 years ago
"Anytime the CC/NC's go down due to what ever reasons (which could be, due to 
power failure, CTRL-C from user or any other reasons), such scenarios should be 
handled."

I am concerned that the above case is very "different" from what is being
discussed in this issue. So let us not mix things.

An NC going down abruptly (CTRL-C or power failure) is handled as part of our
recovery mechanism.
The CC is the single point of failure in AsterixDB. Its going down is not
handled currently.
Note that the CC and NC processes have a longer lifetime, are prone to
failures, and thus need to be protected. So let us not confuse CC/NC recovery
with this issue.

Managix being killed abruptly:
There are two cases:
(i) Power Failure: In this case issue 830 does not arise, as the power failure
would of course kill all processes and would not leave any live processes
behind when a second attempt (after power is restored) is made to create the
instance.

(ii) CTRL-C:
A CTRL-C issued after the create statement is launched, at a time when both
processes (CC and NC) have already started, causes this issue. This is being
fixed.
What I meant to ask was very different, and I am afraid it has not been
interpreted correctly. Let me rephrase...

Should we have crash/recovery support for Managix commands? 

Original comment by RamanGro...@gmail.com on 27 Nov 2014 at 3:05

GoogleCodeExporter commented 9 years ago
Yes, crash/recovery support for managix commands would help. That support will 
help crash/recovery tests.

Original comment by khfaraaz82 on 28 Nov 2014 at 7:42

GoogleCodeExporter commented 9 years ago
Raman, do you have the name of the branch where this is being fixed? Let me
know and I can verify. Thanks.

Original comment by khfaraaz82 on 11 Dec 2014 at 3:50