namhnguyen / asterixdb

Automatically exported from code.google.com/p/asterixdb
0 stars 0 forks source link

insert operator at scale takes forever #825

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Reproduction:
=============
Try the following query at scale:
use dataverse kereno;

//Query 1
//========

insert into dataset by_actions (
for $t in dataset page_views
for $x in $t.page_links
where ($t.action=1)
    return {
           "user": $t.user,
           "header": string($t.page_info.a)
}
);

insert into dataset by_actions(
for $t in dataset page_views
for $x in $t.page_links
    where ($t.action != 1)
     return {
             "user":$t.user,
             "header":string($x.b)
}
);

for $t in dataset by_actions
group by $group := $t.user with $t
return {
"name": $group,
"count": count($t)
}

Expected:
=========
Complete in a timely manner (less than an hour).
Instead the query ran for a few days and still didn't complete.

Observations:
=============
These are the dataset files with their time and size. As you can see, the files 
grow every hour or every few hours.
1018M   2014-11-18 20:59    2014-11-18-20-57-24-012_2014-11-18-20-33-39-978_b
12M 2014-11-18 20:59    2014-11-18-20-57-24-012_2014-11-18-20-33-39-978_f
1018M   2014-11-18 21:42    2014-11-18-21-40-17-383_2014-11-18-20-57-46-742_b
12M 2014-11-18 21:41    2014-11-18-21-40-17-383_2014-11-18-20-57-46-742_f
1014M   2014-11-18 22:56    2014-11-18-22-55-43-484_2014-11-18-21-41-17-827_b
12M 2014-11-18 22:56    2014-11-18-22-55-43-484_2014-11-18-21-41-17-827_f
1014M   2014-11-19 00:49    2014-11-19-00-47-58-013_2014-11-18-22-56-53-510_b
12M 2014-11-19 00:48    2014-11-19-00-47-58-013_2014-11-18-22-56-53-510_f
1015M   2014-11-19 03:30    2014-11-19-03-29-09-454_2014-11-19-00-49-39-134_b
12M 2014-11-19 03:30    2014-11-19-03-29-09-454_2014-11-19-00-49-39-134_f
1013M   2014-11-19 09:01    2014-11-19-08-59-48-358_2014-11-19-03-33-00-056_b
12M 2014-11-19 09:00    2014-11-19-08-59-48-358_2014-11-19-03-33-00-056_f
157M    2014-11-19 11:57    2014-11-19-11-57-45-381_2014-11-19-09-11-01-704_b
1.9M    2014-11-19 11:57    2014-11-19-11-57-45-381_2014-11-19-09-11-01-704_f

I attached the jstack on the cc and one of the NCs. It shows that the 
OperatorDescriptor is waiting. And the DDL and logs as well.

Original issue reported on code.google.com by ker...@gmail.com on 19 Nov 2014 at 8:31

Attachments:

GoogleCodeExporter commented 9 years ago
What is the record length (size of each tuple being inserted) ? 
And how many such tuples are you trying to insert ?

I looked into the cc.log, however I did not find a match for HASH/HYBRID in the 
plan, we must see that, since your query is doing a equi-join (which is hybrid 
hash join).

I also see there a NPE in the cc.log

Nov 18, 2014 8:33:07 PM 
edu.uci.ics.hyracks.control.common.dataset.ResultStateSweeper sweep
INFO: Result state cleanup instance successfully completed.
java.lang.NullPointerException
at 
edu.uci.ics.asterix.om.typecomputer.impl.UnaryBooleanOrNullFunctionTypeComputer.
computeType(UnaryBooleanOrNullFunctionTypeComputer.java:50)
at 
edu.uci.ics.asterix.dataflow.data.common.AqlExpressionTypeComputer.getTypeForFun
ction(AqlExpressionTypeComputer.java:88)
at 
edu.uci.ics.asterix.dataflow.data.common.AqlExpressionTypeComputer.getType(AqlEx
pressionTypeComputer.java:56)
at 
edu.uci.ics.hyracks.algebricks.core.algebra.typing.AbstractTypeEnvironment.getTy
pe(AbstractTypeEnvironment.java:41)
at 
edu.uci.ics.asterix.optimizer.rules.CheckFilterExpressionTypeRule.rewritePost(Ch
eckFilterExpressionTypeRule.java:58)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:122)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:108)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:108)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.compiler.rewriter.rulecontrollers.SequentialOnceR
uleController.rewriteWithRuleCollection(SequentialOnceRuleController.java:40)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.runOptimiza
tionSets(HeuristicOptimizer.java:91)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.optimize(He
uristicOptimizer.java:78)
at 
edu.uci.ics.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.
optimize(HeuristicCompilerFactoryBuilder.java:83)
at 
edu.uci.ics.asterix.api.common.APIFramework.compileQuery(APIFramework.java:308)
at 
edu.uci.ics.asterix.aql.translator.AqlTranslator.rewriteCompileQuery(AqlTranslat
or.java:1716)
at 
edu.uci.ics.asterix.aql.translator.AqlTranslator.handleQuery(AqlTranslator.java:
2028)
at 
edu.uci.ics.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator
.java:316)
at edu.uci.ics.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231
)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970
)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
at org.eclipse.jetty.server.Server.handle(Server.java:347)
at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.ja
va:924)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java
:545)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:
43)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
at java.lang.Thread.run(Thread.java:745)
Nov 18, 2014 8:33:11 PM edu.uci.ics.asterix.api.http.servlet.APIServlet doPost
SEVERE: null
java.lang.NullPointerException
at 
edu.uci.ics.asterix.om.typecomputer.impl.UnaryBooleanOrNullFunctionTypeComputer.
computeType(UnaryBooleanOrNullFunctionTypeComputer.java:50)
at 
edu.uci.ics.asterix.dataflow.data.common.AqlExpressionTypeComputer.getTypeForFun
ction(AqlExpressionTypeComputer.java:88)
at 
edu.uci.ics.asterix.dataflow.data.common.AqlExpressionTypeComputer.getType(AqlEx
pressionTypeComputer.java:56)
at 
edu.uci.ics.hyracks.algebricks.core.algebra.typing.AbstractTypeEnvironment.getTy
pe(AbstractTypeEnvironment.java:41)
at 
edu.uci.ics.asterix.optimizer.rules.CheckFilterExpressionTypeRule.rewritePost(Ch
eckFilterExpressionTypeRule.java:58)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:122)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:108)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:108)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewrite
OperatorRef(AbstractRuleController.java:96)
at 
edu.uci.ics.hyracks.algebricks.compiler.rewriter.rulecontrollers.SequentialOnceR
uleController.rewriteWithRuleCollection(SequentialOnceRuleController.java:40)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.runOptimiza
tionSets(HeuristicOptimizer.java:91)
at 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.optimize(He
uristicOptimizer.java:78)
at 
edu.uci.ics.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.
optimize(HeuristicCompilerFactoryBuilder.java:83)
at 
edu.uci.ics.asterix.api.common.APIFramework.compileQuery(APIFramework.java:308)
at 
edu.uci.ics.asterix.aql.translator.AqlTranslator.rewriteCompileQuery(AqlTranslat
or.java:1716)
at 
edu.uci.ics.asterix.aql.translator.AqlTranslator.handleQuery(AqlTranslator.java:
2028)
at 
edu.uci.ics.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator
.java:316)
at edu.uci.ics.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:97)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:754)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231
)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:970
)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:192)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:904)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:110)
at org.eclipse.jetty.server.Server.handle(Server.java:347)
at 
org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:439)
at 
org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.ja
va:924)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:781)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:220)
at 
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:43)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java
:545)
at 
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:
43)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:529)
at java.lang.Thread.run(Thread.java:745)

Original comment by khfaraaz82 on 20 Nov 2014 at 6:42

GoogleCodeExporter commented 9 years ago
That's a whole series of statements.
Please dig deeper and see which one is eating the time?

Original comment by dtab...@gmail.com on 20 Nov 2014 at 6:45

GoogleCodeExporter commented 9 years ago
P.S : I see it is not an equi-join. I saw the equality sign and assumed too 
soon that it was an equi-join.

Original comment by khfaraaz82 on 20 Nov 2014 at 6:46

GoogleCodeExporter commented 9 years ago
Here is the optimized plan from cc.log for the insert statement in question.

Nov 18, 2014 8:33:29 PM 
edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer optimize
INFO: Optimized Plan:
commit
-- COMMIT |PARTITIONED|
project ([$$12])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
insert into kereno:by_actions from %0->$$23 partitioned by [%0->$$12]
-- INSERT_DELETE |PARTITIONED|
exchange
-- HASH_PARTITION_EXCHANGE [$$12] |PARTITIONED|
assign [$$12] <- [function-call: asterix:field-access-by-index, Args:[%0->$$23, 
AInt32: {0}]]
-- ASSIGN |PARTITIONED|
project ([$$23])
-- STREAM_PROJECT |PARTITIONED|
assign [$$23] <- [function-call: asterix:cast-record, Args:[%0->$$13]]
-- ASSIGN |PARTITIONED|
project ([$$13])
-- STREAM_PROJECT |PARTITIONED|
assign [$$13] <- [function-call: asterix:not-null, Args:[function-call: 
asterix:record-merge, Args:[%0->$$6, function-call: 
asterix:closed-record-constructor, Args:[AString: {id}, function-call: 
asterix:create-uuid, Args:[]]]]]
-- ASSIGN |PARTITIONED|
project ([$$6])
-- STREAM_PROJECT |PARTITIONED|
assign [$$6] <- [function-call: asterix:closed-record-constructor, 
Args:[AString: {user}, function-call: asterix:field-access-by-index, 
Args:[%0->$$0, AInt32: {1}], AString: {header}, function-call: asterix:string, 
Args:[function-call: asterix:field-access-by-name, Args:[%0->$$20, AString: 
{a}]]]]
-- ASSIGN |PARTITIONED|
project ([$$0, $$20])
-- STREAM_PROJECT |PARTITIONED|
unnest $$1 <- function-call: asterix:scan-collection, Args:[%0->$$17]
-- UNNEST |PARTITIONED|
select (function-call: algebricks:eq, Args:[function-call: 
asterix:field-access-by-index, Args:[%0->$$0, AInt32: {2}], AInt32: {1}])
-- STREAM_SELECT |PARTITIONED|
assign [$$17, $$20] <- [function-call: asterix:field-access-by-index, 
Args:[%0->$$0, AInt32: {9}], function-call: asterix:field-access-by-index, 
Args:[%0->$$0, AInt32: {8}]]
-- ASSIGN |PARTITIONED|
project ([$$0])
-- STREAM_PROJECT |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
data-scan []<-[$$15, $$0] <- kereno:page_views
-- DATASOURCE_SCAN |PARTITIONED|
exchange
-- ONE_TO_ONE_EXCHANGE |PARTITIONED|
empty-tuple-source
-- EMPTY_TUPLE_SOURCE |PARTITIONED|

Original comment by khfaraaz82 on 20 Nov 2014 at 7:10

GoogleCodeExporter commented 9 years ago
There is information only for the first INSERT statement in the cc.log and I 
have pasted the optimized plan above (in previous comment) for the below 
INSERT. I believe this is taking longer to complete.

insert into dataset by_actions (
for $t in dataset page_views
for $x in $t.page_links
where ($t.action=1)
    return {
           "user": $t.user,
           "header": string($t.page_info.a)
}
);

Original comment by khfaraaz82 on 20 Nov 2014 at 7:21

GoogleCodeExporter commented 9 years ago
Query 4 (distinct + group-by) using inserts is also not making progress (cpu 
idle and NC is waiting)

Original comment by ker...@gmail.com on 20 Nov 2014 at 7:21