lwhay / asterixdb

Automatically exported from code.google.com/p/asterixdb
0 stars 0 forks source link

counthashed-ngram-tokens() function throws exception #789

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
The following query works:

set import-private-functions 'true';
counthashed-gram-tokens("Winner-Take-All Network Utilising Pseudoinverse 
Reconstruction Subnets Demonstrates Robustness on the Handprinted Character 
Rec", 3, false)

But the following does not:
set import-private-functions 'true';
counthashed-gram-tokens("Winner-Take-All Network Utilising Pseudoinverse 
Reconstruction Subnets Demonstrates Robustness on the Handprinted Character 
Reco", 3, false)

What is the expected output? What do you see instead?
It should return all the counthashed gram tokens, but it throws an exception 
with the following stack trace:

java.lang.IllegalArgumentException
    at edu.uci.ics.hyracks.data.std.primitive.UTF8StringPointable.charAt(UTF8StringPointable.java:86)
    at edu.uci.ics.hyracks.storage.am.lsm.invertedindex.tokenizers.NGramUTF8StringBinaryTokenizer.next(NGramUTF8StringBinaryTokenizer.java:73)
    at edu.uci.ics.asterix.runtime.evaluators.common.GramTokensEvaluator.evaluate(GramTokensEvaluator.java:78)
    at edu.uci.ics.hyracks.algebricks.core.algebra.expressions.LogicalExpressionJobGenToExpressionRuntimeProviderAdapter$ScalarEvaluatorFactoryAdapter$1.evaluate(LogicalExpressionJobGenToExpressionRuntimeProviderAdapter.java:106)
    at edu.uci.ics.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:206)
    at edu.uci.ics.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.visitScalarFunctionCallExpression(ConstantFoldingRule.java:1)
    at edu.uci.ics.hyracks.algebricks.core.algebra.expressions.ScalarFunctionCallExpression.accept(ScalarFunctionCallExpression.java:51)
    at edu.uci.ics.asterix.optimizer.rules.ConstantFoldingRule$ConstantFoldingVisitor.transform(ConstantFoldingRule.java:145)
    at edu.uci.ics.hyracks.algebricks.core.algebra.operators.logical.AbstractAssignOperator.acceptExpressionTransform(AbstractAssignOperator.java:63)
    at edu.uci.ics.asterix.optimizer.rules.ConstantFoldingRule.rewritePost(ConstantFoldingRule.java:132)
    at edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:122)
    at edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:96)
    at edu.uci.ics.hyracks.algebricks.core.rewriter.base.AbstractRuleController.rewriteOperatorRef(AbstractRuleController.java:96)
    at edu.uci.ics.hyracks.algebricks.compiler.rewriter.rulecontrollers.SequentialFixpointRuleController.rewriteWithRuleCollection(SequentialFixpointRuleController.java:49)
    at edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.runOptimizationSets(HeuristicOptimizer.java:91)
    at edu.uci.ics.hyracks.algebricks.core.rewriter.base.HeuristicOptimizer.optimize(HeuristicOptimizer.java:78)
    at edu.uci.ics.hyracks.algebricks.compiler.api.HeuristicCompilerFactoryBuilder$1$1.optimize(HeuristicCompilerFactoryBuilder.java:83)
    at edu.uci.ics.asterix.api.common.APIFramework.compileQuery(APIFramework.java:287)
    at edu.uci.ics.asterix.aql.translator.AqlTranslator.rewriteCompileQuery(AqlTranslator.java:1688)
    at edu.uci.ics.asterix.aql.translator.AqlTranslator.handleQuery(AqlTranslator.java:1996)
    at edu.uci.ics.asterix.aql.translator.AqlTranslator.compileAndExecute(AqlTranslator.java:311)
    at edu.uci.ics.asterix.api.http.servlet.APIServlet.doPost(APIServlet.java:97)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:727)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:820)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:546)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:483)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:228)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:956)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:411)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:188)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:891)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:114)
    at org.eclipse.jetty.server.Server.handle(Server.java:353)
    at org.eclipse.jetty.server.HttpConnection.handleRequest(HttpConnection.java:598)
    at org.eclipse.jetty.server.HttpConnection$RequestHandler.content(HttpConnection.java:1076)
    at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:805)
    at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:218)
    at org.eclipse.jetty.server.HttpConnection.handle(HttpConnection.java:427)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:510)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint.access$000(SelectChannelEndPoint.java:34)
    at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:40)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:450)
    at java.lang.Thread.run(Thread.java:722)

Please use labels and text to provide additional information.

Original issue reported on code.google.com by icetin...@gmail.com on 3 Jul 2014 at 5:46

GoogleCodeExporter commented 9 years ago
This tokenizer was not skipping the length & type bytes in the byte array, and 
was treating them as characters. Hence it was throwing an 
IllegalArgumentException when the length of the string becomes 128. This issue 
is fixed with the following change in Hyracks: 
https://code.google.com/p/hyracks/source/detail?r=63e354ac91c94f225d1d3b0ff6a225
ce37224c71

Original comment by icetin...@gmail.com on 11 Jul 2014 at 9:20