prestodb / presto

The official home of the Presto distributed SQL query engine for big data
http://prestodb.io
Apache License 2.0
15.96k stars 5.34k forks source link

Presto 0.167 "java.lang.NullPointerException: maxSize is null" #7620

Closed bond- closed 5 years ago

bond- commented 7 years ago

I am seeing the error in #5100 with 0.167 version of presto

Stacktrace:

java.lang.NullPointerException: maxSize is null
    at java.util.Objects.requireNonNull(Objects.java:228)
    at com.facebook.presto.execution.SqlTaskManager.getTaskResults(SqlTaskManager.java:330)
    at com.facebook.presto.server.TaskResource.getResults(TaskResource.java:243)
    at sun.reflect.GeneratedMethodAccessor550.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:143)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
    at io.airlift.http.server.TraceTokenFilter.doFilter(TraceTokenFilter.java:63)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at io.airlift.http.server.TimingFilter.doFilter(TimingFilter.java:52)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:462)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1182)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:523)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:745)
bond- commented 7 years ago

@dain Opened a new issue. Let me know if you need more details for debugging the issue.

bond- commented 7 years ago

I am no longer seeing this issue after adding the following parameters in jvm.config:

-XX:+UseNUMA
-XX:+AggressiveOpts 

For people wondering about what may have changed, I ran a diff with/without having these options:

--- no_agg.log  2017-03-20 15:39:38.648056871 -0500
+++ agg.log     2017-03-20 15:39:20.461908190 -0500
@@ -12 +12 @@
-     bool AggressiveOpts                           := false                               {product}
+     bool AggressiveOpts                           := true                                {product}
@@ -33 +33 @@
-     intx AutoBoxCacheMax                           = 128                                 {C2 product}
+     intx AutoBoxCacheMax                           = 20000                               {C2 product}
@@ -42 +42 @@
-     intx BiasedLockingStartupDelay                 = 4000                                {product}
+     intx BiasedLockingStartupDelay                 = 500                                 {product}
@@ -370 +370 @@
-    uintx MinHeapDeltaBytes                        := 524288                              {product}
+    uintx MinHeapDeltaBytes                        := 67108864                            {product}
@@ -749,2 +749,2 @@
-     bool UseNUMA                                  := false                               {product}
-     bool UseNUMAInterleaving                       = false                               {product}
+     bool UseNUMA                                  := true                                {product}
+     bool UseNUMAInterleaving                      := true                                {product}

It may be better to capture the NPE and return a meaningful error.

martint commented 7 years ago

What GC are you using? If I remember correctly, UseNUMA doesn't work with G1.

bond- commented 7 years ago

@martint I am using G1, I actually picked up those options from #3876

I've looked around a bit and found this bug: https://bugs.openjdk.java.net/browse/JDK-8046147 and this link: http://openjdk.java.net/jeps/157

If I understand correctly UseNUMA has no effect right. The issue I was facing could have been solved by AggressiveOpts.

nezihyigitbasi commented 7 years ago

I don't understand how this relates to the settings you mention. That method throws a NPE when the X-Presto-Max-Size header is not set on the request, are you using the presto cli or some other client?

bond- commented 7 years ago

@nezihyigitbasi I am using Presto CLI to run the queries. Most of the queries used to fail, before I set the JVM parameters (in Presto Server), shown above. From a user standpoint I do not find a relation between them. Once I set that config, none of my queries failed with the NPE.

FYI, I didn't change anything on the cluster apart from the two parameters.

ksingh7 commented 7 years ago

Hi @bond- @nezihyigitbasi @martint

I am hitting the exact same issue with Presto versions 0.170 and 0.177

Any pointers to fix this, Is much appreciated.

Details:

    com.facebook.presto.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (http://10.102.46.73:8880/v1/task/20170809_075828_00010_upkmw.22.13/results/1/4 - 6 failures, time since last success 60.85s)
    at com.facebook.presto.operator.HttpPageBufferClient$1.onFailure(HttpPageBufferClient.java:383)
    at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
    at com.google.common.base.Throwables.propagate(Throwables.java:240)
    at io.airlift.http.client.ResponseHandlerUtils.propagate(ResponseHandlerUtils.java:23)
    at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:518)
    at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:512)
    at io.airlift.http.client.jetty.JettyHttpClient$JettyResponseFuture.failed(JettyHttpClient.java:875)
    at io.airlift.http.client.jetty.JettyHttpClient$BufferingResponseListener.onComplete(JettyHttpClient.java:1113)
    at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:193)
    at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:185)
    at org.eclipse.jetty.client.HttpReceiver.terminateResponse(HttpReceiver.java:457)
    at org.eclipse.jetty.client.HttpReceiver.abort(HttpReceiver.java:528)
    at org.eclipse.jetty.client.HttpChannel.abortResponse(HttpChannel.java:129)
    at org.eclipse.jetty.client.HttpChannel.abort(HttpChannel.java:122)
    at org.eclipse.jetty.client.HttpExchange.abort(HttpExchange.java:257)
    at org.eclipse.jetty.client.HttpConversation.abort(HttpConversation.java:141)
    at org.eclipse.jetty.client.HttpRequest.abort(HttpRequest.java:708)
    at org.eclipse.jetty.client.TimeoutCompleteListener.run(TimeoutCompleteListener.java:71)
    ... 7 more
Caused by: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
    ... 8 more
java.lang.NullPointerException: maxSize is null
    at java.util.Objects.requireNonNull(Objects.java:228)
    at com.facebook.presto.execution.SqlTaskManager.getTaskResults(SqlTaskManager.java:330)
    at com.facebook.presto.server.TaskResource.getResults(TaskResource.java:245)
    at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
    at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:143)
    at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
    at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
    at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
    at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
    at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
    at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
    at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
    at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
    at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
    at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
    at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
    at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
    at io.airlift.http.server.TraceTokenFilter.doFilter(TraceTokenFilter.java:63)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at io.airlift.http.server.TimingFilter.doFilter(TimingFilter.java:52)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
    at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:462)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1182)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
    at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
    at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
    at org.eclipse.jetty.server.Server.handle(Server.java:523)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
    at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
    at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
    at java.lang.Thread.run(Thread.java:748)

@bond- I have also tried -XX:+AggressiveOpts and -XX:+UseNUMA together with -XX:+UseG1GC but queries are still failing with above mentioned errors

-server
-Xmx96G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:+UseNUMA
-XX:+AggressiveOpts
bond- commented 7 years ago

@ksingh7 Yes we managed to get past it. The problem that I was facing was with the data and its query.

The way Presto works in our case was: Presto -> Hive metastore -> HDFS (ORC formatted data)

  1. Data has to be partitioned(Hive partitions), it shouldn't be a huge file sitting on HDFS.
  2. The query should be optimized to use partitions in the where clause wherever possible. Doing so we select only the required chunk of data and do our thing.
  3. Optimize ORC indexes
  4. Run hive table and column analysis, I remember seeing a property in Presto to utilize this data

There are some cases which require to scan the entire table. IMHO, Presto isn't built to do that, so we relied on Hive to do that. FYI, I did also attend a data engineering event hosted by Facebook. They were also facing some cases where Presto isn't cut to do that, so they rely on other tools. For example: Hive.

ksingh7 commented 7 years ago

@bond- Thanks for your response.

I managed to get this working by adjusting memory limits for both JVM -Xmx and query.max-memory-per-node. The sum of these two should be under physical memory limit. (this is what I was missing)

bond- commented 7 years ago

@ksingh7 Ah!, good to know. maxSize is null error seems to be too generic for a variety of failures. @dain Can we request for better error handling mechanism for the error that @ksingh7 faced?

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.