Closed bond- closed 5 years ago
@dain Opened a new issue. Let me know if you need more details for debugging the issue.
I am no longer seeing this issue after adding the following parameters in jvm.config:
-XX:+UseNUMA
-XX:+AggressiveOpts
For people wondering about what may have changed, I ran a diff with/without having these options:
--- no_agg.log 2017-03-20 15:39:38.648056871 -0500
+++ agg.log 2017-03-20 15:39:20.461908190 -0500
@@ -12 +12 @@
- bool AggressiveOpts := false {product}
+ bool AggressiveOpts := true {product}
@@ -33 +33 @@
- intx AutoBoxCacheMax = 128 {C2 product}
+ intx AutoBoxCacheMax = 20000 {C2 product}
@@ -42 +42 @@
- intx BiasedLockingStartupDelay = 4000 {product}
+ intx BiasedLockingStartupDelay = 500 {product}
@@ -370 +370 @@
- uintx MinHeapDeltaBytes := 524288 {product}
+ uintx MinHeapDeltaBytes := 67108864 {product}
@@ -749,2 +749,2 @@
- bool UseNUMA := false {product}
- bool UseNUMAInterleaving = false {product}
+ bool UseNUMA := true {product}
+ bool UseNUMAInterleaving := true {product}
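For anyone who wants to repeat this comparison, here is a minimal sketch of the same idea in Java. It assumes two text dumps whose lines have the `type name = value {kind}` shape shown in the diff above. The class and method names (`FlagDiff`, `parse`, `changed`) are made up for illustration and are not part of Presto or the JDK.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FlagDiff {
    // Parses lines such as "bool UseNUMA := true {product}" into name -> value.
    // Lines that do not match this shape are skipped.
    static Map<String, String> parse(String dump) {
        Map<String, String> flags = new LinkedHashMap<>();
        for (String line : dump.split("\\R")) {
            String[] t = line.trim().split("\\s+");
            // Expected tokens: type, name, "=" or ":=", value, then the {kind} part.
            if (t.length >= 4 && (t[2].equals("=") || t[2].equals(":="))) {
                // A flag with an empty value is followed directly by "{kind}".
                String value = t[3].startsWith("{") ? "" : t[3];
                flags.put(t[1], value);
            }
        }
        return flags;
    }

    // Returns every flag whose value in 'after' differs from its value in 'before'.
    static Map<String, String> changed(String before, String after) {
        Map<String, String> old = parse(before);
        Map<String, String> result = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : parse(after).entrySet()) {
            String prev = old.get(e.getKey());
            if (!e.getValue().equals(prev)) {
                result.put(e.getKey(), prev + " -> " + e.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample lines copied from the diff in this thread.
        String before = "bool UseNUMA := false {product}\n"
                + "intx AutoBoxCacheMax = 128 {C2 product}";
        String after = "bool UseNUMA := true {product}\n"
                + "intx AutoBoxCacheMax = 20000 {C2 product}";
        changed(before, after).forEach((k, v) -> System.out.println(k + ": " + v));
    }
}
```

This only reports which values differ between the two dumps; it says nothing about which flag, if any, is responsible for a behavior change.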
It may be better to catch the NPE and return a meaningful error.
What GC are you using? If I remember correctly, UseNUMA doesn't work with G1.
@martint I am using G1. I actually picked up those options from #3876
I've looked around a bit and found this bug: https://bugs.openjdk.java.net/browse/JDK-8046147 and this link: http://openjdk.java.net/jeps/157
If I understand correctly, UseNUMA has no effect, right? In that case the issue I was facing may have been solved by AggressiveOpts.
I don't understand how this relates to the settings you mention. That method throws an NPE when the X-Presto-Max-Size header is not set on the request. Are you using the Presto CLI or some other client?
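To illustrate the "meaningful error" suggestion made earlier in the thread, here is a minimal sketch of validating a required header before it is used. This is not Presto's actual code: `MaxSizeHeaderCheck`, `BadRequestException` and `requireHeader` are hypothetical names, the headers are modeled as a plain `Map`, and the value `1MB` is only a placeholder.

```java
import java.util.HashMap;
import java.util.Map;

public class MaxSizeHeaderCheck {
    // Header name taken from the discussion in this thread.
    static final String MAX_SIZE_HEADER = "X-Presto-Max-Size";

    // Hypothetical exception type standing in for an HTTP 400 response.
    static class BadRequestException extends RuntimeException {
        BadRequestException(String message) {
            super(message);
        }
    }

    // Returns the trimmed header value, or fails with a message that names the
    // missing header instead of a bare "maxSize is null".
    static String requireHeader(Map<String, String> headers, String name) {
        String value = headers.get(name);
        if (value == null || value.trim().isEmpty()) {
            throw new BadRequestException(
                    "Missing required request header '" + name + "'");
        }
        return value.trim();
    }

    public static void main(String[] args) {
        Map<String, String> headers = new HashMap<>();
        headers.put(MAX_SIZE_HEADER, "1MB"); // placeholder value
        System.out.println(requireHeader(headers, MAX_SIZE_HEADER));

        try {
            requireHeader(new HashMap<>(), MAX_SIZE_HEADER);
        } catch (BadRequestException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

A check like this would tell the caller which header is missing, but it would not explain why a Presto component sent a request without it.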
@nezihyigitbasi I am using the Presto CLI to run the queries. Most of the queries used to fail before I set the JVM parameters shown above (on the Presto server). From a user standpoint I cannot see how the two are related, but once I set that config, none of my queries failed with the NPE.
FYI, I didn't change anything on the cluster apart from the two parameters.
Hi @bond- @nezihyigitbasi @martint
I am hitting the exact same issue with Presto versions 0.170 and 0.177.
Any pointers on how to fix this would be much appreciated.
Details (0.170 and 0.177):
com.facebook.presto.operator.PageTransportTimeoutException: Encountered too many errors talking to a worker node. The node may have crashed or be under too much load. This is probably a transient issue, so please retry your query in a few minutes. (http://10.102.46.73:8880/v1/task/20170809_075828_00010_upkmw.22.13/results/1/4 - 6 failures, time since last success 60.85s)
at com.facebook.presto.operator.HttpPageBufferClient$1.onFailure(HttpPageBufferClient.java:383)
at com.google.common.util.concurrent.Futures$4.run(Futures.java:1123)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.RuntimeException: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
at com.google.common.base.Throwables.propagate(Throwables.java:240)
at io.airlift.http.client.ResponseHandlerUtils.propagate(ResponseHandlerUtils.java:23)
at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:518)
at com.facebook.presto.operator.HttpPageBufferClient$PageResponseHandler.handleException(HttpPageBufferClient.java:512)
at io.airlift.http.client.jetty.JettyHttpClient$JettyResponseFuture.failed(JettyHttpClient.java:875)
at io.airlift.http.client.jetty.JettyHttpClient$BufferingResponseListener.onComplete(JettyHttpClient.java:1113)
at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:193)
at org.eclipse.jetty.client.ResponseNotifier.notifyComplete(ResponseNotifier.java:185)
at org.eclipse.jetty.client.HttpReceiver.terminateResponse(HttpReceiver.java:457)
at org.eclipse.jetty.client.HttpReceiver.abort(HttpReceiver.java:528)
at org.eclipse.jetty.client.HttpChannel.abortResponse(HttpChannel.java:129)
at org.eclipse.jetty.client.HttpChannel.abort(HttpChannel.java:122)
at org.eclipse.jetty.client.HttpExchange.abort(HttpExchange.java:257)
at org.eclipse.jetty.client.HttpConversation.abort(HttpConversation.java:141)
at org.eclipse.jetty.client.HttpRequest.abort(HttpRequest.java:708)
at org.eclipse.jetty.client.TimeoutCompleteListener.run(TimeoutCompleteListener.java:71)
... 7 more
Caused by: java.util.concurrent.TimeoutException: Total timeout 10000 ms elapsed
... 8 more
java.lang.NullPointerException: maxSize is null
at java.util.Objects.requireNonNull(Objects.java:228)
at com.facebook.presto.execution.SqlTaskManager.getTaskResults(SqlTaskManager.java:330)
at com.facebook.presto.server.TaskResource.getResults(TaskResource.java:245)
at sun.reflect.GeneratedMethodAccessor55.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory$1.invoke(ResourceMethodInvocationHandlerFactory.java:81)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:144)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:161)
at org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$VoidOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:143)
at org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:99)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:389)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:347)
at org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:102)
at org.glassfish.jersey.server.ServerRuntime$2.run(ServerRuntime.java:326)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:271)
at org.glassfish.jersey.internal.Errors$1.call(Errors.java:267)
at org.glassfish.jersey.internal.Errors.process(Errors.java:315)
at org.glassfish.jersey.internal.Errors.process(Errors.java:297)
at org.glassfish.jersey.internal.Errors.process(Errors.java:267)
at org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:317)
at org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:305)
at org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:1154)
at org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:473)
at org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:427)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:388)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:341)
at org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:228)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:845)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1689)
at io.airlift.http.server.TraceTokenFilter.doFilter(TraceTokenFilter.java:63)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
at io.airlift.http.server.TimingFilter.doFilter(TimingFilter.java:52)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1676)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:462)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1182)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at org.eclipse.jetty.server.handler.StatisticsHandler.handle(StatisticsHandler.java:169)
at org.eclipse.jetty.server.handler.HandlerList.handle(HandlerList.java:52)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:523)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
at java.lang.Thread.run(Thread.java:748)
@bond- I have also tried -XX:+AggressiveOpts and -XX:+UseNUMA together with -XX:+UseG1GC, but queries are still failing with the errors above. My jvm.config:
-server
-Xmx96G
-XX:+UseG1GC
-XX:G1HeapRegionSize=32M
-XX:+UseGCOverheadLimit
-XX:+ExplicitGCInvokesConcurrent
-XX:+HeapDumpOnOutOfMemoryError
-XX:OnOutOfMemoryError=kill -9 %p
-XX:+UseNUMA
-XX:+AggressiveOpts
@ksingh7 Yes we managed to get past it. The problem that I was facing was with the data and its query.
The way Presto works in our case was: Presto -> Hive metastore -> HDFS (ORC formatted data)
Some cases require scanning the entire table. IMHO, Presto isn't built to do that, so we relied on Hive for those. FYI, I also attended a data engineering event hosted by Facebook. They had also run into cases that Presto isn't cut out for, so they rely on other tools such as Hive.
@bond- Thanks for your response.
I managed to get this working by adjusting the memory limits for both the JVM (-Xmx) and query.max-memory-per-node. The sum of these two should be under the physical memory limit (this is what I was missing).
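As a rough illustration of that check, here is a small sketch that compares the two settings against physical memory. It encodes only the rule of thumb described in this comment, not anything Presto enforces. The class `MemoryBudgetCheck`, its methods and the sample sizes are all hypothetical.

```java
import java.util.Locale;

public class MemoryBudgetCheck {
    // Parses sizes such as "96G", "32GB", "512M" or "512MB" into bytes.
    // Units are treated as binary (1G = 2^30 bytes).
    static long parseBytes(String size) {
        String s = size.trim().toUpperCase(Locale.ROOT);
        if (s.endsWith("B")) {
            s = s.substring(0, s.length() - 1);
        }
        long multiplier = 1;
        switch (s.charAt(s.length() - 1)) {
            case 'K': multiplier = 1L << 10; break;
            case 'M': multiplier = 1L << 20; break;
            case 'G': multiplier = 1L << 30; break;
            case 'T': multiplier = 1L << 40; break;
            default: break; // no unit suffix: plain bytes
        }
        if (multiplier != 1) {
            s = s.substring(0, s.length() - 1);
        }
        return Long.parseLong(s) * multiplier;
    }

    // True when -Xmx plus query.max-memory-per-node stays under physical memory,
    // which is the condition described in the comment above.
    static boolean fitsInPhysicalMemory(String xmx, String maxMemoryPerNode, String physical) {
        return parseBytes(xmx) + parseBytes(maxMemoryPerNode) < parseBytes(physical);
    }

    public static void main(String[] args) {
        // Hypothetical sizes for illustration only.
        System.out.println(fitsInPhysicalMemory("96G", "32GB", "256GB"));
        System.out.println(fitsInPhysicalMemory("96G", "48GB", "128GB"));
    }
}
```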
@ksingh7 Ah, good to know. The maxSize is null error seems too generic, since it shows up for a variety of failures. @dain Can we request better error handling for the failure that @ksingh7 hit?
This issue has been automatically marked as stale because it has not had any activity in the last 2 years. If you feel that this issue is important, just comment and the stale tag will be removed; otherwise it will be closed in 7 days. This is an attempt to ensure that our open issues remain valuable and relevant so that we can keep track of what needs to be done and prioritize the right things.
I am seeing the error from #5100 with Presto version 0.167.
Stacktrace: