trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Errors when using dynamic hive catalogs after a worker crash #21093

Open sviscaino opened 3 months ago

sviscaino commented 3 months ago

Reopening the issue from #18040, as requested by @dain.

Trino version: 435-e
OS: RHEL
Deploy: 1 coordinator, 4 workers

Steps to reproduce: Use dynamic catalog management with a hive catalog. It seems that if one or more workers crash, we get the following error when querying the catalog after they restart:

io.trino.spi.TrinoException: Unexpected response from http://<ip of a worker>:8080/v1/task/<query id>?summarize
    at io.trino.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:70)
    at io.trino.server.remotetask.SimpleHttpResponseHandler.onSuccess(SimpleHttpResponseHandler.java:27)
    at com.google.common.util.concurrent.Futures$CallbackListener.run(Futures.java:1133)
    at io.airlift.concurrent.BoundedExecutor.drainQueue(BoundedExecutor.java:79)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.IllegalArgumentException: Unable to create class io.trino.execution.TaskInfo from JSON response:
[io.airlift.jaxrs.JsonMapperParsingException: Invalid json for Java type io.trino.server.TaskUpdateRequest
    at io.airlift.jaxrs.AbstractJacksonMapper.readFrom(AbstractJacksonMapper.java:123)
    at io.airlift.jaxrs.JsonMapper.readFrom(JsonMapper.java:41)
    at org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$TerminalReaderInterceptor.invokeReadFrom(ReaderInterceptorExecutor.java:233)
    at org.glassfish.jersey.message.internal.ReaderInterceptorExecutor$TerminalReaderInterceptor.aroundReadFrom(ReaderInterceptorExecutor.java:212)
...
Caused by: com.fasterxml.jackson.databind.JsonMappingException: Unknown handle id: hive:<catalog name>:<uuid>:io.trino.plugin.hive.HiveTableHandle (through reference chain: io.trino.server.TaskUpdateRequest["fragment"]->io.trino.sql.planner.PlanFragment["root"]->io.trino.sql.planner.plan.OutputNode["source"]->io.trino.sql.planner.plan.ProjectNode["source"]->io.trino.sql.planner.plan.TableScanNode["table"]->io.trino.metadata.TableHandle["connectorHandle"])
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:402)
    at com.fasterxml.jackson.databind.JsonMappingException.wrapWithPath(JsonMappingException.java:361)
    at com.fasterxml.jackson.databind.deser.BeanDeserializerBase.wrapAndThrow(BeanDeserializerBase.java:1853)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeWithErrorWrapping(BeanDeserializer.java:572)
    at com.fasterxml.jackson.databind.deser.BeanDeserializer._deserializeUsingPropertyBased(BeanDeserializer.java:439)

Sometimes we also get the following:

java.lang.ClassCastException: class io.trino.plugin.hive.HiveInsertTableHandle cannot be cast to class io.trino.plugin.hive.HiveInsertTableHandle (io.trino.plugin.hive.HiveInsertTableHandle is in unnamed module of loader io.trino.server.PluginClassLoader @aeb0954; io.trino.plugin.hive.HiveInsertTableHandle is in unnamed module of loader io.trino.server.PluginClassLoader @6d718b5d)
    at io.trino.plugin.hive.HiveMetadata.finishInsert(HiveMetadata.java:2170)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.finishInsert(ClassLoaderSafeConnectorMetadata.java:625)
    at io.trino.tracing.TracingConnectorMetadata.finishInsert(TracingConnectorMetadata.java:706)
    at io.trino.metadata.MetadataManager.finishInsert(MetadataManager.java:1140)
    at io.trino.tracing.TracingMetadata.finishInsert(TracingMetadata.java:694)
    at io.trino.sql.planner.LocalExecutionPlanner.lambda$createTableFinisher$4(LocalExecutionPlanner.java:4381)
    at io.trino.operator.TableFinishOperator.getOutput(TableFinishOperator.java:319)
...
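For readers puzzled by the second trace: on the JVM a class is identified by its name together with the class loader that defined it, so two copies of `HiveInsertTableHandle` defined by two different `PluginClassLoader` instances are unrelated types. The sketch below reproduces only that general JVM behavior and is not Trino code. `LoaderIdentityDemo`, `Handle`, and `IsolatingLoader` are hypothetical names invented for illustration, and the sketch assumes the compiled class file is readable as a resource from the application class loader.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class LoaderIdentityDemo {
    /** Hypothetical stand-in for a connector handle class such as HiveInsertTableHandle. */
    public static class Handle {}

    /**
     * Hypothetical stand-in for a per-catalog plugin class loader.
     * It defines Handle itself instead of delegating to its parent,
     * so every loader instance gets its own copy of the class.
     */
    static final class IsolatingLoader extends ClassLoader {
        IsolatingLoader() {
            super(LoaderIdentityDemo.class.getClassLoader());
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(Handle.class.getName())) {
                // Everything else (java.lang.Object, etc.) is delegated as usual.
                return super.loadClass(name, resolve);
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> loaded = findLoadedClass(name);
                if (loaded == null) {
                    byte[] bytes = readClassBytes(name);
                    loaded = defineClass(name, bytes, 0, bytes.length);
                }
                return loaded;
            }
        }
    }

    /** Reads the compiled bytes of a class from the application class path. */
    static byte[] readClassBytes(String className) {
        String resource = className.replace('.', '/') + ".class";
        try (InputStream in = LoaderIdentityDemo.class.getClassLoader().getResourceAsStream(resource)) {
            if (in == null) {
                throw new IllegalStateException("Class file not found: " + resource);
            }
            return in.readAllBytes();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Creates a Handle instance whose class is defined by a brand-new loader. */
    public static Object newIsolatedHandle() {
        try {
            Class<?> type = new IsolatingLoader().loadClass(Handle.class.getName());
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if casting a Handle from one loader to the Handle class of another loader fails. */
    public static boolean castAcrossLoadersFails() {
        Object first = newIsolatedHandle();
        Object second = newIsolatedHandle();
        try {
            first.getClass().cast(second);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object first = newIsolatedHandle();
        Object second = newIsolatedHandle();
        System.out.println("same name:  " + first.getClass().getName().equals(second.getClass().getName()));
        System.out.println("same class: " + (first.getClass() == second.getClass()));
        System.out.println("cast fails: " + castAcrossLoadersFails());
    }
}
```

If the same mechanism is at play here, the two loader ids in the message (`@aeb0954` and `@6d718b5d`) would suggest that the handle was created by one instance of the catalog's plugin class loader and consumed by another, which seems consistent with the catalog having been dropped and re-created. That reading is an assumption based on the message text, not something confirmed in this thread.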

After dropping and re-creating the catalog, the error persists. I have managed to fix it manually by killing workers and dropping and re-creating the catalog several times, but I'm not sure of the exact steps.

Edit: I just saw #18053. Indeed, after simply running a DESCRIBE on the table, queries start working again, so it's probably the same issue.

electrum commented 3 months ago

I notice your Trino version is 435-e. Are you running the Starburst version of Trino? If so, please file a support ticket with Starburst, as the Starburst version is different than Trino OSS.

sviscaino commented 3 months ago

> I notice your Trino version is 435-e. Are you running the Starburst version of Trino? If so, please file a support ticket with Starburst, as the Starburst version is different than Trino OSS.

Will do, thanks. I had assumed 435-e was based on 435 OSS.