trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)

Trino accessing Aliyun OSS got error: 'failed: No factory for location: oss://bucket-name/path' #23740

Open XiangyuFan17 opened 1 month ago

XiangyuFan17 commented 1 month ago

Hello there, I'm trying to use Trino to access a data warehouse built on Aliyun OSS. Trino was deployed on Kubernetes, and the hive.properties part in catalog.yaml was edited as below:

connector.name=hive
hive.metastore.uri=thrift://${thrift-server-host}:30083
fs.native-s3.enabled=true
fs.hadoop.enabled=false
s3.endpoint=http://oss-cn-shanghai.aliyuncs.com
s3.region=oss-cn-shanghai
s3.aws-access-key=${my key}
s3.aws-secret-key=${my key}

Executing SHOW TABLES works great, but when I tried to get data from a specific table, something went wrong:

Query 20241010_103922_00003_dt8h9 failed: No factory for location:

(screenshot attached)

The way Trino interacts with OSS should be the same as with AWS S3. Can anyone please help? Thanks a lot.

rvishureddy commented 1 month ago

I have the same issue on Trino version 462, but with S3:

Caused by: java.lang.IllegalArgumentException: No factory for location: s3a://Bucket NAME/metadata/13665-29244414-57ea-433d-a4c2-76d6fe0c48a2.metadata.json

catalogs:
  hive: |
    connector.name=hive
    fs.native-s3.enabled=true
    hive.metastore.uri=thrift://hive.spark:9083
    hive.non-managed-table-writes-enabled=true
    hive.max-partitions-per-writers=500
    hive.orc.time-zone=UTC
    hive.parquet.time-zone=UTC
    hive.rcfile.time-zone=UTC
    hive.orc.bloom-filters.enabled=true
    hive.metastore.thrift.client.connect-timeout=2000s
    hive.metastore.thrift.client.read-timeout=2000s

Error Type: INTERNAL_ERROR, Error Code: GENERIC_INTERNAL_ERROR (65536), Stack Trace:

 io.trino.spi.TrinoException: Error processing metadata for table <tablename>
    at io.trino.plugin.iceberg.IcebergExceptions.translateMetadataException(IcebergExceptions.java:54)
        at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.refreshFromMetadataLocation(AbstractIcebergTableOperations.java:272)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.refreshFromMetadataLocation(AbstractIcebergTableOperations.java:239)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.refresh(AbstractIcebergTableOperations.java:140)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.current(AbstractIcebergTableOperations.java:123)
    at io.trino.plugin.iceberg.catalog.hms.TrinoHiveCatalog.lambda$loadTable$11(TrinoHiveCatalog.java:448)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4903)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3574)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2316)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2189)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2079)
    at com.google.common.cache.LocalCache.get(LocalCache.java:4017)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4898)
    at io.trino.cache.EvictableCache.get(EvictableCache.java:118)
    at io.trino.cache.CacheUtils.uncheckedCacheGet(CacheUtils.java:39)
    at io.trino.plugin.iceberg.catalog.hms.TrinoHiveCatalog.loadTable(TrinoHiveCatalog.java:445)
    at io.trino.plugin.iceberg.IcebergMetadata.getTableHandle(IcebergMetadata.java:472)
    at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getTableHandle(ClassLoaderSafeConnectorMetadata.java:1237)
    at io.trino.tracing.TracingConnectorMetadata.getTableHandle(TracingConnectorMetadata.java:142)
    at io.trino.metadata.MetadataManager.lambda$getTableHandle$5(MetadataManager.java:293)
    at java.base/java.util.Optional.flatMap(Optional.java:289)
    at io.trino.metadata.MetadataManager.getTableHandle(MetadataManager.java:284)
    at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1947)
    at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1939)
    at io.trino.tracing.TracingMetadata.getRedirectionAwareTableHandle(TracingMetadata.java:1494)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.getTableHandle(StatementAnalyzer.java:5842)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:2291)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitTable(StatementAnalyzer.java:520)
    at io.trino.sql.tree.Table.accept(Table.java:60)
    at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:539)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.analyzeFrom(StatementAnalyzer.java:4891)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:3091)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuerySpecification(StatementAnalyzer.java:520)
    at io.trino.sql.tree.QuerySpecification.accept(QuerySpecification.java:155)
    at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:539)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:547)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:1562)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.visitQuery(StatementAnalyzer.java:520)
    at io.trino.sql.tree.Query.accept(Query.java:119)
    at io.trino.sql.tree.AstVisitor.process(AstVisitor.java:27)
    at io.trino.sql.analyzer.StatementAnalyzer$Visitor.process(StatementAnalyzer.java:539)
    at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:499)
    at io.trino.sql.analyzer.StatementAnalyzer.analyze(StatementAnalyzer.java:488)
    at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:98)
    at io.trino.sql.analyzer.Analyzer.analyze(Analyzer.java:87)
    at io.trino.execution.SqlQueryExecution.analyze(SqlQueryExecution.java:289)
    at io.trino.execution.SqlQueryExecution.<init>(SqlQueryExecution.java:222)
    at io.trino.execution.SqlQueryExecution$SqlQueryExecutionFactory.createQueryExecution(SqlQueryExecution.java:892)
    at io.trino.dispatcher.LocalDispatchQueryFactory.lambda$createDispatchQuery$0(LocalDispatchQueryFactory.java:153)
    at io.trino.$gen.Trino_462____20241021_184613_2.call(Unknown Source)
    at com.google.common.util.concurrent.TrustedListenableFutureTask$TrustedFutureInterruptibleTask.runInterruptibly(TrustedListenableFutureTask.java:131)
    at com.google.common.util.concurrent.InterruptibleTask.run(InterruptibleTask.java:76)
    at com.google.common.util.concurrent.TrustedListenableFutureTask.run(TrustedListenableFutureTask.java:82)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
    at java.base/java.lang.Thread.run(Thread.java:1575)
Caused by: java.lang.IllegalArgumentException: No factory for location: s3a://My Bucket NAME/metadata/13665-29244414-57ea-433d-a4c2-76d6fe0c48a2.metadata.json
    at io.trino.filesystem.manager.FileSystemModule.lambda$createFileSystemFactory$2(FileSystemModule.java:149)
    at java.base/java.util.Optional.orElseThrow(Optional.java:403)
    at io.trino.filesystem.manager.FileSystemModule.lambda$createFileSystemFactory$3(FileSystemModule.java:149)
    at io.trino.filesystem.switching.SwitchingFileSystem.fileSystem(SwitchingFileSystem.java:194)
    at io.trino.filesystem.switching.SwitchingFileSystem.newInputFile(SwitchingFileSystem.java:60)
    at io.trino.filesystem.tracing.TracingFileSystem.newInputFile(TracingFileSystem.java:51)
    at io.trino.filesystem.cache.CacheFileSystem.newInputFile(CacheFileSystem.java:49)
    at io.trino.plugin.iceberg.fileio.ForwardingFileIo.newInputFile(ForwardingFileIo.java:60)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.lambda$refreshFromMetadataLocation$1(AbstractIcebergTableOperations.java:241)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.lambda$refreshFromMetadataLocation$3(AbstractIcebergTableOperations.java:266)
    at dev.failsafe.Functions.lambda$toCtxSupplier$11(Functions.java:243)
    at dev.failsafe.Functions.lambda$get$0(Functions.java:46)
    at dev.failsafe.internal.RetryPolicyExecutor.lambda$apply$0(RetryPolicyExecutor.java:74)
    at dev.failsafe.SyncExecutionImpl.executeSync(SyncExecutionImpl.java:187)
    at dev.failsafe.FailsafeExecutor.call(FailsafeExecutor.java:376)
    at dev.failsafe.FailsafeExecutor.get(FailsafeExecutor.java:112)
    at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.refreshFromMetadataLocation(AbstractIcebergTableOperations.java:266)
rvishureddy commented 1 month ago

Got this working @XiangyuFan17

At a minimum, each Delta Lake, Hive or Hudi object storage catalog file must set the hive.metastore configuration property to define the type of metastore to use. Iceberg catalogs instead use the iceberg.catalog.type configuration property to define the type of metastore to use.

Go through this carefully: https://trino.io/docs/current/object-storage/metastores.html#hive-thrift-metastore

For Hive:

  hive: |
    connector.name=hive
    hive.metastore=thrift
    fs.native-s3.enabled=true
    hive.metastore.uri=thrift://hive.spark:9083
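
Applied to the original OSS setup from the first post, the combined Hive catalog would presumably look like the sketch below (endpoint, region, and key values are the ones @XiangyuFan17 already posted, the placeholders stay placeholders; only hive.metastore=thrift is new here):

connector.name=hive
hive.metastore=thrift
hive.metastore.uri=thrift://<thrift-server-host>:30083
fs.native-s3.enabled=true
fs.hadoop.enabled=false
# S3-compatible access to Aliyun OSS, same values as in the original catalog
s3.endpoint=http://oss-cn-shanghai.aliyuncs.com
s3.region=oss-cn-shanghai
s3.aws-access-key=<access key>
s3.aws-secret-key=<secret key>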

For me, the configuration below worked for Iceberg:

  iceberg: |
    connector.name=iceberg
    fs.native-s3.enabled=true
    iceberg.catalog.type=hive_metastore
    hive.metastore.uri=thrift://hive.spark:9083
    iceberg.file-format=orc
    iceberg.compression-codec=zstd
    hive.orc.bloom-filters.enabled=true
rvishureddy commented 1 month ago

If you are using the Thrift protocol, do as above.

But if you are using HTTP or HTTPS, read the respective sections at the link provided above.

sar009 commented 1 month ago

I see a similar error with the Iceberg connector using the REST and Glue catalogs. I used the following config for Glue:

connector.name=iceberg
iceberg.catalog.type=glue
iceberg.file-format=parquet
hive.metastore.glue.region=us-east-1
hive.metastore.glue.default-warehouse-dir=s3://mybucket/test/
hive.metastore.glue.aws-access-key=abcd
hive.metastore.glue.aws-secret-key=abcd

and the following for REST:

connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://rest:8181/

The error is:

No factory for location: s3://mybucket/test/taxis-bec5eb7e34844c76ad34d2c87558813f

I tried various versions and saw that the problem started at version 458.
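
If the cause here is the same as above (no file system factory enabled in the catalog, so s3:// locations find nothing to match), enabling the native S3 file system might be the missing piece. A sketch of the Glue config with just that property added, keeping the placeholder values from this post:

connector.name=iceberg
iceberg.catalog.type=glue
iceberg.file-format=parquet
# register the native S3 file system so s3:// locations have a factory
fs.native-s3.enabled=true
hive.metastore.glue.region=us-east-1
hive.metastore.glue.default-warehouse-dir=s3://mybucket/test/
hive.metastore.glue.aws-access-key=abcd
hive.metastore.glue.aws-secret-key=abcd

Depending on the environment, the s3.region or S3 credential properties may be needed as well.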