trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Row type having no fields throws TypeNotFoundException in MongoDB #2162

Open ebyhr opened 4 years ago

ebyhr commented 4 years ago

If the first document Presto reads is the following,

{"_id":"5de34625245afc5be2acbac5","a":{"b":{}}}

then the definition generated in _schema looks like this, and accessing the table throws the exception that follows it:

{ 
   "_id":"5de34642221afc1ac0b386ff",
   "table":"test2",
   "fields":[ 
      { 
         "name":"_id",
         "type":"ObjectId",
         "hidden":true
      },
      { 
         "name":"a",
         "type":"row(\"b\" row)",
         "hidden":false
      }
   ]
}
io.prestosql.spi.type.TypeNotFoundException: Unknown type: row
    at io.prestosql.metadata.TypeRegistry.instantiateParametricType(TypeRegistry.java:193)
    at io.prestosql.metadata.TypeRegistry.lambda$getType$0(TypeRegistry.java:153)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4871)
    at io.prestosql.metadata.TypeRegistry.getType(TypeRegistry.java:153)
    at io.prestosql.metadata.MetadataManager.getType(MetadataManager.java:1217)
    at io.prestosql.type.InternalTypeManager.getType(InternalTypeManager.java:52)
    at io.prestosql.spi.type.TypeParameter.of(TypeParameter.java:61)
    at io.prestosql.metadata.TypeRegistry.instantiateParametricType(TypeRegistry.java:179)
    at io.prestosql.metadata.TypeRegistry.lambda$getType$0(TypeRegistry.java:153)
    at com.google.common.cache.LocalCache$LocalManualCache$1.load(LocalCache.java:4876)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
    at com.google.common.cache.LocalCache$LocalManualCache.get(LocalCache.java:4871)
    at io.prestosql.metadata.TypeRegistry.getType(TypeRegistry.java:153)
    at io.prestosql.metadata.TypeRegistry.fromSqlType(TypeRegistry.java:171)
    at io.prestosql.metadata.MetadataManager.fromSqlType(MetadataManager.java:1223)
    at io.prestosql.type.InternalTypeManager.fromSqlType(InternalTypeManager.java:58)
    at io.prestosql.plugin.mongodb.MongoSession.buildColumnHandle(MongoSession.java:200)
    at io.prestosql.plugin.mongodb.MongoSession.loadTableSchema(MongoSession.java:185)
    at com.google.common.cache.CacheLoader$FunctionToCacheLoader.load(CacheLoader.java:165)
    at com.google.common.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3528)
    at com.google.common.cache.LocalCache$Segment.loadSync(LocalCache.java:2277)
    at com.google.common.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2154)
    at com.google.common.cache.LocalCache$Segment.get(LocalCache.java:2044)
    at com.google.common.cache.LocalCache.get(LocalCache.java:3952)
    at com.google.common.cache.LocalCache.getOrLoad(LocalCache.java:3974)
    at com.google.common.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4958)
    at com.google.common.cache.LocalCache$LocalLoadingCache.getUnchecked(LocalCache.java:4964)
    at io.prestosql.plugin.mongodb.MongoSession.getTable(MongoSession.java:154)
    at io.prestosql.plugin.mongodb.MongoMetadata.getTableMetadata(MongoMetadata.java:287)
    at io.prestosql.plugin.mongodb.MongoMetadata.listTableColumns(MongoMetadata.java:130)
    at io.prestosql.metadata.MetadataManager.listTableColumns(MetadataManager.java:566)
    at io.prestosql.metadata.MetadataListing.listTableColumns(MetadataListing.java:93)
    at io.prestosql.connector.informationschema.InformationSchemaPageSource.addColumnsRecords(InformationSchemaPageSource.java:237)
    at io.prestosql.connector.informationschema.InformationSchemaPageSource.buildPages(InformationSchemaPageSource.java:205)
    at io.prestosql.connector.informationschema.InformationSchemaPageSource.getNextPage(InformationSchemaPageSource.java:171)
    at io.prestosql.operator.TableScanOperator.getOutput(TableScanOperator.java:287)
    at io.prestosql.operator.Driver.processInternal(Driver.java:379)
    at io.prestosql.operator.Driver.lambda$processFor$8(Driver.java:283)
    at io.prestosql.operator.Driver.tryWithLock(Driver.java:675)
    at io.prestosql.operator.Driver.processFor(Driver.java:276)
    at io.prestosql.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:1075)
    at io.prestosql.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:163)
    at io.prestosql.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:484)
    at io.prestosql.$gen.Presto_null__testversion____20191201_044506_2.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Row type must have at least one parameter
    at com.google.common.base.Preconditions.checkArgument(Preconditions.java:141)
    at io.prestosql.type.RowParametricType.createType(RowParametricType.java:51)
    at io.prestosql.metadata.TypeRegistry.instantiateParametricType(TypeRegistry.java:190)
    ... 56 more
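The root cause is visible in the last "Caused by" line: the inner field b is stored as a bare row with no parameter list, which the row type rejects. As a rough illustration of a friendlier error, the sketch below scans a stored type string for a parameterless row and reports the offending column by name. SchemaTypeValidator and its methods are hypothetical names, not part of the connector, and the scanner assumes field names are double-quoted as in the _schema example above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validator for type strings stored in a _schema document.
public class SchemaTypeValidator {

    // True if the signature contains a "row" keyword with no "(...)" after it.
    // Assumes field names are double-quoted, as in the _schema example.
    static boolean hasParameterlessRow(String signature) {
        int n = signature.length();
        int i = 0;
        while (i < n) {
            char c = signature.charAt(i);
            if (c == '"') {
                // Skip a quoted field name; "" inside the quotes is an escaped quote.
                i++;
                while (i < n) {
                    if (signature.charAt(i) == '"') {
                        if (i + 1 < n && signature.charAt(i + 1) == '"') {
                            i += 2;
                            continue;
                        }
                        break;
                    }
                    i++;
                }
                i++; // step past the closing quote
            } else if (Character.isLetter(c)) {
                int start = i;
                while (i < n && (Character.isLetterOrDigit(signature.charAt(i)) || signature.charAt(i) == '_')) {
                    i++;
                }
                if (signature.substring(start, i).equalsIgnoreCase("row")) {
                    int j = i;
                    while (j < n && Character.isWhitespace(signature.charAt(j))) {
                        j++;
                    }
                    if (j == n || signature.charAt(j) != '(') {
                        return true;
                    }
                }
            } else {
                i++;
            }
        }
        return false;
    }

    // One readable message per broken column instead of a bare stack trace.
    static List<String> validate(String table, Map<String, String> columnTypes) {
        List<String> errors = new ArrayList<>();
        for (Map.Entry<String, String> e : columnTypes.entrySet()) {
            if (hasParameterlessRow(e.getValue())) {
                errors.add(String.format(
                        "Column '%s' of table '%s' has type '%s', which contains a row with no fields; "
                                + "update or remove this entry in _schema",
                        e.getKey(), table, e.getValue()));
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("_id", "ObjectId");
        columns.put("a", "row(\"b\" row)");
        validate("test2", columns).forEach(System.out::println);
    }
}
```

Quoted names are skipped so that a field literally called "row" is not flagged; an unquoted field named row would still be a false positive in this simplified scanner.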

Of course, we can work around this by updating _schema manually, but such a broken definition should not be created in the first place, or at least the error message should be friendlier.
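To illustrate the "should not be created" option, here is a minimal sketch of schema inference that never emits a parameterless row: an embedded document with no inferable fields is reported as unknown, and the caller skips it instead of writing row into _schema. This is not how MongoSession infers types; RowTypeGuesser, its methods and the simplified scalar mapping are all hypothetical, and documents are modeled as Map<String, Object> (org.bson.Document implements that interface).

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical schema inference that never emits a parameterless "row".
public class RowTypeGuesser {

    // Returns a type signature for a value, or empty when no usable type can be inferred.
    // The scalar mapping is deliberately simplified for illustration.
    static Optional<String> guessType(Object value) {
        if (value instanceof String) {
            return Optional.of("varchar");
        }
        if (value instanceof Integer || value instanceof Long) {
            return Optional.of("bigint");
        }
        if (value instanceof Double) {
            return Optional.of("double");
        }
        if (value instanceof Boolean) {
            return Optional.of("boolean");
        }
        if (value instanceof Map) {
            List<String> fields = new ArrayList<>();
            for (Map.Entry<?, ?> entry : ((Map<?, ?>) value).entrySet()) {
                // Nested fields whose type cannot be inferred are left out.
                guessType(entry.getValue())
                        .ifPresent(type -> fields.add(quote(entry.getKey().toString()) + " " + type));
            }
            // An empty document, or one whose fields were all left out, must not
            // become a bare "row"; report it as unknown instead.
            if (fields.isEmpty()) {
                return Optional.empty();
            }
            return Optional.of("row(" + String.join(", ", fields) + ")");
        }
        return Optional.empty(); // null and other BSON values: unknown in this sketch
    }

    // Double-quotes a field name, escaping embedded quotes by doubling them.
    static String quote(String name) {
        return "\"" + name.replace("\"", "\"\"") + "\"";
    }

    // Builds column definitions, skipping top-level fields whose type is unknown.
    static Map<String, String> inferColumns(Map<String, Object> document) {
        Map<String, String> columns = new LinkedHashMap<>();
        document.forEach((name, value) -> guessType(value).ifPresent(type -> columns.put(name, type)));
        return columns;
    }

    public static void main(String[] args) {
        Map<String, Object> a = new LinkedHashMap<>();
        a.put("b", new LinkedHashMap<String, Object>());
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("_id", "5de34625245afc5be2acbac5"); // simplified: a string, not an ObjectId
        doc.put("a", a);
        System.out.println(inferColumns(doc)); // prints {_id=varchar}
    }
}
```

The trade-off of this design is that column a stays invisible until its _schema entry is updated, rather than being exposed with a type the engine cannot instantiate.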

sajjoseph commented 4 years ago

What could be the fix for this issue (other than adjusting the _schema registry)? We face this issue with a bunch of our Mongo tables.

findepi commented 4 years ago

Would it be enough to hide such a column from the user (as we do with unsupported columns in, e.g., JDBC connectors)?

sajjoseph commented 4 years ago

For now, I tried catching TypeNotFoundException and forcing the column type to VARCHAR, and it worked for the few tables I checked. I can submit a PR and see whether you all like that approach.
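The catch-and-fall-back approach described above can be sketched roughly as follows. UnknownTypeException stands in for TypeNotFoundException and toyParser stands in for the type manager's lookup; both, along with resolveOrVarchar, are hypothetical names used only to keep the example self-contained.

```java
import java.util.function.Function;

// Hypothetical fallback: if a stored type cannot be resolved, expose the column as varchar.
public class VarcharFallback {

    // Stand-in for io.prestosql.spi.type.TypeNotFoundException.
    static class UnknownTypeException extends RuntimeException {
        UnknownTypeException(String message) {
            super(message);
        }
    }

    // Toy stand-in for the type manager: rejects a row whose last field has a bare "row" type.
    static String toyParser(String signature) {
        if (signature.endsWith(" row)")) {
            throw new UnknownTypeException("Unknown type: row");
        }
        return signature;
    }

    // Resolves the signature, degrading to unbounded varchar instead of failing the whole table.
    static String resolveOrVarchar(String column, String signature, Function<String, String> parser) {
        try {
            return parser.apply(signature);
        } catch (UnknownTypeException e) {
            System.err.printf("Column %s: cannot resolve type '%s' (%s); using varchar%n",
                    column, signature, e.getMessage());
            return "varchar";
        }
    }

    public static void main(String[] args) {
        System.out.println(resolveOrVarchar("a", "row(\"b\" row)", VarcharFallback::toyParser)); // prints varchar
        System.out.println(resolveOrVarchar("n", "bigint", VarcharFallback::toyParser)); // prints bigint
    }
}
```

Changing the declared type is only half of such a change: the code that reads documents would also have to render those values as strings so they match the varchar column.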

Let me know.

kokosing commented 4 years ago

Would it be enough to hide such a column from the user (as we do with unsupported columns in, e.g., JDBC connectors)?

Agreed, that should be the default behaviour. In JDBC connectors there are two ways to convert an unsupported type to unbounded varchar. One is to set jdbc-types-mapped-to-varchar=JSON in the config properties; the other is to set unsupported-type-handling=CONVERT_TO_VARCHAR in the config properties (or via the unsupported_type_handling catalog session property).
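A minimal sketch of how a handling mode could drive the MongoDB column list, hiding unsupported columns by default and converting them to varchar on request. CONVERT_TO_VARCHAR is the value named above; the IGNORE constant, the class and method names, and the toy support check are assumptions made for illustration and are not copied from the JDBC code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

// Sketch of an IGNORE / CONVERT_TO_VARCHAR switch for a MongoDB column list.
public class UnsupportedTypeHandlingSketch {

    // CONVERT_TO_VARCHAR comes from the thread; IGNORE stands for "hide the column".
    enum UnsupportedTypeHandling { IGNORE, CONVERT_TO_VARCHAR }

    static Optional<String> mapType(String type, Predicate<String> isSupported, UnsupportedTypeHandling handling) {
        if (isSupported.test(type)) {
            return Optional.of(type);
        }
        switch (handling) {
            case CONVERT_TO_VARCHAR:
                return Optional.of("varchar"); // unbounded varchar
            case IGNORE:
            default:
                return Optional.empty(); // column is hidden from the user
        }
    }

    // Keeps supported columns, and maps or drops the rest according to the handling mode.
    static Map<String, String> visibleColumns(Map<String, String> schema, Predicate<String> isSupported,
            UnsupportedTypeHandling handling) {
        Map<String, String> result = new LinkedHashMap<>();
        schema.forEach((name, type) -> mapType(type, isSupported, handling).ifPresent(t -> result.put(name, t)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> schema = new LinkedHashMap<>();
        schema.put("_id", "ObjectId");
        schema.put("a", "row(\"b\" row)");
        // Toy support check; a real connector would ask its type manager instead.
        Predicate<String> isSupported = type -> !type.endsWith(" row)");
        System.out.println(visibleColumns(schema, isSupported, UnsupportedTypeHandling.IGNORE)); // {_id=ObjectId}
        System.out.println(visibleColumns(schema, isSupported, UnsupportedTypeHandling.CONVERT_TO_VARCHAR)); // {_id=ObjectId, a=varchar}
    }
}
```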

I would be really happy to review your patch, but please follow the same conventions we already have for JDBC connectors.

sajjoseph commented 4 years ago

@kokosing - In JDBC, there are well-established classes like TypeHandlingJdbcConfig.java, UnsupportedTypeHandling.java and TypeHandlingJdbcPropertiesProvider.java that handle unsupported types. We need similar classes for the MongoDB connector.

Is it possible to move them to, say, the SPI so that they are available to all connectors (including non-JDBC ones like MongoDB), or do you recommend that we duplicate the code (for example, the classes above) in the MongoDB connector?

kokosing commented 4 years ago

@sajjoseph You could move them to presto-plugin-toolkit; however, I am not sure they will match the MongoDB use case. So for now I would suggest just copying them. That way you can introduce support for handling unsupported types iteratively. Forcing the data type to be converted to varchar should be the first choice.
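If the classes are copied, the part that resolves which mode applies might look roughly like this: a catalog-level default, overridden by a session value when one is set. Plain maps stand in for Airlift configuration and the connector session, reusing the property names quoted earlier for JDBC on the MongoDB side is an assumption, and the IGNORE default reflects the "hide by default" preference expressed in the thread rather than a confirmed setting.

```java
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Sketch of the copied pieces reduced to plain Java; maps replace Airlift config and the session.
public class MongoTypeHandlingSketch {

    enum UnsupportedTypeHandling { IGNORE, CONVERT_TO_VARCHAR }

    static final String CONFIG_PROPERTY = "unsupported-type-handling";

    // Case-insensitive parse; an unknown value throws IllegalArgumentException.
    static UnsupportedTypeHandling parse(String raw) {
        return UnsupportedTypeHandling.valueOf(raw.trim().toUpperCase(Locale.ENGLISH));
    }

    // Catalog default (roughly the role of TypeHandlingJdbcConfig); IGNORE hides unsupported columns.
    static UnsupportedTypeHandling fromCatalogConfig(Map<String, String> catalogProperties) {
        return parse(catalogProperties.getOrDefault(CONFIG_PROPERTY, "IGNORE"));
    }

    // A session value such as unsupported_type_handling, when set, overrides the catalog default
    // (roughly the role of TypeHandlingJdbcPropertiesProvider).
    static UnsupportedTypeHandling effective(Map<String, String> catalogProperties, Optional<String> sessionValue) {
        return sessionValue.map(MongoTypeHandlingSketch::parse)
                .orElseGet(() -> fromCatalogConfig(catalogProperties));
    }

    public static void main(String[] args) {
        Map<String, String> catalog = Map.of(CONFIG_PROPERTY, "CONVERT_TO_VARCHAR");
        System.out.println(effective(catalog, Optional.empty()));      // CONVERT_TO_VARCHAR
        System.out.println(effective(catalog, Optional.of("ignore"))); // IGNORE
        System.out.println(effective(Map.of(), Optional.empty()));     // IGNORE
    }
}
```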