trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.41k stars 3k forks source link

Dynamic Catalogs #12709

Open dain opened 2 years ago

dain commented 2 years ago

This is an umbrella tracking issue for the Dynamic Catalogs project. As we progress on this project, PRs and sub-issues will be linked back to this issue.

Goal

Add support for adding and removing catalogs without restarting servers or interrupting existing queries. This will be dependent on the connector being updated to support concurrent versions with same name (e.g., no JMX names containing just a catalog name). For non-compliant connectors, support a basic model where queries are killed when dropping.

For catalog properties the current plan is to simply use the existing Map<String, String> of arbitrary properties instead of requiring pre-declaration of typed catalog properties like is done for session and table properties. This is mainly for expedience and backwards compatibility of the existing connector configuration system.

MVP Plan

Follow-up Items

dain commented 2 years ago

I have a prototype if the basic stuff working, and an working on cleaning this up for submission.

Unclecc commented 2 years ago

Hi, I use this PR to test DynamicCatalog, However, I ran into a problem when I deleted and then created a Hive Catalog. like reload, the first time is OK and can execute SQL, the second time will have the following error occurred when i execute SQL. java.lang.ClassCastException: class io.trino.plugin.hive.HiveTableHandle cannot be cast to class io.trino.plugin.hive.HiveTableHandle (io.trino.plugin.hive.HiveTableHandle is in unnamed module of loader io.trino.server.PluginClassLoader @408b024e; io.trino.plugin.hive.HiveTableHandle is in unnamed module of loader io.trino.server.PluginClassLoader @aa6aca3) at io.trino.plugin.hive.HivePageSourceProvider.createPageSource(HivePageSourceProvider.java:150) at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorPageSourceProvider.createPageSource(ClassLoaderSafeConnectorPageSourceProvider.java:49) at io.trino.split.PageSourceManager.createPageSource(PageSourceManager.java:62) at io.trino.operator.TableScanOperator.getOutput(TableScanOperator.java:308) at io.trino.operator.Driver.processInternal(Driver.java:411) at io.trino.operator.Driver.lambda$process$10(Driver.java:314) at io.trino.operator.Driver.tryWithLock(Driver.java:706) at io.trino.operator.Driver.process(Driver.java:306) at io.trino.operator.Driver.processForDuration(Driver.java:277) at io.trino.execution.SqlTaskExecution$DriverSplitRunner.processFor(SqlTaskExecution.java:739) at io.trino.execution.executor.PrioritizedSplitRunner.process(PrioritizedSplitRunner.java:164) at io.trino.execution.executor.TaskExecutor$TaskRunner.run(TaskExecutor.java:515) at io.trino.$gen.Trino_400_100_gc6901d5_dirty____20221025_085036_2.run(Unknown Source) at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) at java.base/java.lang.Thread.run(Thread.java:833)

so i take two days to read trino code, i found this commit https://github.com/trinodb/trino/pull/1820, i think @dain will face this problem in worker too, because it will generate a new classloader when you update hive catalog, but io.trino.plugin.hive.HiveTableHandle's instance loaded last time

leeyh0216 commented 1 year ago

Hi, @dain , @Unclecc .

I am also interested in Dynamic Catalog, so I am trying to implement my own based on the PR. The above issue that @Unclecc mentioned was reproduced in the same way, but I thought that the issue you mentioned is not the problem.

I solved the problem by slightly modifying AbstractTypedJacksonModule.InternalTypeDeserializer . Since the AsPropertyTypeDeserializer object caches the class loader for the class parsed once, the second call process does not use the class loader changed during the Dynamic Catalog application process, but the existing class loader is used. Since it was for simple test purposes, I changed the method to create an AsPropertyTypeDeserializer object every time as shown below, and confirmed that it works.

    private static class InternalTypeDeserializer<T>
            extends StdDeserializer<T>
    {
        private TypeIdResolver typeIdResolver;

        private Class<T> baseClass;
        public InternalTypeDeserializer(Class<T> baseClass, TypeIdResolver typeIdResolver)
        {
            super(baseClass);
            this.typeIdResolver = typeIdResolver;
            this.baseClass = baseClass;
        }

        @SuppressWarnings("unchecked")
        @Override
        public T deserialize(JsonParser jsonParser, DeserializationContext deserializationContext)
                throws IOException
        {
            TypeDeserializer typeDeserializer = new AsPropertyTypeDeserializer(
                    TypeFactory.defaultInstance().constructType(baseClass),
                    typeIdResolver,
                    TYPE_PROPERTY,
                    false,
                    null);
            return (T) typeDeserializer.deserializeTypedFromAny(jsonParser, deserializationContext);
        }
    }

@Unclecc , If you apply the above code and test it, I think it will work. @dain , If you have the same problem, I hope the above solution helps.

I'm not good at English... Please understand if the context is strange.

Gqyanxin commented 1 year ago

Hi, I am also interested in Dynamic Catalog.When will this feature be available?

bitsondatadev commented 1 year ago

Hey @Gqyanxin and anyone else that stumbles upon this issue. Keep in mind that this is an open-source project and some of this work has to happen during the developers' free time.

The best you can do is follow this issue and the related to get a sense of how close they are.

To get a sense of why this issue may take some time, see the discussion on this during the Trino Summit. If Dain has an update on when to expect this, he will provide them here.

cdrappier commented 1 year ago

Hi,

At ForePaaS, we are currently testing this new feature, which is proving to be very helpful for our use case. I have start some work on a JdbcCatalogStore inspirated by the FileCatalogStore. If this is something that is relevant to the main Trino open source project, I will create a merge request as soon as possible to discuss the subject with you. Is that something referring to you follow-up items : SPI for catalog storage ?

My goal is to have a dynamic catalog, with persistant data, but without directly using disk. Thanks,

bitsondatadev commented 1 year ago

@dain ^^

Smith-Cruise commented 1 year ago

What's grammar now, I want to have a test for it.

ockhamlabs commented 1 year ago

@dain Any update on when the pending items for MVP will be done? We have been eagerly waiting for this to be finished for a usable dynamic catalog capability. Pl let us know

hashhar commented 1 year ago

There is related work ongoing in https://github.com/trinodb/trino/issues/15921 - without that while you can have dynamic catalogs one of the most important connectors - Hive (and Iceberg) won't be able to be used with it.

xuhaiL commented 1 year ago

i need this feature,it can be used?

dungdm93 commented 1 year ago

It's useful to implementation k8s CRD based catalogs. And further more is Trino k8s operator.

gmartini2019 commented 11 months ago

Hey, did you guys figure out the dynamic catalog? I am trying to make an API just for that, but I am missing the specific user permission that will allow me to do so.

hashhar commented 3 months ago

@dain A question, the CatalogVersion javadoc says:

/**
 * Version of a catalog.  The string maybe compared lexicographically using ASCII, and to determine which catalog version is newer.
 */

Right now none of the implementations actually adhere to this though and nor do we use CatalogVersion#compareTo anywhere.

e.g. in FileCatalogStore#computeCatalogVersion we break the contract by returning a length-prefixed hash of catalog name, connector name and properties. We also have a javadoc there which says:

/**
 * This is not a generic, universal, or stable version computation, and can and will change from version to version without warning.
 * For places that need a long term stable version, do not use this code.
 */

This is interesting because the CatalogStore interface always returns the "latest" version of the catalog (because concurrency is being limited in CatalogDynamicCatalogManager usign the catalogsUpdateLock which means the CatalogVersion is of no use for doing things like optimistic concurrency control in a database-backed catalog store for example.

This is confusing. Can you please clarify whether CatalogVersion is something which the CatalogStore implementation should care about? If not maybe we should move that computation to the CatalogManager.

Also lot of SPIs around pluggable catalog stores are marked @Experimental at the moment. Do you think it's time to promote them to be stable?

Unclecc commented 3 months ago

您的邮件已经收到。我会尽快处理。

hashhar commented 3 months ago

In an offline discussion we realised that the CatalogVersion contract doesn't need to define lexicographic ordering because in current implementation the engine or catalog store never needs a sorting of catalog versions - only the latest catalog version and using the catalog version to disambiguate the catalog handles for the same connector with same catalog name.

I'll send a PR with the javadoc updates and some code cleanup.