trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Apply caching to certain schemas/tables only #5336

Open tooptoop4 opened 4 years ago

tooptoop4 commented 4 years ago

It would be great if the config could define a list of schemas/tables for which caching is enabled, instead of enabling caching for an entire catalog.

raunaqmorarka commented 4 years ago

Currently Rubix (assuming you're asking about the Hive cache) has no awareness of schemas and tables; it works on file paths. Rubix has the configs below to either exclude specific locations from the cache or include only certain locations in the cache:

rubix.cache.location.blacklist
rubix.cache.location.whitelist

We can provide regex strings here. These configs have to be set up in the hive.config.resources file.
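
Because these configs match on file paths rather than table names, targeting a table means writing a regex over its storage location. The sketch below is only a local sanity check for candidate regex strings before they go into the config. The bucket, paths, and class name are hypothetical, and it assumes the regex must match the whole path, which this thread does not confirm for Rubix.

```java
import java.util.List;
import java.util.regex.Pattern;

public class LocationRegexCheck {
    // Returns true when the regex matches the entire path.
    // Full-string matching is an assumption, not confirmed Rubix behavior.
    static boolean matches(String regex, String path) {
        return Pattern.compile(regex).matcher(path).matches();
    }

    public static void main(String[] args) {
        // Hypothetical regex meant to select files under one schema's directory.
        String regex = "s3://example-bucket/warehouse/small_schema\\.db/.*";
        List<String> paths = List.of(
                "s3://example-bucket/warehouse/small_schema.db/orders/part-0000.orc",
                "s3://example-bucket/warehouse/big_schema.db/events/part-0000.orc");
        for (String path : paths) {
            // Print whether each sample location would be selected by the regex.
            System.out.println(matches(regex, path) + " " + path);
        }
    }
}
```

Running it against a few real table locations shows quickly whether a pattern is too broad or too narrow.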

sopel39 commented 4 years ago

We can do table whitelisting/blacklisting by looking at HdfsContext in io.prestosql.plugin.hive.rubix.RubixConfigurationInitializer#updateConfiguration and enabling/disabling Rubix accordingly. HdfsContext contains table and schema information.
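
A minimal sketch of the decision this would involve, assuming the schema name and optional table name have already been read from the context. The class and method names here are hypothetical and are not part of Trino or Rubix; the real change would live inside the initializer mentioned above.

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical helper: decides whether caching should be enabled
// for a given schema/table. Not an existing Trino or Rubix class.
public class CacheTableSelector {
    private final Set<String> cachedSchemas; // e.g. "small"
    private final Set<String> cachedTables;  // e.g. "big.dim_dates"

    public CacheTableSelector(Set<String> cachedSchemas, Set<String> cachedTables) {
        this.cachedSchemas = Set.copyOf(cachedSchemas);
        this.cachedTables = Set.copyOf(cachedTables);
    }

    public boolean shouldCache(Optional<String> schema, Optional<String> table) {
        // Without schema information there is nothing to match, so do not cache.
        if (!schema.isPresent()) {
            return false;
        }
        // A listed schema enables caching for all of its tables.
        if (cachedSchemas.contains(schema.get())) {
            return true;
        }
        // Otherwise only explicitly listed "schema.table" entries are cached.
        return table.map(t -> cachedTables.contains(schema.get() + "." + t)).orElse(false);
    }

    public static void main(String[] args) {
        CacheTableSelector selector =
                new CacheTableSelector(Set.of("small"), Set.of("big.dim_dates"));
        System.out.println(selector.shouldCache(Optional.of("big"), Optional.of("dim_dates")));
        System.out.println(selector.shouldCache(Optional.of("big"), Optional.of("events")));
    }
}
```

Defaulting to "do not cache" when no schema is known is a design choice of this sketch; a blacklist-style variant would invert that default.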

tooptoop4 commented 4 years ago

This would be great because some tables are too large (expensive) to cache on local disk, but smaller tables would benefit from the cache.

tooptoop4 commented 3 years ago

@raunaqmorarka does rubix.cache.location.whitelist already exist in Presto?

raunaqmorarka commented 3 years ago

> @raunaqmorarka does rubix.cache.location.whitelist already exist in Presto?

rubix.cache.location.blacklist and rubix.cache.location.whitelist are Rubix configs which need to be specified via the hive.config.resources file in Presto.

damianmontenegro-upwork commented 3 years ago

Are there any other Rubix settings to customize cache behavior, such as disk usage percentage, TTL, and other settings available at the catalog level?

Thanks!

sopel39 commented 3 years ago

@damianmontenegro-upwork these properties are available via the hive.config.resources file in Trino. See https://rubix.readthedocs.io/en/latest/configuration.html

damianmontenegro-upwork commented 3 years ago

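Hi @sopel39, thanks for your answer. In the link you provided, the settings mentioned in this thread are not listed (rubix.cache.location.blacklist and rubix.cache.location.whitelist). So, here are three more quick questions:

1. Is https://rubix.readthedocs.io/en/latest/configuration.html up to date?
2. All settings in https://rubix.readthedocs.io/en/latest/configuration.html are available in Presto?
3. Are there any other settings available in Presto not listed in https://rubix.readthedocs.io/en/latest/configuration.html?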

Thanks in advance :)

sopel39 commented 3 years ago

cc @raunaqmorarka

raunaqmorarka commented 3 years ago

> Is https://rubix.readthedocs.io/en/latest/configuration.html up to date?

There are still some undocumented configs (like the blacklist and whitelist ones you've mentioned). But most of the important ones are covered.

> All settings in https://rubix.readthedocs.io/en/latest/configuration.html are available in Presto?

Any Rubix config can be given to Trino through the hive.config.resources file.

> Are there any other settings available in Presto not listed in https://rubix.readthedocs.io/en/latest/configuration.html?

Trino configs for Rubix are documented at https://trino.io/docs/current/connector/hive-caching.html. These are a subset of the configs from https://rubix.readthedocs.io/en/latest/configuration.html, renamed and provided "natively" in Trino configs for convenience, as they tend to be the most used configs.