Open tooptoop4 opened 4 years ago
Currently Rubix (assuming you're asking about the Hive cache) has no awareness of schemas or tables; it works on file paths. Rubix has the configs below to either exclude specific locations from the cache or include only certain locations in the cache.
rubix.cache.location.blacklist
rubix.cache.location.whitelist
We can provide regex strings here. These configs have to be set up in the hive.config.resources file.
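As a rough illustration of the setup described above, here is a minimal Python sketch that writes a Hadoop-style XML resource carrying one of these properties. The regex value is purely hypothetical, and the helper name `build_rubix_resource` is invented for this example; only the property name comes from this thread.

```python
import xml.etree.ElementTree as ET


def build_rubix_resource(properties):
    """Return a Hadoop-style <configuration> XML string for the given properties."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    # Pretty-print when available (ET.indent was added in Python 3.9).
    if hasattr(ET, "indent"):
        ET.indent(root)
    return ET.tostring(root, encoding="unicode")


# Hypothetical regex: cache only files under a "small_tables" directory.
rubix_properties = {
    "rubix.cache.location.whitelist": ".*/warehouse/small_tables/.*",
}

xml_text = build_rubix_resource(rubix_properties)
print(xml_text)
```

The resulting file would then be listed in the catalog's hive.config.resources property. rubix.cache.location.blacklist can be written the same way; how Rubix resolves a path matching both lists is not covered in this thread.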
We can do table whitelisting/blacklisting by looking at HdfsContext at io.prestosql.plugin.hive.rubix.RubixConfigurationInitializer#updateConfiguration and enabling/disabling Rubix accordingly. HdfsContext contains table and schema information.
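To make the proposal concrete, here is a small Python sketch of the decision logic only. It is not the Presto/Trino implementation: `QueryContext`, `should_cache`, and the table names are hypothetical stand-ins, and treating a missing schema or table as "do not cache" is an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional, Tuple


@dataclass(frozen=True)
class QueryContext:
    """Hypothetical stand-in for the schema/table info that HdfsContext carries."""
    schema: Optional[str]
    table: Optional[str]


def should_cache(context: QueryContext,
                 allowed_tables: FrozenSet[Tuple[str, str]]) -> bool:
    """Enable caching only for (schema, table) pairs on the allowlist."""
    if context.schema is None or context.table is None:
        # Assumption: without table information, leave the cache disabled.
        return False
    return (context.schema.lower(), context.table.lower()) in allowed_tables


# Hypothetical allowlist of small tables worth caching.
ALLOWED = frozenset({("sales", "dim_region"), ("sales", "dim_product")})

print(should_cache(QueryContext("sales", "dim_region"), ALLOWED))   # True
print(should_cache(QueryContext("sales", "fact_orders"), ALLOWED))  # False
```

In the real code path, the equivalent check would sit where the configuration is updated per context, toggling Rubix on or off for that access.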
This would be great because some tables are too large (expensive) to cache on local disk, but smaller tables would benefit from the cache.
@raunaqmorarka does rubix.cache.location.whitelist already exist in Presto?
rubix.cache.location.blacklist and rubix.cache.location.whitelist are Rubix configs which need to be specified via the hive.config.resources file in Presto.
Are there any other Rubix settings to customize cache behavior, like disk usage percentage, TTL, and other settings available at the catalog level?
Thanks!
@damianmontenegro-upwork these properties are available via hive.config.resources file in Trino. See https://rubix.readthedocs.io/en/latest/configuration.html
Hi @sopel39, thanks for your answer. The link you provided does not list the settings mentioned in this thread (rubix.cache.location.blacklist and rubix.cache.location.whitelist). So, here are three more quick questions:
Thanks in advance :)
cc @raunaqmorarka
Is https://rubix.readthedocs.io/en/latest/configuration.html up to date?
There are still some undocumented configs (like the blacklist and whitelist ones you've mentioned). But most of the important ones are covered.
All settings in https://rubix.readthedocs.io/en/latest/configuration.html are available in Presto?
Any Rubix config can be given to Trino through the hive.config.resources file.
Are there any other settings available in Presto not listed in https://rubix.readthedocs.io/en/latest/configuration.html?
Trino configs for Rubix are documented at https://trino.io/docs/current/connector/hive-caching.html. These are a subset of the configs from https://rubix.readthedocs.io/en/latest/configuration.html, renamed and provided "natively" in Trino configs for convenience, as these tend to be the most used configs.
It would be great if the config could define a list of schemas/tables for which caching is enabled, instead of enabling caching on an entire catalog.