trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0

After enabling Hive catalog cache on EMR Trino cluster, primary node cannot communicate with core nodes correctly #18428

Closed xiaoshan1213 closed 8 months ago

xiaoshan1213 commented 1 year ago

Hi

I am following the documentation at https://trino.io/docs/current/connector/hive-caching.html to enable the Hive catalog cache on AWS EMR. After adding

hive.cache.enabled=true
hive.cache.location=/var/lib/trino/cache1

to the Hive connector properties and spinning up the EMR cluster, the show catalogs command from trino-cli timed out with the error "Insufficient active worker nodes. Waited 5.00m for at least 1 workers, but only 0 workers are active". Without these settings, the connection works fine. From the server log:

                       Percentage of parquet files to validate after write by re-reading the whole file
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.bookkeeper-port                                       8899                 8899
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.location                                              ----                 /var/lib/trino/cache1
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.ttl                                                   7.00d                7.00d                       Time files will be kept in cache prior to eviction
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.data-transfer-port                                    8898                 8898
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.disk-usage-percentage                                 80                   80                       Percentage of disk space used for cached data
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.stale-fileinfo-expiry-period                          36000                36000
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.file-staleness-check-enable                           true                 true
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.read-mode                                             ASYNC                ASYNC
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.start-server-on-coordinator                           false                false
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.enabled                                               false                true                       Experimental: Cache HDFS file segments to distributed local storage
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.s3-file-system-type                                         TRINO                EMRFS

I can see that the cache is bootstrapped correctly.
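
The Bootstrap lines above list each property followed by its default and its effective value. As a minimal sketch for checking which settings were actually overridden, the Python below parses lines of that shape and reports the properties whose effective value differs from the default. The column layout (timestamp, level, thread, logger, property, default, value, optional description) is an assumption read off the excerpt, the helper name changed_properties is hypothetical, and values containing spaces are not handled.

```python
import re

# Assumed layout of a Bootstrap property line, read off the log excerpt:
# timestamp, INFO, main, Bootstrap, property, default, effective value,
# and an optional free-text description. Values are assumed to contain no spaces.
LINE = re.compile(
    r"^(?P<ts>\S+)\s+INFO\s+main\s+Bootstrap\s+"
    r"(?P<name>\S+)\s+(?P<default>\S+)\s+(?P<value>\S+)"
    r"(?:\s+(?P<description>.*))?$"
)


def changed_properties(log_text):
    """Map property name -> (default, value) where the value differs from the default."""
    changed = {}
    for line in log_text.splitlines():
        match = LINE.match(line.strip())
        if match and match["default"] != match["value"]:
            changed[match["name"]] = (match["default"], match["value"])
    return changed


# A few lines copied from the log excerpt in this issue.
sample = """\
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.location                                              ----                 /var/lib/trino/cache1
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.ttl                                                   7.00d                7.00d                       Time files will be kept in cache prior to eviction
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.cache.enabled                                               false                true                       Experimental: Cache HDFS file segments to distributed local storage
2023-07-26T18:31:20.139Z        INFO    main    Bootstrap       hive.s3-file-system-type                                         TRINO                EMRFS
"""

for name, (default, value) in changed_properties(sample).items():
    print(f"{name}: {default} -> {value}")
```

Running the same function over the coordinator log and a worker log would be one way to compare what each node actually loaded, although this only shows configuration and does not by itself explain why workers fail to register.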

Any thoughts?

raunaqmorarka commented 8 months ago

The previous Hive caching implementation has been replaced by a new one; see https://trino.io/docs/current/object-storage/file-system-cache.html