First, thank you for maintaining this great OSS product!
We have been encountering a cryptic error when using Trino with the Hive connector against an Alluxio location in our test environment.
Specifically, we were able to:

- successfully create a schema with an `alluxio://` location
- successfully create a table in this schema
- successfully insert data into this table

Running a `SELECT` query against the table produces the following error: `No nodes available to run query`.
This is caused by the setting `hive.force-local-scheduling=true`. We had enabled this setting by mistake and forgot to test with Alluxio; after 24 hours of attempts at fixing the problem, we found this was the cause.
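For anyone hitting the same symptom, the workaround on our side was simply to drop the flag. A sketch of the relevant Hive catalog properties, assuming a catalog file such as `etc/catalog/hive.properties` (the filename and metastore URI below are assumptions for illustration; `hive.force-local-scheduling` defaults to `false`):

```properties
# Hypothetical Hive catalog properties file (e.g. etc/catalog/hive.properties)
connector.name=hive
hive.metastore.uri=thrift://hive-metastore:9083
# Alluxio workers do not share host addresses with Trino workers in our
# setup, so forcing local scheduling leaves no eligible nodes. Either
# remove the line entirely or set it explicitly to its default:
hive.force-local-scheduling=false
```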
We encountered the problem as early as Trino 373, and it reproduces against Trino 380 (our current version). We also verified the problem persists against Alluxio 2.8.0. The instructions below are checked with Trino 380 and Alluxio 2.7.3 (the same Alluxio version supported by Trino 380).
## How to reproduce

### High-level
If you already have your own environment with an Alluxio installation and a Hive connector, simply set `hive.force-local-scheduling=true` in your Hive connector config. This should cause `SELECT` statements against tables with an `alluxio://` location to fail with an error reading `No nodes available to run query`.
### Step-by-step
We verified this issue reproduces using Walden, our small data lake test environment deployed on Kubernetes.
Assuming you have a working k8s cluster, you can follow the instructions listed here to set up the test environment.
#### No bug with the switch turned off
Once the environment is deployed (all the pods are running), follow the instructions listed here to set up a Hive table backed by Alluxio. The instructions are repeated below.
You should first run (via bash from the `devserver` pod):

```shell
mc alias set walden-minio/ http://minio:9000 $MINIO_ACCESS_KEY_ID $MINIO_ACCESS_KEY_SECRET
mc mb walden-minio/alluxio
trino alluxio
```
Then, the following set of queries should succeed:

```sql
CREATE SCHEMA alluxio WITH (location='alluxio://alluxio:19998/');
CREATE TABLE dim_bar(baz BIGINT);
INSERT INTO dim_bar VALUES 4, 5, 6, 7;
```
Finally, a `SELECT` statement (e.g. `SELECT baz FROM dim_bar;`) should return some rows.
#### Inserting and verifying the bug

Exit the Kubernetes pod, and from the `walden` directory, add the following line to `kube/configs/trino_hive.properties`:
```properties
hive.force-local-scheduling=true
```
Then run (also from the `walden` directory):

```shell
kubectl delete namespace walden
cd kube && ./deploy.sh values-default.yaml
```
Wait until all the pods are back in the `Running` state. You can check this by running `kubectl get pods -n walden`.
Then follow the same instructions as above:
First run (via bash from the `devserver` pod):

```shell
mc alias set walden-minio/ http://minio:9000 $MINIO_ACCESS_KEY_ID $MINIO_ACCESS_KEY_SECRET
mc mb walden-minio/alluxio
trino alluxio
```
The following set of queries should continue to succeed:

```sql
CREATE SCHEMA alluxio WITH (location='alluxio://alluxio:19998/');
CREATE TABLE dim_bar(baz BIGINT);
INSERT INTO dim_bar VALUES 4, 5, 6, 7;
```
The `SELECT` query should now fail, however:

```sql
SELECT baz FROM dim_bar;
```
Example failure:

```
trino:alluxio> SELECT baz FROM dim_bar;

Query 20220508_021052_00005_az83w, FAILED, 1 node
Splits: 1 total, 0 done (0.00%)
0.16 [0 rows, 0B] [0 rows/s, 0B/s]

Query 20220508_021052_00005_az83w failed: No nodes available to run query
```
Relevant stack trace from the coordinator:

```
2022-05-08T02:10:53.236Z ERROR query-execution-4 io.trino.execution.scheduler.SqlQueryScheduler Failure in distributed stage for query 20220508_021052_00005_az83w
io.trino.spi.TrinoException: No nodes available to run query
	at io.trino.execution.scheduler.UniformNodeSelector.computeAssignments(UniformNodeSelector.java:178)
	at io.trino.execution.scheduler.DynamicSplitPlacementPolicy.computeAssignments(DynamicSplitPlacementPolicy.java:41)
	at io.trino.execution.scheduler.SourcePartitionedScheduler.schedule(SourcePartitionedScheduler.java:343)
	at io.trino.execution.scheduler.SourcePartitionedScheduler$1.schedule(SourcePartitionedScheduler.java:188)
	at io.trino.execution.scheduler.SqlQueryScheduler$PipelinedDistributedStagesScheduler.schedule(SqlQueryScheduler.java:1606)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
```
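Our reading of the trace: with `hive.force-local-scheduling=true`, the node selector only considers workers whose host addresses match the addresses a split reports for its data, and splits over an `alluxio://` location report Alluxio worker hosts, which match no Trino worker, leaving an empty candidate set. A simplified Python model of that selection step (the names and structure here are ours for illustration, not Trino's actual code):

```python
def compute_assignments(splits, workers, force_local_scheduling):
    """Toy model of split-to-node assignment.

    Each split carries the host addresses where its data lives; each
    worker is identified by its host address. With forced local
    scheduling, only exact host matches are eligible candidates.
    """
    assignments = {}
    for split in splits:
        if force_local_scheduling:
            candidates = [w for w in workers if w in split["hosts"]]
        else:
            candidates = list(workers)  # any worker may read remotely
        if not candidates:
            raise RuntimeError("No nodes available to run query")
        assignments[split["id"]] = candidates[0]
    return assignments

# Splits report Alluxio worker hostnames; Trino workers have different ones.
splits = [{"id": 1, "hosts": ["alluxio-worker-0"]}]
workers = ["trino-worker-0", "trino-worker-1"]

print(compute_assignments(splits, workers, force_local_scheduling=False))
# → {1: 'trino-worker-0'}
# With force_local_scheduling=True the same call raises:
#   RuntimeError: No nodes available to run query
```

This matches the observed behavior: inserts and DDL succeed (they do not schedule table scans over Alluxio splits), while any `SELECT` that scans the table fails at split placement.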