juicedata / juicefs

JuiceFS is a distributed POSIX file system built on top of Redis and S3.
https://juicefs.com
Apache License 2.0

Support Trino #4364

Status: Open. teddy-hackle opened this issue 5 months ago.

teddy-hackle commented 5 months ago

What would you like to be added:

First.

I followed the guide for using JuiceFS with Trino, but it doesn't actually work, so I put the library in the following locations instead:

curl -L https://github.com/juicedata/juicefs/releases/download/v1.1.1/juicefs-hadoop-1.1.1.jar > /usr/lib/trino/plugin/iceberg/hdfs/juicefs-hadoop-1.1.1.jar && \
cp /usr/lib/trino/plugin/iceberg/hdfs/juicefs-hadoop-1.1.1.jar /usr/lib/trino/plugin/hive/hdfs/

When using Trino with Iceberg, the JAR must be placed in the paths above, so the documentation needs to be updated.
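The same installation can be scripted so that the JAR version and the list of connectors live in one place. This is a minimal sketch, assuming the `/usr/lib/trino/plugin/<connector>/hdfs/` layout described above; `install_jar`, `JFS_VERSION`, and `TRINO_PLUGIN_DIR` are hypothetical names used only for illustration. Using `curl -f` makes a failed download (for example a 404) exit with an error instead of silently writing an HTML error page into the `.jar` file.

```shell
#!/bin/sh
# Minimal sketch: install the JuiceFS Hadoop SDK JAR into the hdfs/
# directory of each Trino connector plugin that needs it.
set -eu

# Hypothetical variables; override them from the environment if needed.
JFS_VERSION="${JFS_VERSION:-1.1.1}"
TRINO_PLUGIN_DIR="${TRINO_PLUGIN_DIR:-/usr/lib/trino/plugin}"
JAR="juicefs-hadoop-${JFS_VERSION}.jar"

# install_jar SRC_JAR PLUGIN_DIR CONNECTOR...
# Copies SRC_JAR into PLUGIN_DIR/<CONNECTOR>/hdfs/ for every CONNECTOR given,
# creating the hdfs/ directory if it does not exist yet.
install_jar() {
  src="$1"; plugin_dir="$2"; shift 2
  for connector in "$@"; do
    mkdir -p "${plugin_dir}/${connector}/hdfs"
    cp "$src" "${plugin_dir}/${connector}/hdfs/"
  done
}

# Example: download the release once, then install it for both connectors.
# curl -fsSL -o "/tmp/${JAR}" \
#   "https://github.com/juicedata/juicefs/releases/download/v${JFS_VERSION}/${JAR}"
# install_jar "/tmp/${JAR}" "$TRINO_PLUGIN_DIR" iceberg hive
```

Passing the connectors as arguments means another connector that needs the SDK can be added without editing the copy logic.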

Second.

Currently, the discovery URL option in the Hadoop SDK is only available for Presto. I would like to use that feature with Trino as well.

See the code here: https://github.com/juicedata/juicefs/blob/53c33d37fcd511a53174b69944fe1ed7870de51c/sdk/java/src/main/java/io/juicefs/utils/NodesFetcherBuilder.java
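To illustrate what Trino support in node discovery might involve, here is a hypothetical sketch in shell (the real fetchers live in the Java SDK linked above). It assumes the Trino coordinator exposes the nodes it knows about as JSON at `GET /v1/node`, with each entry carrying a `uri` field. That endpoint, its response shape, whether it requires authentication, and whether it includes the coordinator itself are assumptions not verified here, and `trino_hosts_from_json` is a made-up name.

```shell
#!/bin/sh
# Hypothetical sketch: turn a Trino node-list JSON response (assumed to come
# from GET /v1/node on the coordinator) into one hostname per line.
set -eu

# trino_hosts_from_json
# Reads JSON on stdin, extracts every "uri" value, strips the scheme
# (e.g. http://), the port, and any path, then prints unique hostnames.
# Uses grep/sed for brevity; a real implementation should use a JSON parser.
trino_hosts_from_json() {
  grep -o '"uri" *: *"[^"]*"' |
    sed -e 's/^"uri" *: *"//' -e 's/"$//' \
        -e 's#^[A-Za-z][A-Za-z0-9+.-]*://##' \
        -e 's#[:/].*$##' |
    sort -u
}

# Example (coordinator address is a placeholder):
# curl -fsS http://trino-coordinator:8080/v1/node | trino_hosts_from_json
```

The output is one hostname per line, the same format as the node-list file described in the maintainer's reply.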

Why is this needed:

We currently use the JuiceFS Hadoop SDK in Trino and Iceberg environments. The Trino-related content in the guide and the code is outdated or partly unsupported, so we are requesting these features.

tangyoupeng commented 5 months ago

Currently, if the set of Trino nodes is fixed, you can create a file on JFS and write the hostname of each node into it, one per line.
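For a fixed set of nodes, that file can be written with a short script. This is a minimal sketch, assuming the volume is FUSE-mounted at `/jfs` and that `/jfs/etc/nodes` is where the list should live. The mount point, path, hostnames, volume name `myjfs`, and the helper `write_node_list` are all hypothetical, and the `juicefs.discover-nodes-url` property shown in the comment is an assumption to check against the JuiceFS Hadoop SDK documentation.

```shell
#!/bin/sh
# Minimal sketch of the workaround for a fixed set of Trino nodes:
# write one hostname per line into a file that lives on JuiceFS.
set -eu

# write_node_list DEST HOST...
# Writes each HOST on its own line to DEST. The list goes to a temporary
# file in the same directory first and is then renamed over DEST, so a
# reader never sees a half-written list.
write_node_list() {
  dest="$1"; shift
  tmp="${dest}.tmp.$$"
  printf '%s\n' "$@" > "$tmp"
  mv "$tmp" "$dest"
}

# Example (mount point and hostnames are placeholders):
# mkdir -p /jfs/etc
# write_node_list /jfs/etc/nodes trino-worker-1 trino-worker-2 trino-worker-3
#
# Assumed Hadoop SDK setting pointing at that file (verify the exact
# property name and URL form in the JuiceFS docs):
# juicefs.discover-nodes-url=jfs://myjfs/etc/nodes
```

Because the list is static, it must be regenerated whenever nodes are added or removed, which is the limitation a discovery URL for Trino would remove.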

teddy-hackle commented 5 months ago

I currently run Trino, along with Spark, on k8s, so I haven't benchmarked the performance difference with the discovery-url option yet. I just thought it would be nice to have the feature supported, so I created this issue.