ytsaurus / ytsaurus

YTsaurus is a scalable and fault-tolerant open-source big data platform.
https://ytsaurus.tech
Apache License 2.0

[Feature] Implement S3/HDFS Integrations for CHYT in Ytsaurus #29

Open kaikash opened 1 year ago

kaikash commented 1 year ago

ClickHouse offers valuable integrations such as S3 and HDFS. Adding support for these integrations in CHYT within YTsaurus would greatly enhance convenience and functionality.

kaikash commented 1 year ago

Any updates here?

DimasKovas commented 1 year ago

Any updates here?

We'll take a look at this task next week.

Could you please specify the exact functionality you need?

We are talking about s3 and hdfs table functions, right?

kaikash commented 1 year ago

Yes, you are correct. We need the S3 and HDFS table functions implemented in CHYT within YTsaurus, similar to how they work in ClickHouse. Specifically, the feature should allow us to import data from and export data to S3/HDFS, using syntax akin to this ClickHouse example:

INSERT INTO FUNCTION s3('https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz', 'CSV', 'name String, value UInt32', 'gzip')
SELECT name, value FROM existing_table;

ClickHouse docs
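
For the import direction, here is a minimal sketch of reading the same object back with the s3 table function; the URL, format, and compression are reused from the example above, and the column types are illustrative:

-- Read a gzipped CSV object from S3 and insert it into an existing table.
-- URL, format, schema, and compression mirror the export example above.
INSERT INTO existing_table
SELECT name, value
FROM s3(
    'https://clickhouse-public-datasets.s3.amazonaws.com/my-test-bucket-768/test-data.csv.gz',
    'CSV',
    'name String, value UInt32',
    'gzip'
);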

kaikash commented 1 year ago

I am not certain whether exporting data to HDFS is possible with ClickHouse's hdfs table function.

However, it can be achieved with the HDFS table engine, so it might be preferable to use the table engine rather than the table function for exporting data to HDFS.
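
For reference, a minimal sketch of the table-engine approach in plain ClickHouse; the namenode address, path, and schema are illustrative placeholders:

-- Create a table backed by an HDFS file; inserts into it write to HDFS.
-- 'hdfs://namenode:9000/exports/test-data.tsv' is a hypothetical URI.
CREATE TABLE hdfs_export (name String, value UInt32)
ENGINE = HDFS('hdfs://namenode:9000/exports/test-data.tsv', 'TSV');

-- Export data from an existing table to HDFS through the engine table.
INSERT INTO hdfs_export
SELECT name, value FROM existing_table;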

DimasKovas commented 2 weeks ago

Status update: S3 functions have been supported in CHYT for a few months now. HDFS functions are in the backlog and we do not have plans to support them in the near future.
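
For anyone finding this later, a minimal sketch of what exporting a YTsaurus table to S3 through CHYT might look like, assuming the s3 table function in CHYT follows the ClickHouse signature; the bucket URL, column types, and table path are illustrative placeholders:

-- Export rows from a YTsaurus table (referenced by its YPath) to a gzipped CSV object in S3.
-- The bucket URL, column types, and //home path are hypothetical.
INSERT INTO FUNCTION s3(
    'https://my-bucket.s3.amazonaws.com/exports/sample.csv.gz',
    'CSV',
    'name String, value UInt32',
    'gzip'
)
SELECT name, value FROM `//home/project/sample_table`;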