trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
https://trino.io
Apache License 2.0
10.3k stars 2.97k forks source link

Add StarRocks Connector #17329

Open chenjian2664 opened 1 year ago

chenjian2664 commented 1 year ago

StarRocks https://www.starrocks.io/

chenjian2664 commented 1 year ago

Re-open it if we think we need it.

ukclivecox commented 1 year ago

Is there a more comprehensive description of why this issue was closed?

hashhar commented 1 year ago

Probably because @chenjian2664 no longer needs it?

chenjian2664 commented 1 year ago

@hashhar Thanks for the attention, let me open it. Hopefully it can get reviewed

jakemongaya commented 1 year ago

@chenjian2664 have you tried using the MySQL connector? Starrocks is mysql compatible. Unable to test this as I don't have starrocks setup https://docs.starrocks.io/en-us/main/integrations/IDE_integrations/DataGrip

chenjian2664 commented 1 year ago

@jakemongaya I tried. StarRocks is a OLAP database system, but it is usual that there will be multi ways to interact with the database, it's worth to add a connector for it.

Starrocks is mysql compatible

I am not sure how much the database can achieve. Even the database can fully make it, but in that way there might loose lots of chances for optimization of StarRocks, some best practice for MySQL may not suitable for the StarRocks.

brunomarram commented 11 months ago

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Wahno commented 10 months ago

Expecting, Is there some progress on this?

YuriyGavrilov commented 10 months ago

+1

hackeryang commented 9 months ago

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram , for now you can set below parameters to StarRocks in a mysql client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';
set global sql_dialect = 'trino';

Then you will see all tables stored in the default catalog of StarRocks or Apache Doris. Recommend to config mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: https://github.com/trinodb/trino/pull/17330

A-little-bit-of-data commented 5 months ago

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram , for now you can set below parameters to StarRocks in a mysql client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';

Then you will see all tables stored in StarRocks or Apache Doris. Recommend to config mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: #17330

mysql-starrocks.properties

Hello, thank you very much for providing the configuration to support trino connecting to starrocks. I have configured it according to the above, but I still can’t see the table, only the library. The used trino:430, starrocks:3.1.8, I don’t know about you. What versions can be queried during edge testing? There is another question to ask. There are two catalogs in starrocks, one is default_catalog and the other is hive_catalog. I have given partial database table permissions to the above two catalogs in trino (mysql-starrocks.properties). I only I can see the libraries under default_catalog, but not the rest. Have you ever encountered this situation?

hackeryang commented 5 months ago

Upvoting, it will be nice. Mysql connector connects but not show tables, only schemas

Hi @brunomarram , for now you can set below parameters to StarRocks in a mysql client:

set global enable_groupby_use_output_alias=true;
set global enable_profile = true;
set global big_query_profile_threshold = '120s';
set global runtime_profile_report_interval = 60;
set global sql_mode='SORT_NULLS_LAST';

Then you will see all tables stored in StarRocks or Apache Doris. Recommend to config mysql-starrocks.properties like this:

connector.name=mysql
connection-url=jdbc:mysql://starrocks_fe_hostname:9030
connection-user=root
connection-password=
insert.non-transactional-insert.enabled=true
join-pushdown.enabled=true
metadata.cache-ttl=10m
metadata.cache-missing=true
query.comment-format=Trino-$QUERY_ID-$USER-$SOURCE-$TRACE_TOKEN
statistics.enabled=false
mysql.jdbc.use-information-schema=false

But we tested TPCH and found that the speed is much slower than querying StarRocks directly, partially because the aggregation pushdown to StarRocks is weak in the mysql connector. The better way is this PR: #17330

mysql-starrocks.properties

Hello, thank you very much for providing the configuration to support trino connecting to starrocks. I have configured it according to the above, but I still can’t see the table, only the library. The used trino:430, starrocks:3.1.8, I don’t know about you. What versions can be queried during edge testing? There is another question to ask. There are two catalogs in starrocks, one is default_catalog and the other is hive_catalog. I have given partial database table permissions to the above two catalogs in trino (mysql-starrocks.properties). I only I can see the libraries under default_catalog, but not the rest. Have you ever encountered this situation?

Hi @A-little-bit-of-data , we use Trino 423 and StarRocks 3.2 (without the storage-compute separation mode).

The key parameter to see tables in StarRocks is the set global enable_groupby_use_output_alias=true;, it only can see tables in the default catalog stored inside StarRocks, and cannot see tables in the hive catalog outside StarRocks for now, because i think that the concept of catalog itself means a storage engine, Trino queries other storages instead of other computing engines, so we cannot have such a querying chain: Trino->StarRocks->Hive.

I understand why you want to have such a querying chain, because StarRocks 3.x (with the storage-compute separation mode) can querying Hive 3X faster than Trino for now(with the C++ X86 AVX2 vectorization SIMD), but please be patient, the Project HummingBird is working on: https://github.com/trinodb/trino/issues/14237