trinodb / trino

Official repository of Trino, the distributed SQL query engine for big data, formerly known as PrestoSQL (https://trino.io)
Apache License 2.0

Does iceberg support identity and truncate[W] partition transforms? #8907

Closed EricJoy2048 closed 3 years ago

EricJoy2048 commented 3 years ago

We need range partitioning. It seems that truncate[W] can implement range partitioning, but I cannot find this partition transform in the Trino documentation.

hashhar commented 3 years ago

It's not documented yet (some WIP in https://github.com/trinodb/trino/pull/8217/) but yes, Trino does support all the transforms from the Iceberg spec except void.

void will be supported in 361 (was merged as https://github.com/trinodb/trino/pull/8730).

I'm closing this issue since I think it answers your question but feel free to reopen or to join our Slack and continue the conversation there.

EricJoy2048 commented 3 years ago

I found that only integer is supported in truncate[W]. Why don't we support the long type, as Iceberg does?

hashhar commented 3 years ago

@gaojun2048 long (mapped to BIGINT within Trino) is supported for the truncate transform too. We have tests for it as well; see https://github.com/trinodb/trino/blob/f18a0a5a1b70af230eb742055062b9594dc9ad39/plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java#L1184

Can you share the query you are trying and the error you are running into?

EricJoy2048 commented 3 years ago

> @gaojun2048 long (mapped to BIGINT within Trino) is supported for the truncate transform too. We have tests for it as well; see https://github.com/trinodb/trino/blob/f18a0a5a1b70af230eb742055062b9594dc9ad39/plugin/trino-iceberg/src/test/java/io/trino/plugin/iceberg/BaseIcebergConnectorTest.java#L1184
>
> Can you share the query you are trying and the error you are running into?

Sorry, I meant that W does not support the long type.

hashhar commented 3 years ago

Because Iceberg requires truncate's width (W) to be an integer. See https://github.com/apache/iceberg/blob/90225d6c9413016d611e2ce5eff37db1bc1b4fc5/api/src/main/java/org/apache/iceberg/transforms/Truncate.java#L39.

Surprisingly, this is not mentioned in the spec. @rdblue Is this intentional? If so, I can send a PR to update the spec to reflect this.

cc: @losipiuk

EricJoy2048 commented 3 years ago
[Image: WeCom screenshot]

The width is an int now. In my case, I want to divide a long-typed field into 3 range partitions, so the width must be of type long too.

hashhar commented 3 years ago

> I want to divide a long-typed field into 3 range partitions, so the width must be of type long too.

@gaojun2048 I don't think this is true. The data-type of the field has nothing to do with the type of the width field. You can do CREATE TABLE test (c BIGINT, d BIGINT) WITH (partitioning = ARRAY['truncate(c, 3)']) - it's valid and works.
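As a rough illustration of why the column type and the width type are independent, here is a minimal Python sketch of integer truncation. It assumes the formula given in the Iceberg spec, v - (v % W) with a non-negative remainder. The function name truncate_long is hypothetical, and this is not the code Trino or Iceberg actually runs.

```python
def truncate_long(value: int, width: int) -> int:
    """Round value down to the nearest multiple of width (sketch of truncate[W])."""
    if width <= 0:
        raise ValueError("width must be positive")
    # For a positive width, Python's % always returns a non-negative remainder,
    # which matches the "remainders must be positive" rule in the Iceberg spec.
    return value - (value % width)


# A value far outside the 32-bit int range, but still within a 64-bit long.
big_value = 9_000_000_000_000_000_001
print(truncate_long(big_value, 3))  # the width 3 easily fits in an int
print(truncate_long(-1, 3))         # negative values round toward negative infinity
```

The width only sets the size of each interval, so a small int width can be applied to arbitrarily large BIGINT values.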

EricJoy2048 commented 3 years ago

> I want to divide a long-typed field into 3 range partitions, so the width must be of type long too.
>
> @gaojun2048 I don't think this is true. The data-type of the field has nothing to do with the type of the width field. You can do CREATE TABLE test (c BIGINT, d BIGINT) WITH (partitioning = ARRAY['truncate(c, 3)']) - it's valid and works.

[image]

From the Iceberg documentation, I found that W is the width of each partition, not the number of partitions. If I want to divide the numbers from 1 to 100 into 3 range partitions, W should be 100/3 + 1 = 34 (because 100 % 3 != 0). Am I misunderstanding something?

Thank you.
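The arithmetic in the comment above can be checked with a small Python sketch. It assumes the integer truncate formula from the Iceberg spec, v - (v % W); the helper names truncate_value and partition_sizes are hypothetical and exist only for this illustration.

```python
from collections import Counter


def truncate_value(value: int, width: int) -> int:
    """Sketch of truncate[W] for integers: round down to a multiple of width."""
    return value - (value % width)


def partition_sizes(values, width: int) -> dict:
    """Map each truncate partition value to the number of input values it receives."""
    counts = Counter(truncate_value(v, width) for v in values)
    return dict(sorted(counts.items()))


sizes = partition_sizes(range(1, 101), 34)
print(sizes)  # {0: 33, 34: 34, 68: 33}
```

With W = 34, the values 1 to 100 fall into three partitions (0, 34 and 68). Nothing caps the count at three, though: a value of 102 or more would open a fourth partition.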

hashhar commented 3 years ago

@gaojun2048 If you want to divide a table into a fixed number of partitions, I don't think truncate is the correct transform for that. truncate acts more like grouping a range of values into a single partition. For example, truncate(d, 10) would create partitions where each partition can hold 10 values. It doesn't limit the overall number of partitions.

If you want to create a fixed number of partitions, bucket is the transform you should be looking at. bucket(d, 3) will create 3 partitions regardless of how many possible values of d you have.
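The difference between the two transforms can be sketched in Python. Two assumptions apply: truncate follows the v - (v % W) formula from the Iceberg spec, and the bucket function below uses CRC32 purely as a stand-in hash, whereas Iceberg specifies a 32-bit Murmur3 hash. Actual bucket assignments will therefore differ. The function names are hypothetical.

```python
import zlib


def truncate_value(value: int, width: int) -> int:
    """Sketch of truncate[W]: round down to a multiple of width."""
    return value - (value % width)


def bucket_value(value: int, num_buckets: int) -> int:
    """Sketch of bucket[N] using a stand-in hash (NOT Iceberg's Murmur3)."""
    hashed = zlib.crc32(value.to_bytes(8, "little", signed=True))
    # Clear the sign bit, then take the remainder to pick a bucket.
    return (hashed & 0x7FFFFFFF) % num_buckets


values = range(1, 1001)
truncate_parts = {truncate_value(v, 10) for v in values}
bucket_parts = {bucket_value(v, 3) for v in values}

print(len(truncate_parts))  # grows with the range of the data
print(len(bucket_parts))    # never exceeds the bucket count
```

The number of truncate partitions depends on how widely the data is spread, while the number of bucket partitions is bounded by N no matter what values arrive.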

EricJoy2048 commented 3 years ago

> @gaojun2048 If you want to divide a table into a fixed number of partitions, I don't think truncate is the correct transform for that. truncate acts more like grouping a range of values into a single partition. For example, truncate(d, 10) would create partitions where each partition can hold 10 values. It doesn't limit the overall number of partitions.
>
> If you want to create a fixed number of partitions, bucket is the transform you should be looking at. bucket(d, 3) will create 3 partitions regardless of how many possible values of d you have.

The bucket transform is a kind of hash partitioning, but we need range partitioning. If W could support the long type, it would meet our range-partitioning needs.
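To make the limitation concrete, here is a minimal Python sketch that counts how many truncate partitions a value range needs. It assumes the width is capped at Java's Integer.MAX_VALUE (2^31 - 1), since Iceberg's Truncate takes an int width as noted earlier in this thread. The function name count_truncate_partitions and the sample range are hypothetical.

```python
# Assumption: the largest usable width is Java's Integer.MAX_VALUE.
MAX_INT_WIDTH = 2**31 - 1


def count_truncate_partitions(low: int, high: int, width: int) -> int:
    """Number of distinct truncate[width] values over the inclusive range [low, high]."""
    if width <= 0 or low > high:
        raise ValueError("width must be positive and low must not exceed high")
    # Floor division gives the index of the width-sized interval a value falls in.
    return high // width - low // width + 1


# Values 1..100 with width 34 fit in three partitions.
print(count_truncate_partitions(1, 100, 34))

# A long-typed column spanning 0..3 * 10**12, using the largest int width.
print(count_truncate_partitions(0, 3 * 10**12, MAX_INT_WIDTH))
```

Under that assumption, a range wider than roughly 3 × (2^31 - 1) values cannot be covered by only three truncate partitions, which is the case described above.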

hashhar commented 3 years ago

Yes, bucket does a hash-partition and is the only solution for a fixed number of partitions today.

For truncate to support long widths - I think this is something you'd need to bring up with the Iceberg project since the transforms are defined within Iceberg and need to be readable and writable by all of the various writers.

cc: @rdblue

EricJoy2048 commented 3 years ago

> I think this is something you'd need to bring up with the Iceberg project since the transforms are defined within Iceberg and need to be readable and writable by all of the various writers.
>
> cc: @rdblue

Yes, thank you very much. I will raise an issue with the Iceberg community later.

Thank you!

hashhar commented 3 years ago

@gaojun2048 Once you create an issue there, can you please create a new issue in Trino linking to it? We'll need to accommodate those changes if they happen.

I'm closing this now since I think we've reached a conclusion about next steps.