Open yangli9988 opened 3 years ago
@yangli9988 Good suggestion! We didn't implement row-level deletion in the first version, but I think it's nice to have, and it's also on our roadmap. We will take a look very soon.
Linking the PR on the Iceberg side for row deletion: https://github.com/apache/iceberg/pull/1309
We also encourage you to implement this feature. Feel free to ping me if you need any further support.
@beinan To fetch data, Presto uses its own presto-parquet reader rather than iceberg-parquet. The two are very different, and the way data rows are encapsulated is also different. The `List<DeleteFile> deletes()` information contained in Iceberg's `FileScanTask` is discarded. The deleted data needs to be filtered out after the data rows have been iterated. That step does not happen in the connector, so the core module of Presto would need to be modified, which may affect the normal execution of other connectors.
I hope you can give me some tips: in which specific classes can complete row data be read, so that Iceberg's delete filtering can be introduced there? Ideally the code change would be small and would not affect other projects.
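The approach described above, dropping deleted rows after the data rows have been iterated, can be sketched roughly as follows. This is only an illustration under stated assumptions: `PositionDeleteFilter` and `drain` are hypothetical names, not Presto or Iceberg classes, and the set of deleted row positions is assumed to have already been loaded from the delete files returned by `FileScanTask.deletes()`.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.Set;

// Hypothetical sketch: wraps a row iterator for one data file and skips
// every row whose position in that file is listed as deleted.
final class PositionDeleteFilter<T> implements Iterator<T> {
    private final Iterator<T> rows;
    private final Set<Long> deletedPositions; // assumed to be pre-loaded from delete files
    private long nextPosition = 0;            // position of the next row read from the file
    private T pending;
    private boolean hasPending;

    PositionDeleteFilter(Iterator<T> rows, Set<Long> deletedPositions) {
        this.rows = rows;
        this.deletedPositions = deletedPositions;
    }

    @Override
    public boolean hasNext() {
        // Advance until a row that is not deleted is found, or the input ends.
        while (!hasPending && rows.hasNext()) {
            T row = rows.next();
            long position = nextPosition++;
            if (!deletedPositions.contains(position)) {
                pending = row;
                hasPending = true;
            }
        }
        return hasPending;
    }

    @Override
    public T next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        hasPending = false;
        T row = pending;
        pending = null;
        return row;
    }

    // Small helper for the illustration: collects the remaining rows into a list.
    static <T> List<T> drain(Iterator<T> it) {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Because the filter only wraps an iterator, it could in principle sit inside the connector's own reader rather than in shared engine code, which is the concern raised above. Whether Presto's page-oriented reader offers a suitable hook for this is not something the sketch settles.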
@yangli9988 Good call! I just talked to the Iceberg authors den and blue this morning. We might prefer to use Iceberg's IO classes instead of the Presto ones in the long term. It would then be much easier to adopt new features from Iceberg in the future. But this might require code changes on both the Presto and Iceberg sides.
So in the short term, we're happy to make a patch on the existing Presto code. I will try to go through the current implementation, and I think we can work together if you like.
I am willing to work together to solve this problem
Looks like the Iceberg PR is ready for review: https://github.com/apache/iceberg/pull/3210
Hello!
After deleting the rows with num = 2 through the Iceberg API, the Presto query result still displays the rows with num = 2.
[Image: presto]
Reading the same table with Flink shows that the rows with num = 2 are filtered out and no longer displayed.
[Image: flink]
Flink uses iceberg-data to filter out the deleted data.
[Image: jar]
However, I did not find the same filtering operation in Presto's Iceberg connector, and that jar is not included there either. I hope you can fix this problem or provide guidance on how to fix it.
Thank you.
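To make the num = 2 example concrete, here is a minimal sketch of the kind of filtering a reader has to perform when rows were removed by value. It assumes the delete was recorded as an equality delete, which the thread does not state. `EqualityDeleteSketch`, `isDeleted`, and `applyDeletes` are hypothetical names and not part of iceberg-data, and the sketch ignores details such as sequence numbers.

```java
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.stream.Collectors;

// Hypothetical sketch: rows and delete entries are modelled as column -> value maps.
final class EqualityDeleteSketch {

    // A row counts as deleted if it matches every column of at least one delete entry.
    static boolean isDeleted(Map<String, Object> row, List<Map<String, Object>> deletes) {
        return deletes.stream().anyMatch(delete ->
            !delete.isEmpty() // an empty entry must not match every row
                && delete.entrySet().stream().allMatch(
                    e -> Objects.equals(row.get(e.getKey()), e.getValue())));
    }

    // Returns only the rows that no delete entry matches.
    static List<Map<String, Object>> applyDeletes(
            List<Map<String, Object>> rows, List<Map<String, Object>> deletes) {
        return rows.stream()
            .filter(row -> !isDeleted(row, deletes))
            .collect(Collectors.toList());
    }
}
```

With rows whose `num` column is 1, 2, and 3 and a single delete entry `{num=2}`, `applyDeletes` keeps only the rows with 1 and 3. A reader that skips this step returns all three rows, which matches the behaviour reported for Presto here.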
@Zhenxiao Luo @Beinan Wang @Chunxu Tang