torvalds-dev / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Performance degradation in query table #58

Open torvalds-dev[bot] opened 3 weeks ago

torvalds-dev[bot] commented 3 weeks ago

Describe the problem you faced

Torvalds.Dev reported degraded performance when querying a table and requests assistance.

Additional context

Environment Description

Please provide the following information:

Stacktrace

No stacktrace provided.

torvalds-dev[bot] commented 3 weeks ago

Relevant Sources:

https://api.github.com/repos/apache/hudi/issues/10228

https://api.github.com/repos/apache/hudi/issues/10203

https://apache-hudi.slack.com/archives/C4D716NPQ/p1713562820086629

https://apache-hudi.slack.com/archives/C4D716NPQ/p1713166725381159

https://api.github.com/repos/apache/hudi/issues/6297

torvalds-dev[bot] commented 3 weeks ago

Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds.

torvalds-dev[bot] commented 3 weeks ago

It seems the performance issue you are seeing when querying the table could be related to incremental queries in Apache Hudi. The onehouse.ai blog on incremental read patterns notes that incremental queries need special handling because Hudi assigns each transaction an instant time when the transaction starts, not when it completes. As a result, a transaction that started earlier can finish after one that started later. To avoid skipping records in that case, incremental queries do not serve data from the earliest in-flight instant or any instant after it, which ensures no data loss or dropped records. The blog describes this as Hudi favoring safety over liveness; in practice, an incremental read can lag behind while an earlier transaction is still in flight, rather than risk missing its data. It also explains how Hudi handles failure scenarios to keep data processing consistent and safe. These incremental read patterns may help explain the performance degradation you are seeing.
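To make the ordering rule concrete, here is a minimal Python sketch of "do not serve data from the earliest in-flight instant or beyond", assuming a simplified timeline where each instant has only a start-time string and a completed/in-flight state. The `Instant`, `State`, and `readable_instants` names are hypothetical and are not part of Hudi's API; real Hudi instant times are timestamp strings that sort lexicographically, which this sketch imitates with zero-padded strings.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    # Hypothetical, simplified instant states (not Hudi's actual enum).
    COMPLETED = "completed"
    INFLIGHT = "inflight"


@dataclass(frozen=True)
class Instant:
    time: str  # instant time, assigned when the transaction STARTS
    state: State


def readable_instants(timeline, begin_exclusive):
    """Return instant times an incremental read may serve after begin_exclusive.

    Instants are scanned in start-time order; the scan stops at the first
    in-flight instant, so nothing at or beyond it is served (safety over
    liveness), even if later-started instants have already completed.
    """
    served = []
    for inst in sorted(timeline, key=lambda i: i.time):
        if inst.time <= begin_exclusive:
            continue  # already consumed by a previous incremental read
        if inst.state is State.INFLIGHT:
            break  # serving past this point could drop its records later
        served.append(inst.time)
    return served


timeline = [
    Instant("001", State.COMPLETED),
    Instant("002", State.INFLIGHT),   # started earlier, still running
    Instant("003", State.COMPLETED),  # started later, already finished
]
print(readable_instants(timeline, begin_exclusive="000"))  # ['001']
```

If the reader had served "003" and advanced its checkpoint past it, the records written by "002" would be skipped once "002" completed. Under this rule the read simply returns less data until "002" finishes, which is the lag the safety-over-liveness trade-off accepts.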

Source: https://www.onehouse.ai/blog/on-iceberg-and-hudi-acid-guarantees
Confidence Score: 90.66%


torvalds-dev[bot] commented 3 weeks ago

:lock: this is a private message
