numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0
[SUPPORT] Converting existing Parquet files to Hudi for avoiding re-processing and tracking changes #256

Open torvalds-dev-testbot[bot] opened 1 month ago

torvalds-dev-testbot[bot] commented 1 month ago

Describe the problem you faced

The user wants to convert existing Parquet files into a Hudi table without re-processing the data, and asks how Hudi would track changes if the Parquet files were copied directly from a regular Hive table into a Hudi table path.

torvalds-dev-testbot[bot] commented 1 month ago

Relevant Sources:

https://hudi.apache.org/docs/migration_guide

https://hudi.apache.org/docs/faq_storage

https://hudi.apache.org/blog/2020/08/20/efficient-migration-of-large-parquet-tables/#spark-data-source

https://ranjanankur.medium.com/apache-hudi-copy-on-write-cow-table-77fb2b849733

https://github.com/apache/hudi/issues/10725
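
The migration guide and the blog post linked above describe Hudi's bootstrap operation, which registers existing Parquet files as a Hudi table instead of rewriting them. As far as I understand it, copying Parquet files into a Hudi path by hand does not give Hudi anything to track, because change tracking relies on the commit timeline under `.hoodie` and on the `_hoodie_*` metadata columns, which only a Hudi write creates. Below is a minimal PySpark sketch of a metadata-only bootstrap. The table name, paths, and field names in the usage note are hypothetical, and the config keys and class names should be checked against the Hudi version in use.

```python
from typing import Dict

# Class names as I recall them from the Hudi migration guide (assumption:
# verify against the Hudi release you run).
SIMPLE_KEYGEN = "org.apache.hudi.keygen.SimpleKeyGenerator"
METADATA_ONLY_SELECTOR = (
    "org.apache.hudi.client.bootstrap.selector.MetadataOnlyBootstrapModeSelector"
)


def bootstrap_options(
    table_name: str, source_path: str, record_key: str, partition_field: str
) -> Dict[str, str]:
    """Build write options for a metadata-only bootstrap of existing Parquet data."""
    return {
        "hoodie.table.name": table_name,
        "hoodie.datasource.write.operation": "bootstrap",
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Location of the existing Parquet files; they stay where they are.
        "hoodie.bootstrap.base.path": source_path,
        "hoodie.bootstrap.keygen.class": SIMPLE_KEYGEN,
        # Metadata-only mode: Hudi writes skeleton files that reference the
        # source files rather than copying the data.
        "hoodie.bootstrap.mode.selector": METADATA_ONLY_SELECTOR,
    }


def run_bootstrap(spark, target_path: str, options: Dict[str, str]) -> None:
    """Run the bootstrap. Needs a SparkSession with the Hudi bundle on the classpath."""
    from pyspark.sql.types import StructType

    # The source data is read from hoodie.bootstrap.base.path, so the
    # DataFrame passed to the writer is empty.
    empty_df = spark.createDataFrame([], StructType([]))
    empty_df.write.format("hudi").options(**options).mode("overwrite").save(target_path)
```

A call might look like `run_bootstrap(spark, "s3://bucket/hudi/orders", bootstrap_options("orders", "s3://bucket/hive/orders", "order_id", "order_date"))`, where every argument is a placeholder. After the bootstrap commit lands on the timeline, later upserts and deletes go through normal Hudi writes and can be read incrementally.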

torvalds-dev-testbot[bot] commented 1 month ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/256