[SUPPORT] Converting existing Parquet files to Hudi for avoiding re-processing and tracking changes

numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.

https://hudi.apache.org/

Apache License 2.0

[SUPPORT] Converting existing Parquet files to Hudi for avoiding re-processing and tracking changes #256

Open torvalds-dev-testbot[bot] opened 2 months ago

torvalds-dev-testbot[bot] commented 2 months ago

Describe the problem you faced

The user wants guidance on the best way to convert existing Parquet files to a Hudi table without re-processing the data. They also want to understand how Hudi tracks changes when Parquet files are copied directly from a regular Hive table into a Hudi table path.

To Reproduce

Steps to reproduce the behavior:

1. 2. 3. 4.

Expected behavior

A clear and concise description of what the user expects to happen.

Environment Description

Hudi version :
Spark version :
Hive version :
Hadoop version :
Storage (HDFS/S3/GCS..) :
Running on Docker? (yes/no) :

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

torvalds-dev-testbot[bot] commented 2 months ago

Relevant Sources:

https://hudi.apache.org/docs/migration_guide

https://hudi.apache.org/docs/faq_storage

https://hudi.apache.org/blog/2020/08/20/efficient-migration-of-large-parquet-tables/#spark-data-source

https://ranjanankur.medium.com/apache-hudi-copy-on-write-cow-table-77fb2b849733

https://github.com/apache/hudi/issues/10725
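
The migration guide and the blog post listed above describe Hudi's bootstrap operation, which registers existing Parquet files as a Hudi table instead of rewriting them. Hudi records changes as commits on its timeline under the table's `.hoodie` directory, so Parquet files copied into a table path outside of a Hudi write are not registered as commits. The following is a minimal PySpark sketch of a bootstrap write, not a definitive recipe. The table name, paths, and field names are hypothetical placeholders, and the option keys should be checked against the Hudi version in use, since the issue does not state one.

```python
from typing import Dict


def bootstrap_options(
    table_name: str,
    source_path: str,
    record_key_field: str,
    partition_field: str,
) -> Dict[str, str]:
    """Build write options for a Hudi bootstrap of an existing Parquet dataset.

    All argument values are supplied by the caller; nothing here is taken
    from the issue itself.
    """
    return {
        "hoodie.table.name": table_name,
        # Ask Hudi to bootstrap from existing files rather than upsert rows.
        "hoodie.datasource.write.operation": "bootstrap",
        "hoodie.datasource.write.recordkey.field": record_key_field,
        "hoodie.datasource.write.partitionpath.field": partition_field,
        # Location of the existing (non-Hudi) Parquet dataset.
        "hoodie.bootstrap.base.path": source_path,
        "hoodie.bootstrap.keygen.class": "org.apache.hudi.keygen.SimpleKeyGenerator",
    }


def bootstrap_table(spark, options: Dict[str, str], target_path: str) -> None:
    """Run the bootstrap write.

    `spark` is assumed to be an existing SparkSession started with a Hudi
    Spark bundle on its classpath. The DataFrame is empty because the data
    is read from `hoodie.bootstrap.base.path`, not from the DataFrame.
    """
    from pyspark.sql.types import StructType  # imported lazily; needs pyspark

    empty_df = spark.createDataFrame([], StructType([]))
    (
        empty_df.write.format("hudi")
        .options(**options)
        .mode("overwrite")
        .save(target_path)
    )


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    opts = bootstrap_options(
        table_name="example_table",
        source_path="/data/hive/example_table",
        record_key_field="id",
        partition_field="dt",
    )
    for key, value in sorted(opts.items()):
        print(f"{key}={value}")
```

Once the bootstrap commit exists, later upserts and deletes through the Hudi writer are recorded as further commits on the timeline, and that is what incremental queries read.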

torvalds-dev-testbot[bot] commented 2 months ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/256