numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Typical data transformation stages from raw data to gold data when using DeltaStreamer in a data pipeline #251

Open torvalds-dev-testbot[bot] opened 3 months ago

torvalds-dev-testbot[bot] commented 3 months ago

Tips before filing an issue

Describe the problem you faced

A clear and concise description of the problem.

To Reproduce

Steps to reproduce the behavior:

1.
2.
3.
4.

Expected behavior

A clear and concise description of what you expected to happen.

Environment Description

Additional context

Add any other context about the problem here.

Stacktrace

Add the stacktrace of the error.

torvalds-dev-testbot[bot] commented 3 months ago

Relevant Sources:

https://www.linkedin.com/pulse/hudi-streamer-delta-hands-on-guide-local-ingestion-from-soumil-shah-jssse/?utm_source=share&utm_medium=member_ios&utm_campaign=share_via

https://bigdataboutique.com/blog/introduction-to-apache-hudi-c83367

https://www.onehouse.ai/blog/how-apache-hudi-simplifies-mpp-data-warehouse-migrations

https://apache-hudi.slack.com/archives/C4D716NPQ/p1711400241290649

torvalds-dev-testbot[bot] commented 3 months ago

Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds.

torvalds-dev-testbot[bot] commented 3 months ago

A typical DeltaStreamer pipeline has three stages: reading Avro messages from Kafka, deserializing them with the schemas held in Schema Registry, and ingesting the resulting records into a Hudi-based data lake. DeltaStreamer is the component that performs this ingestion continuously and efficiently, so data lands in a standardized format across the pipeline.
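The three stages described above can be illustrated with a small, self-contained Python sketch. This is a toy model, not Hudi or DeltaStreamer code: the in-memory `SCHEMA_REGISTRY`, the JSON payloads (standing in for Avro), the field names `order_id`, `amount`, and `ts`, and the functions `deserialize`, `upsert`, and `run_pipeline` are all hypothetical names chosen for illustration. It only aims to show the shape of the flow: look up a schema, decode and validate a message, then upsert the record by key so the latest version wins.

```python
import json

# Hypothetical in-memory stand-in for a Schema Registry: schema id -> field types.
SCHEMA_REGISTRY = {
    1: {"order_id": str, "amount": float, "ts": int},
}


def deserialize(message: dict) -> dict:
    """Decode one message and validate it against its registered schema.

    JSON stands in for Avro so the sketch needs no third-party library.
    """
    schema = SCHEMA_REGISTRY[message["schema_id"]]
    record = json.loads(message["payload"])
    for field, field_type in schema.items():
        if not isinstance(record.get(field), field_type):
            raise ValueError(f"field {field!r} is missing or not {field_type.__name__}")
    return record


def upsert(table: dict, record: dict,
           key_field: str = "order_id", ordering_field: str = "ts") -> None:
    """Insert the record, or overwrite the stored one if this record is not older."""
    existing = table.get(record[key_field])
    if existing is None or record[ordering_field] >= existing[ordering_field]:
        table[record[key_field]] = record


def run_pipeline(messages, table=None) -> dict:
    """Read -> deserialize -> ingest, mirroring the three stages in the text."""
    table = {} if table is None else table
    for message in messages:
        upsert(table, deserialize(message))
    return table


# Assumed sample input: two versions of order "a" and one of order "b".
messages = [
    {"schema_id": 1, "payload": json.dumps({"order_id": "a", "amount": 10.0, "ts": 1})},
    {"schema_id": 1, "payload": json.dumps({"order_id": "b", "amount": 5.0, "ts": 2})},
    {"schema_id": 1, "payload": json.dumps({"order_id": "a", "amount": 12.5, "ts": 3})},
]
table = run_pipeline(messages)
```

Keying the table on a record key and comparing an ordering field is an assumption made here to show how repeated messages for the same key can collapse into one current row; a real deployment would configure the equivalent behavior in Hudi rather than hand-roll it.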



torvalds-dev-testbot[bot] commented 3 months ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/251