numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] How to store a schema in a schema registry that captures nested arrays and JSON objects when onboarding data from a Mongo source into a Hudi table? #249

Open torvalds-dev-testbot[bot] opened 1 month ago

torvalds-dev-testbot[bot] commented 1 month ago

Describe the problem you faced

I am looking for suggestions on how to store the schema in a schema registry so that it captures nested arrays and JSON objects, including their column names, when onboarding a Hudi table from a MongoDB source. The data has a static (non-evolving) schema.

torvalds-dev-testbot[bot] commented 1 month ago

Relevant Sources:

https://hudi.apache.org/docs/faq_storage

https://hudi.apache.org/docs/schema_evolution

https://aws.amazon.com/blogs/big-data/automate-schema-evolution-at-scale-with-apache-hudi-in-aws-glue/

https://github.com/apache/hudi/issues/10895

torvalds-dev-testbot[bot] commented 1 month ago

Hudi supports schema reconciliation: when it is enabled, the target table's schema is applied to incoming data, and fields missing from a record are filled with null values. Hudi also supports schema evolution on write, such as adding new nullable fields or promoting a field's data type (for example, int to long). For a MongoDB source with a static schema, the nested documents and arrays, along with their field names, can be declared explicitly in the registered schema (nested objects as nested records, arrays as array types), following the schema evolution best practices in the sources above so that the schema can still change safely later.
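
As a rough illustration of the answer above, here is a minimal Python sketch (not Hudi's own code) of one way to capture a static MongoDB document shape as an Avro schema: nested objects become nested records, arrays become Avro arrays, and every field is a `["null", T]` union with a `null` default so that missing values can be filled with nulls. It assumes the documents are already exported as plain JSON (BSON types such as ObjectId converted to strings), that one representative document shows the full static schema, and that arrays in it are non-empty and homogeneous. The helper names `infer_schema`, `registry_payload` and `reconcile`, the `MongoDoc` record name and the `example.mongo` namespace are hypothetical. `registry_payload` builds the request body used by a Confluent-compatible registry's `POST /subjects/<subject>/versions` endpoint, and `reconcile` only mimics, in plain Python, the null injection that Hudi's schema reconciliation performs (enabled for Spark writes via `hoodie.datasource.write.reconcile.schema`).

```python
import json

# Hypothetical mapping of JSON scalar types to Avro primitive types.
_PRIMITIVES = {bool: "boolean", int: "long", float: "double", str: "string"}


def avro_type(value, name):
    """Map a JSON value to an Avro type: objects -> records, lists -> arrays."""
    if isinstance(value, dict):
        return {
            "type": "record",
            "name": name,  # prefixed names keep nested record names unique
            "fields": [nullable_field(k, v, f"{name}_{k}") for k, v in value.items()],
        }
    if isinstance(value, list):
        # Assumption: arrays in the sample document are non-empty and homogeneous.
        return {"type": "array", "items": avro_type(value[0], f"{name}_item")}
    return _PRIMITIVES[type(value)]  # raises KeyError for None or unsupported types


def nullable_field(key, value, type_name):
    """Wrap a field in a ["null", T] union with a null default."""
    return {"name": key, "type": ["null", avro_type(value, type_name)], "default": None}


def infer_schema(doc, record_name="MongoDoc", namespace="example.mongo"):
    """Build a static Avro record schema from one representative document."""
    schema = avro_type(doc, record_name)
    schema["namespace"] = namespace
    return schema


def registry_payload(schema):
    """Body for POST /subjects/<subject>/versions: the schema as an escaped JSON string."""
    return json.dumps({"schema": json.dumps(schema)})


def reconcile(record, schema):
    """Project a record onto a record schema built by infer_schema.

    Fields missing from the record become None (null); fields not in the
    schema are dropped. Nested records and arrays of records are handled
    recursively.
    """
    out = {}
    for field in schema["fields"]:
        branch = field["type"][1]  # the non-null branch of ["null", T]
        value = record.get(field["name"])
        if isinstance(value, dict) and isinstance(branch, dict) and branch["type"] == "record":
            value = reconcile(value, branch)
        elif (isinstance(value, list) and isinstance(branch, dict)
              and branch["type"] == "array" and isinstance(branch["items"], dict)
              and branch["items"].get("type") == "record"):
            value = [reconcile(item, branch["items"]) for item in value]
        out[field["name"]] = value
    return out


if __name__ == "__main__":
    sample = {
        "_id": "65f1c0ffee",
        "name": "Alice",
        "address": {"city": "Pune", "zip": "411001"},
        "orders": [{"sku": "A1", "qty": 2}],
        "tags": ["new", "vip"],
    }
    schema = infer_schema(sample)
    print(registry_payload(schema))
    # A later document without "address", "tags" or an order's "qty".
    print(reconcile({"_id": "x", "name": "Bob", "orders": [{"sku": "C3"}]}, schema))
    # -> {'_id': 'x', 'name': 'Bob', 'address': None, 'orders': [{'sku': 'C3', 'qty': None}], 'tags': None}
```

Because every field is nullable with a null default, later documents that omit a field still conform to the registered schema; the trade-off is that genuinely required fields are not enforced. Hudi's streamer utility can read a registered schema through its `SchemaRegistryProvider`; check the exact property names for your Hudi version.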

torvalds-dev-testbot[bot] commented 1 month ago

Please connect your Zendesk account before creating a ticket!

torvalds-dev-testbot[bot] commented 1 month ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/249