numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Tutorial on registering Hudi files into Hive metastore #234

Open torvalds-dev-testbot[bot] opened 3 months ago

torvalds-dev-testbot[bot] commented 3 months ago


Describe the problem you faced: A clear and concise description of the problem.

To Reproduce: Steps to reproduce the behavior:
1.
2.
3.
4.

Expected behavior: A clear and concise description of what you expected to happen.

Environment Description

Additional context: Add any other context about the problem here.

Stacktrace: Add the stacktrace of the error.

torvalds-dev-testbot[bot] commented 3 months ago

Relevant Sources:

https://hudi.apache.org/docs/syncing_metastore

https://medium.com/@devjain1299/apache-hudi-on-aws-glue-a-step-by-step-guide-503c34a9aa95

https://apache-hudi.slack.com/archives/C4D716NPQ/p1714088000801379

torvalds-dev-testbot[bot] commented 3 months ago

Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds

torvalds-dev-testbot[bot] commented 3 months ago

Follow up questions:

  1. Could you share the Hudi version and the rest of the tech stack in your environment? If you are facing an error, please provide the full stack trace and describe the operations that led to it. Spark UI screenshots are also useful for triage.
  2. Could you help us understand your environment a little better? The component versions and Hudi write configs would help. If you have not already done so, could you also describe the nature of your dataset? For example, does the workload include inserts, and if so, what is the ratio of inserts to updates?
  3. Could you share the Hudi write configs and the versions of Hudi and the compute engine you are using? In addition, please provide the query you used to detect duplicates.

Sources:

torvalds-dev-testbot[bot] commented 3 months ago

Hello! 👋 Please wait while I generate the follow up questions; this usually takes about 30 seconds

torvalds-dev-testbot[bot] commented 3 months ago

In order to register Hudi files into the Hive Metastore, you can follow these steps:

By following these steps and referring to the provided documentation snippets, you can effectively sync Hudi files into the Hive Metastore and AWS Glue Data Catalog.
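One common way to register a Hudi table from Spark is to enable Hive Sync on the Hudi writer, so the table (and any new partitions) is created or refreshed in the metastore after each commit. The helper below is a minimal sketch under stated assumptions: `hive_sync_options` is a hypothetical name, the table/database names and the `thrift://localhost:9083` metastore URI are placeholders, and it assumes the `hms` sync mode described in the Hudi metastore-syncing docs linked above. The `hoodie.*` keys are Hudi configs, but defaults and required options differ across Hudi versions, so check them against the version you run.

```python
from typing import Dict, Optional


def hive_sync_options(
    table: str,
    database: str,
    record_key: str,
    precombine_field: str,
    partition_field: Optional[str] = None,
    metastore_uris: str = "thrift://localhost:9083",  # placeholder HMS endpoint (assumption)
) -> Dict[str, str]:
    """Hypothetical helper: Hudi writer options that also sync the table to Hive Metastore."""
    opts = {
        # Core Hudi write options
        "hoodie.table.name": table,
        "hoodie.datasource.write.recordkey.field": record_key,
        "hoodie.datasource.write.precombine.field": precombine_field,
        # Hive Sync: register the table (and new partitions) in the metastore after each commit
        "hoodie.datasource.hive_sync.enable": "true",
        "hoodie.datasource.hive_sync.mode": "hms",  # talk to the metastore directly over Thrift
        "hoodie.datasource.hive_sync.metastore.uris": metastore_uris,
        "hoodie.datasource.hive_sync.database": database,
        "hoodie.datasource.hive_sync.table": table,
    }
    if partition_field:
        # The same column drives the Hudi partition path and the Hive partition column
        opts["hoodie.datasource.write.partitionpath.field"] = partition_field
        opts["hoodie.datasource.hive_sync.partition_fields"] = partition_field
        opts["hoodie.datasource.hive_sync.partition_extractor_class"] = (
            "org.apache.hudi.hive.MultiPartKeysValueExtractor"
        )
    else:
        # Unpartitioned table: no partition path, and no partition values for Hive Sync to extract
        opts["hoodie.datasource.write.keygenerator.class"] = (
            "org.apache.hudi.keygen.NonpartitionedKeyGenerator"
        )
        opts["hoodie.datasource.hive_sync.partition_extractor_class"] = (
            "org.apache.hudi.hive.NonPartitionedExtractor"
        )
    return opts


if __name__ == "__main__":
    # Print the options for a hypothetical table partitioned by "city"
    for key, value in sorted(hive_sync_options("trips", "analytics", "uuid", "ts", "city").items()):
        print(f"{key}={value}")
```

With PySpark, the resulting dict would be passed to the writer as `df.write.format("hudi").options(**opts).mode("append").save(base_path)`, where `df` and `base_path` are your DataFrame and table path. For the AWS Glue Data Catalog, Hudi also ships a Glue-specific sync tool (`org.apache.hudi.aws.sync.AwsGlueCatalogSyncTool`); see the Hudi docs for how to enable it in your version.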

torvalds-dev-testbot[bot] commented 3 months ago

Abhishek posted:

https://devs-numberlabs.slack.com/archives/C06PDCQPA3W/p1722424382585679?thread_ts=1722424506915009&cid=C06PDCQPA3W

torvalds-dev-testbot[bot] commented 3 months ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/234