numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Installation query for Hudi #220

Open torvalds-dev-testbot[bot] opened 2 months ago

torvalds-dev-testbot[bot] commented 2 months ago

Tips before filing an issue

Describe the problem you faced A clear and concise description of the problem.

To Reproduce Steps to reproduce the behavior: 1. 2. 3. 4.

Expected behavior A clear and concise description of what you expected to happen.

Environment Description

Additional context Add any other context about the problem here.

Stacktrace Add the stacktrace of the error.

torvalds-dev-testbot[bot] commented 2 months ago

Hello! 👋 Please wait while I search the knowledge base for an answer; this usually takes about 30 seconds

torvalds-dev-testbot[bot] commented 2 months ago

Relevant Sources:

https://hudi.apache.org/docs/cli

https://medium.com/@ritik20023/quickly-start-using-apache-hudi-on-aws-emr-de432c01e488

https://datacouch.medium.com/getting-started-with-apache-hudi-711b89c107aa

https://apache-hudi.slack.com/archives/C4D716NPQ/p1711199322989749

torvalds-dev-testbot[bot] commented 2 months ago

Hudi can be installed or enabled in several ways, depending on your use case and environment:

  1. For a local setup, build Hudi first, then change into the hudi-cli directory and run ./hudi-cli.sh to start the Hudi CLI.

  2. If you are using Spark 3, you can launch the Hudi CLI from the hudi-cli-bundle instead. Make sure Spark is installed locally with the required dependencies and that the necessary environment variables are set.

  3. On AWS EMR, Hudi comes pre-installed. You only need to pass the required jars and configurations in the spark-submit command.

  4. To use Hudi with Spark, start the Spark shell with the Hudi package and configuration options that match your Spark version. Hudi works with Spark 2.4.3+ and Spark 3.x.

  5. On AWS Glue, enable Hudi by setting the relevant job parameters and Spark configurations on your AWS Glue job.
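
As a rough illustration of options 4 and 5, the sketch below assembles the launch arguments for a Spark shell with the Hudi bundle and the job parameters for AWS Glue. The function names are hypothetical helpers, and the default versions (Spark 3.4, Scala 2.12, Hudi 0.14.1) are assumptions chosen for the example; check the Hudi quick start guide and the AWS Glue documentation for the values that match your environment.

```python
from typing import Dict, List


def hudi_spark_shell_command(
    shell: str = "pyspark",
    spark_version: str = "3.4",   # assumed Spark minor version
    scala_version: str = "2.12",  # assumed Scala build of the bundle
    hudi_version: str = "0.14.1", # assumed Hudi release
) -> List[str]:
    """Build the argument list for launching a Spark shell with Hudi.

    Hypothetical helper: the bundle naming follows the Spark 3.x pattern
    hudi-spark<spark>-bundle_<scala>; other Spark lines may differ.
    """
    package = (
        f"org.apache.hudi:hudi-spark{spark_version}"
        f"-bundle_{scala_version}:{hudi_version}"
    )
    # Spark settings commonly passed alongside the Hudi bundle.
    confs = {
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
        "spark.sql.extensions": (
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension"
        ),
        "spark.sql.catalog.spark_catalog": (
            "org.apache.spark.sql.hudi.catalog.HoodieCatalog"
        ),
    }
    args = [shell, "--packages", package]
    for key, value in confs.items():
        args += ["--conf", f"{key}={value}"]
    return args


def glue_hudi_job_arguments() -> Dict[str, str]:
    """Job parameters that enable Hudi in an AWS Glue job (hypothetical helper)."""
    return {
        "--datalake-formats": "hudi",
        "--conf": (
            "spark.serializer=org.apache.spark.serializer.KryoSerializer"
            " --conf spark.sql.hive.convertMetastoreParquet=false"
        ),
    }


if __name__ == "__main__":
    # Print a command line that can be pasted into a terminal.
    print(" ".join(hudi_spark_shell_command()))
    print(glue_hudi_job_arguments())
```

Passing `shell="spark-shell"` or `shell="spark-sql"` produces the equivalent command for the Scala shell or the SQL shell. On AWS EMR, where Hudi is pre-installed, the same configuration keys would typically accompany the bundled jars in the spark-submit command rather than a `--packages` download.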

Sources:

  1. https://hudi.apache.org/docs/cli
  2. https://medium.com/@ritik20023/quickly-start-using-apache-hudi-on-aws-emr-de432c01e488
  3. https://datacouch.medium.com/getting-started-with-apache-hudi-711b89c107aa
  4. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html
  5. https://blog.datumagic.com/p/apache-hudi-from-zero-to-one-110

torvalds-dev-testbot[bot] commented 2 months ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/220
