numberlabs-developers / hudi

Upserts, Deletes And Incremental Processing on Big Data.
https://hudi.apache.org/
Apache License 2.0

[SUPPORT] Need help in understanding Apache Hudi documentation and CLI usage #219

Open torvalds-dev-testbot[bot] opened 1 month ago

torvalds-dev-testbot[bot] commented 1 month ago

Describe the problem you faced: I need help understanding the Apache Hudi documentation and CLI usage.

torvalds-dev-testbot[bot] commented 1 month ago

Relevant Sources:

https://hudi.apache.org/docs/cli

https://medium.com/@ritik20023/quickly-start-using-apache-hudi-on-aws-emr-de432c01e488

https://datacouch.medium.com/getting-started-with-apache-hudi-711b89c107aa

https://apache-hudi.slack.com/archives/C4D716NPQ/p1711199322989749

torvalds-dev-testbot[bot] commented 1 month ago

To install Hudi, you have a few options based on your environment:

  1. For local setup:

    • Build Hudi from source with Maven
    • Launch the Hudi CLI by running "./hudi-cli.sh" from the hudi-cli directory
  2. For EMR:

    • Hudi comes pre-installed on EMR releases such as emr-5.30.1
    • Pass the necessary jars and configurations to the spark-submit command
  3. For Docker environment:

    • Build Hudi with Maven, then use the Docker demo setup to bring up Hadoop, Hive, and Spark services with Hudi support
  4. For Spark:

    • Use PySpark or the Scala Spark shell with the appropriate Hudi bundle package to start using Hudi
  5. For AWS Glue:

    • Enable Hudi by setting the --datalake-formats job parameter to hudi and adding the required Spark configurations
    • AWS Glue 4.0 supports Lake Formation permissions for Hudi
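For the Spark and EMR options above, the Hudi bundle and a few Spark settings are passed on the command line. The following is a minimal Python sketch that assembles those arguments, assuming Spark 3.4, Scala 2.12, and Hudi 0.14.1 (example versions only; pick the bundle that matches your cluster). The function name hudi_launch_args and the script name my_job.py are hypothetical. The three --conf settings follow the pattern in the Hudi Spark quickstart, and /usr/lib/hudi/hudi-spark-bundle.jar is where EMR places the bundle. The catalog setting assumes Spark 3.2 or later, so Spark 2.4-based releases such as emr-5.30.1 may need different options.

```python
import shlex
from typing import List

# Spark settings used for Hudi on Spark 3.x sessions (per the Hudi quickstart).
HUDI_SPARK_CONFS = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.extensions": "org.apache.spark.sql.hudi.HoodieSparkSessionExtension",
    "spark.sql.catalog.spark_catalog": "org.apache.spark.sql.hudi.catalog.HoodieCatalog",
}

# Location of the bundle on EMR nodes where Hudi is pre-installed.
EMR_HUDI_JAR = "/usr/lib/hudi/hudi-spark-bundle.jar"


def hudi_launch_args(spark_version: str = "3.4",
                     scala_version: str = "2.12",
                     hudi_version: str = "0.14.1",
                     on_emr: bool = False) -> List[str]:
    """Build pyspark/spark-shell/spark-submit arguments that enable Hudi.

    hudi_launch_args is a hypothetical helper; versions are example values.
    """
    if on_emr:
        # EMR ships the bundle locally, so point --jars at it.
        args = ["--jars", EMR_HUDI_JAR]
    else:
        # Elsewhere, let Spark resolve the bundle from Maven Central.
        coordinate = (f"org.apache.hudi:hudi-spark{spark_version}-bundle_"
                      f"{scala_version}:{hudi_version}")
        args = ["--packages", coordinate]
    # Append each Spark setting as its own --conf key=value pair.
    for key, value in HUDI_SPARK_CONFS.items():
        args += ["--conf", f"{key}={value}"]
    return args


if __name__ == "__main__":
    # Local or self-managed Spark: interactive PySpark shell.
    print(shlex.join(["pyspark", *hudi_launch_args()]))
    # EMR: submit a (hypothetical) job script using the pre-installed bundle.
    print(shlex.join(["spark-submit", *hudi_launch_args(on_emr=True), "my_job.py"]))
```

Running the script prints one command per environment; copy the one that matches your setup, or pass the argument list to subprocess if you launch Spark from Python.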

Remember to follow the specific setup instructions based on your environment and Hudi version.
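For the AWS Glue option, Hudi is enabled through job parameters. The sketch below builds them as a Python dict, assuming the parameter names from the AWS Glue Hudi documentation: --datalake-formats set to hudi, plus Spark settings under --conf. Because job parameters are key/value pairs, the second Spark setting is chained inside the single --conf value. The function name glue_hudi_job_arguments and its extra_conf argument are hypothetical.

```python
from typing import Dict, Optional

# Base Spark settings for Hudi on Glue, as listed in the AWS Glue Hudi docs.
BASE_CONF = {
    "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    "spark.sql.hive.convertMetastoreParquet": "false",
}


def glue_hudi_job_arguments(extra_conf: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Return Glue job parameters that enable the Hudi data lake format.

    glue_hudi_job_arguments and extra_conf are hypothetical names.
    """
    conf = dict(BASE_CONF)          # copy so callers never mutate BASE_CONF
    conf.update(extra_conf or {})   # optional extra Spark settings
    # Glue stores one value per key, so further settings are chained with
    # " --conf " inside the value of the single "--conf" key.
    conf_value = " --conf ".join(f"{key}={value}" for key, value in conf.items())
    return {
        "--datalake-formats": "hudi",
        "--conf": conf_value,
    }


if __name__ == "__main__":
    for key, value in glue_hudi_job_arguments().items():
        print(f"{key} = {value}")
```

A dict like this could be supplied as DefaultArguments when creating the job with the AWS SDK, or entered key by key in the Glue console. Check the AWS documentation for any additional settings your Glue version needs, for example when using Lake Formation permissions on Glue 4.0.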

Sources:

  1. https://hudi.apache.org/docs/cli
  2. https://medium.com/@ritik20023/quickly-start-using-apache-hudi-on-aws-emr-de432c01e488
  3. https://hudi.apache.org/docs/docker_demo
  4. https://datacouch.medium.com/getting-started-with-apache-hudi-711b89c107aa
  5. https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-etl-format-hudi.html

torvalds-dev-testbot[bot] commented 1 month ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/Sheth-G/hudi/issues/10

torvalds-dev-testbot[bot] commented 1 month ago

Ticket created successfully. Here is the link to the GitHub issue: https://github.com/numberlabs-developers/hudi/issues/219
