Open torvalds-dev opened 1 year ago
Based on the information provided in the files, here's a plan to install Samza:
Get the Code: Clone the hello-samza project from the Apache repository using the command git clone https://gitbox.apache.org/repos/asf/samza-hello-samza.git hello-samza and navigate to the project directory with cd hello-samza.
Start a Grid: Samza requires three systems: YARN, Kafka, and ZooKeeper. You can set them up with the "grid" script provided in the hello-samza project by running the command bin/grid bootstrap. This command downloads, installs, and starts ZooKeeper, Kafka, and YARN. It also checks out the latest version of Samza and builds it. All package files are put in a sub-directory called "deploy" inside hello-samza's root folder.
Build a Samza Job Package: Before you can run a Samza job, you need to build a package for it. This package is what YARN uses to deploy your jobs on the grid. Run the following commands to build the package:
mvn clean package
mkdir -p deploy/samza
tar -xvf ./target/hello-samza-1.9.0-SNAPSHOT-dist.tar.gz -C deploy/samza
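The version in the tarball name (1.9.0-SNAPSHOT above) depends on the checkout, so a hard-coded path may not match what the build actually produced. A minimal sketch of one way around that, assuming the artifact is always named target/hello-samza-<version>-dist.tar.gz; the helper name find_dist_tarball is hypothetical and not part of hello-samza:

```shell
#!/bin/sh
# Hypothetical helper (not shipped with hello-samza): print the path of the
# first file matching hello-samza-*-dist.tar.gz in the given directory.
# Assumption: the Maven build names its artifact with this pattern.
find_dist_tarball() {
    target_dir="${1:-./target}"
    for f in "$target_dir"/hello-samza-*-dist.tar.gz; do
        # If the glob matched nothing, $f is the literal pattern and -f fails.
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
            return 0
        fi
    done
    echo "no hello-samza dist tarball found in $target_dir" >&2
    return 1
}

# Possible usage, replacing the hard-coded tar line:
#   mkdir -p deploy/samza
#   tar -xvf "$(find_dist_tarball)" -C deploy/samza
```

If more than one matching tarball exists, this sketch simply takes the first one the shell lists, so running mvn clean package first (as above) keeps the choice unambiguous.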
Run a Samza Job: After you've built your Samza package, you can start a job on the grid using the run-app.sh script. Run the command deploy/samza/bin/run-app.sh --config-path=$PWD/deploy/samza/config/wikipedia-feed.properties
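The four steps above can be collected into one script. The following is only a sketch: the commands are the ones listed in the plan, while the run wrapper, the install_samza function, and the DRY_RUN variable are hypothetical conveniences added here. By default it prints each command instead of executing it, so the sequence can be reviewed before anything is downloaded or started.

```shell
#!/bin/sh
# Sketch only. DRY_RUN, run, and install_samza are hypothetical names.
# DRY_RUN=1 (the default) prints commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_samza() {
    # Each step runs only if the previous one succeeded.
    # Step 1: get the code.
    run git clone https://gitbox.apache.org/repos/asf/samza-hello-samza.git hello-samza &&
    run cd hello-samza &&
    # Step 2: download, install, and start ZooKeeper, Kafka, and YARN.
    run bin/grid bootstrap &&
    # Step 3: build and unpack the job package (version taken from the plan).
    run mvn clean package &&
    run mkdir -p deploy/samza &&
    run tar -xvf ./target/hello-samza-1.9.0-SNAPSHOT-dist.tar.gz -C deploy/samza &&
    # Step 4: submit the job. In dry-run mode cd is not executed, so $PWD
    # here still expands to the caller's directory, not hello-samza.
    run deploy/samza/bin/run-app.sh "--config-path=$PWD/deploy/samza/config/wikipedia-feed.properties"
}
```

Calling install_samza after sourcing the script prints the seven commands prefixed with "+". Setting DRY_RUN=0 before sourcing it makes the same call execute them, which needs network access plus git, Maven, and a JDK on the machine.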
Relevant files for this process are docs/startup/hello-samza/versioned/index.md and docs/startup/download/index.md. The first file provides a step-by-step guide on how to install and run Samza, while the second file provides information on how to download Samza and its tools.