all-in-one-docker-bigdataops
all-in-one-docker-bigdataops is a comprehensive Docker Compose environment that simplifies Big Data operations by bundling Hadoop, Spark, Hive, Hue, and Airflow into a single ready-to-run stack. It ships with example workflows, a quick setup, and easy customization, making it ideal for learning, development, and testing in Big DataOps.
Key Features:
- Fully Integrated Big Data Tools: Pre-configured and ready to use, including Hadoop for distributed storage and processing, Spark for big data analytics, Hive for SQL-based querying, Hue for a web-based UI, and Airflow for orchestrating complex data pipelines.
- Quick Setup: Get started with a single `make start-all` command (see the sketch after this list). No more dealing with tedious configuration or dependency issues.
- Versatile Usage: Ideal for learning, development, testing, and small-scale production environments.
- Example Workflows: Includes sample jobs and scripts to demonstrate the capabilities of the stack, helping you kickstart your big data projects.
- Customizable: Easily extend or modify the stack to fit your specific use cases.
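For instance, a first run might look like the sketch below; the `docker ps` check is just a generic way to confirm the containers came up, and the container names you will see depend on this repository's compose files.

```sh
# Bring up the whole stack; the Makefile wraps the underlying docker compose calls
make start-all

# Generic check that the containers started; names and count depend on the
# compose files shipped with this repository
docker ps --format 'table {{.Names}}\t{{.Status}}'
```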
Why Use This Repository?
- Ease of Use: Simplifies the setup process, so you can focus on developing and testing your big data solutions.
- All-in-One Solution: No need to piece together different tools. Everything is bundled in a single, cohesive package.
- Community-Driven: Designed to be accessible, well-documented, and easy to discover, with the goal of becoming a go-to resource for Big DataOps on Docker.
How to use?
- Ensure you have Docker and make installed.
- You can find useful commands in the Makefile.
- When using Airflow, remember to add the following connections (example CLI commands are sketched after this list):
  - conn_id 'spark_docker' --conn-type spark --conn-host spark://spark-master --conn-port 7077
  - conn_id 'ssh_hadoop' --conn-type ssh --conn-host namenode --conn-port 22 --conn-login root --conn-password root123
- To deploy the full infrastructure, just run `make start-all`.
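The connections above can be registered through the Airflow web UI (Admin > Connections) or with the Airflow CLI. Below is a minimal sketch using the Airflow 2.x CLI; the container name `airflow-webserver` is an assumption, so substitute the actual name of the Airflow service in this compose stack.

```sh
# Assumption: the Airflow service container is called "airflow-webserver";
# replace it with the real service name from the compose files.
docker exec -it airflow-webserver \
  airflow connections add 'spark_docker' \
    --conn-type spark \
    --conn-host spark://spark-master \
    --conn-port 7077

docker exec -it airflow-webserver \
  airflow connections add 'ssh_hadoop' \
    --conn-type ssh \
    --conn-host namenode \
    --conn-port 22 \
    --conn-login root \
    --conn-password root123
```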