puckel / docker-airflow

Docker Apache Airflow

ERROR - docker container failed: 'Error': None, 'StatusCode': 1 #581

Open · andresg3 opened this issue 4 years ago

andresg3 commented 4 years ago

Hello,

I've been searching Google for a couple of hours now, but I can't find a workaround for this error. I'm trying to use the DockerOperator in Airflow. DAG:

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.docker_operator import DockerOperator
from datetime import datetime, timedelta

default_args = {
        'owner'                 : 'airflow',
        'description'           : 'Use of the DockerOperator',
        'depends_on_past'       : False,
        'start_date'            : datetime(2018, 1, 3),
        'email_on_failure'      : False,
        'email_on_retry'        : False,
        'retries'               : 1,
        'retry_delay'           : timedelta(minutes=5)
}

with DAG('docker_dag', default_args=default_args, schedule_interval="* 1 * * *", catchup=False) as dag:
        t1 = BashOperator(
                task_id='print_current_date',
                bash_command='date'
        )

        t2 = DockerOperator(
                task_id='spark_submit',
                image='jupyter/pyspark-notebook',
                #image='jupyter/all-spark-notebook',
                api_version='auto',
                auto_remove=False,
                docker_url="unix://var/run/docker.sock",
                host_tmp_dir='/tmp', 
                tmp_dir='/tmp',
                volumes=['/usr/local/airflow/scripts:/home/jovyan'],
                command='spark-submit --master local[*] /home/jovyan/pyspark_test01.py'
        )

        t3 = BashOperator(
                task_id='print_hello',
                bash_command='echo "hello world"'
        )

        t1 >> t2 >> t3
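
To get more than the bare status code, here is a minimal sketch of how I would reproduce the same run outside Airflow with the docker Python SDK (the SDK the DockerOperator drives under the hood); on a non-zero exit, exc.stderr should carry the actual error from spark-submit:

import docker

client = docker.from_env()
try:
    out = client.containers.run(
        "jupyter/pyspark-notebook",
        "spark-submit --master local[*] /home/jovyan/pyspark_test01.py",
        volumes={"/usr/local/airflow/scripts": {"bind": "/home/jovyan", "mode": "rw"}},
        remove=False,
    )
    print(out.decode())
except docker.errors.ContainerError as exc:
    # exit_status is the StatusCode seen in the DAG log; stderr carries
    # the real error message that the operator's log line hides
    print(exc.exit_status)
    print(exc.stderr.decode() if exc.stderr else "")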

DAG log (it keeps failing with the same error every time): dag_log.txt

docker-compose.yml

services:

    postgres:
        image: postgres:9.6
        environment:
            - POSTGRES_USER=airflow
            - POSTGRES_PASSWORD=airflow
            - POSTGRES_DB=airflow
        logging:
            options:
                max-size: 10m
                max-file: "3"

    webserver:
        #image: puckel/docker-airflow:1.10.9
        image: puckel/docker-airflow
        restart: always
        depends_on:
            - postgres
        environment:
            - LOAD_EX=n
            - EXECUTOR=Local
        logging:
            options:
                max-size: 10m
                max-file: "3"
        volumes:
            - ./airflow/dags:/usr/local/airflow/dags
            - ./airflow/plugins:/usr/local/airflow/plugins
            - ./airflow/scripts:/usr/local/airflow/scripts
            - ./requirements.txt:/requirements.txt
            - '/var/run/docker.sock:/var/run/docker.sock'
        ports:
            - "8080:8080"
        command: webserver
        healthcheck:
            test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
            interval: 30s
            timeout: 30s
            retries: 3
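
One thing I'm wondering about: /var/run/docker.sock belongs to the host's Docker daemon, so the host side of the volumes= bind in the DAG is resolved on the host filesystem, where the scripts live under ./airflow/scripts next to this compose file; /usr/local/airflow/scripts exists only inside the webserver container. A hedged sketch of the operator pointed at a host path instead (assuming a hypothetical project location of /home/user/airflow-project):

        t2 = DockerOperator(
                task_id='spark_submit',
                image='jupyter/pyspark-notebook',
                api_version='auto',
                auto_remove=False,
                docker_url="unix://var/run/docker.sock",
                # the host side of a bind mount must be a path on the Docker
                # host; /home/user/airflow-project is a hypothetical stand-in
                volumes=['/home/user/airflow-project/airflow/scripts:/home/jovyan'],
                command='spark-submit --master local[*] /home/jovyan/pyspark_test01.py'
        )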

And finally, the script I'm trying to spark-submit:

import pyspark

spark = pyspark.sql.SparkSession.builder\
    .appName('hogwarts')\
    .getOrCreate()

characters = [
    ("Albus Dumbledore", 150),
    ("Minerva McGonagall", 70),
    ("Rubeus Hagrid", 63),
    ("Oliver Wood", 18),
    ("Harry Potter", 12),
    ("Ron Weasley", 12),
    ("Hermione", 13),
    ("Draco Malfoy", None)
]

c_df = spark.createDataFrame(characters, ["name", "age"])    

c_df.show()
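
To rule out the mount itself, a quick pre-flight sketch along the same lines (same image, same bind) that only lists /home/jovyan; an empty listing would explain spark-submit exiting with code 1:

import docker

client = docker.from_env()
# run a throwaway container with the same bind mount and list the
# target directory; the bytes returned are the ls command's stdout
listing = client.containers.run(
    "jupyter/pyspark-notebook",
    "ls -la /home/jovyan",
    volumes={"/usr/local/airflow/scripts": {"bind": "/home/jovyan", "mode": "ro"}},
    remove=True,
)
print(listing.decode())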

Any help would be greatly appreciated. I don't want to give up yet :)

hdm30 commented 2 years ago

I have the same issue. Did you solve it?

luism256WSM commented 2 years ago

Hey there! Any solution or idea? I'm getting the same issue!