andresg3 opened 4 years ago
Hello,
I have been searching on Google for a couple of hours now, but I can't find a workaround for this error. I'm trying to use the DockerOperator in Airflow. DAG:
```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.docker_operator import DockerOperator

default_args = {
    'owner': 'airflow',
    'description': 'Use of the DockerOperator',
    'depend_on_past': False,
    'start_date': datetime(2018, 1, 3),
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 1,
    'retry_delay': timedelta(minutes=5)
}

with DAG('docker_dag', default_args=default_args,
         schedule_interval="* 1 * * *", catchup=False) as dag:
    t1 = BashOperator(
        task_id='print_current_date',
        bash_command='date'
    )
    t2 = DockerOperator(
        task_id='spark_submit',
        image='jupyter/pyspark-notebook',
        # image='jupyter/all-spark-notebook',
        api_version='auto',
        auto_remove=False,
        docker_url="unix://var/run/docker.sock",
        host_tmp_dir='/tmp',
        tmp_dir='/tmp',
        volumes=['/usr/local/airflow/scripts:/home/jovyan'],
        command='spark-submit --master local[*] /home/jovyan/pyspark_test01.py'
    )
    t3 = BashOperator(
        task_id='print_hello',
        bash_command='echo "hello world"'
    )

    t1 >> t2 >> t3
```
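One way to narrow this down (just a suggestion, not from the original thread): run the operator's exact image, bind mount, and command by hand with `docker run`. If this fails the same way, the problem is in the container or the script rather than in Airflow. The sketch below mirrors the DAG's parameters:

```shell
# Reproduce the DockerOperator's work outside Airflow: same image,
# same bind mount, same spark-submit command as in the DAG above.
docker run --rm \
  -v /usr/local/airflow/scripts:/home/jovyan \
  jupyter/pyspark-notebook \
  spark-submit --master 'local[*]' /home/jovyan/pyspark_test01.py
```

One caveat worth checking: because the compose file mounts the host's `/var/run/docker.sock`, any container the DockerOperator starts is a sibling on the host's Docker daemon, so volume sources like `/usr/local/airflow/scripts` must exist on the *host*, not just inside the Airflow container. That path mismatch is a common source of DockerOperator failures in this kind of setup.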
DAG log (it keeps failing with the same error every time): dag_log.txt
docker-compose.yml
```yaml
services:
  postgres:
    image: postgres:9.6
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=airflow
      - POSTGRES_DB=airflow
    logging:
      options:
        max-size: 10m
        max-file: "3"

  webserver:
    # image: puckel/docker-airflow:1.10.9
    image: puckel/docker-airflow
    restart: always
    depends_on:
      - postgres
    environment:
      - LOAD_EX=n
      - EXECUTOR=Local
    logging:
      options:
        max-size: 10m
        max-file: "3"
    volumes:
      - ./airflow/dags:/usr/local/airflow/dags
      - ./airflow/plugins:/usr/local/airflow/plugins
      - ./airflow/scripts:/usr/local/airflow/scripts
      - ./requirements.txt:/requirements.txt
      - '/var/run/docker.sock:/var/run/docker.sock'
    ports:
      - "8080:8080"
    command: webserver
    healthcheck:
      test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
```
And finally, the script I'm trying to spark-submit:
```python
import pyspark

spark = pyspark.sql.SparkSession.builder \
    .appName('hogwarts') \
    .getOrCreate()

characters = [
    ("Albus Dumbledore", 150),
    ("Minerva McGonagall", 70),
    ("Rubeus Hagrid", 63),
    ("Oliver Wood", 18),
    ("Harry Potter", 12),
    ("Ron Weasley", 12),
    ("Hermione", 13),
    ("Draco Malfoy", None)
]

c_df = spark.createDataFrame(characters, ["name", "age"])
c_df.show()
```
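For reference, the script's data can be sanity-checked without Spark at all; this is a purely illustrative plain-Python mimic of what `c_df.show()` should print (Spark renders the `None` age as `null`):

```python
# The same character data the PySpark script builds its DataFrame from.
characters = [
    ("Albus Dumbledore", 150),
    ("Minerva McGonagall", 70),
    ("Rubeus Hagrid", 63),
    ("Oliver Wood", 18),
    ("Harry Potter", 12),
    ("Ron Weasley", 12),
    ("Hermione", 13),
    ("Draco Malfoy", None),
]

# Mimic of c_df.show(): one row per character; a missing age is
# rendered as "null", matching Spark's display of None values.
for name, age in characters:
    print(f"{name}: {'null' if age is None else age}")
```

If the manual `docker run` above succeeds, the table should match this output, which points the finger back at the Airflow/Docker wiring rather than the script.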
Any help would be greatly appreciated. I don't want to give up yet :)
I have the same issue. Did you solve it?
hey there! Any solution or idea? I am getting the same issue!