Update: this deployment isn't Airflow 2.0 compatible as-is.
The quick fix is to add a pod_template_file to the persistent volume and reference it under the [kubernetes] section in configmap.yaml (e.g. pod_template_file = /opt/airflow/dags/pod_template_file.yaml). Example pod_template_files are found here: https://airflow.apache.org/docs/apache-airflow/stable/executor/kubernetes.html
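In configmap.yaml the addition ends up looking roughly like this (a minimal sketch, assuming the ConfigMap is the airflow-config one mounted by the template below; the rest of airflow.cfg is elided):

apiVersion: v1
kind: ConfigMap
metadata:
  name: airflow-config
data:
  airflow.cfg: |
    # ... rest of airflow.cfg ...
    [kubernetes]
    pod_template_file = /opt/airflow/dags/pod_template_file.yaml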
Here's an example pod_template_file I used:
---
apiVersion: v1
kind: Pod
metadata:
  name: dummy-name
spec:
  containers:
    - args: []
      command: []
      env:
        - name: AIRFLOW__CORE__EXECUTOR
          value: LocalExecutor
        # Hard Coded Airflow Envs
        - name: AIRFLOW__CORE__FERNET_KEY
          valueFrom:
            secretKeyRef:
              name: fernet-key
              key: fernet-key
        - name: AIRFLOW__CORE__SQL_ALCHEMY_CONN
          valueFrom:
            secretKeyRef:
              name: airflow-metadata
              key: connection
        - name: AIRFLOW_CONN_AIRFLOW_DB
          valueFrom:
            secretKeyRef:
              name: airflow-metadata
              key: connection
      envFrom: []
      image: dummy_image
      imagePullPolicy: IfNotPresent
      name: base
      ports: []
      volumeMounts:
        - name: logs-pv
          mountPath: "/opt/airflow/logs"
        - name: dags-pv
          mountPath: "/opt/airflow/dags"
        - name: config
          mountPath: "/opt/airflow/airflow.cfg"
          subPath: airflow.cfg
          readOnly: true
  hostNetwork: false
  restartPolicy: Never
  securityContext:
    runAsUser: 50000
    fsGroup: 50000
  nodeSelector: {}
  affinity: {}
  tolerations: []
  serviceAccountName: "worker-serviceaccount"
  volumes:
    - name: config
      configMap:
        name: airflow-config
    - name: dags-pv
      persistentVolumeClaim:
        claimName: dags-pvc
    - name: logs-pv
      persistentVolumeClaim:
        claimName: logs-pvc
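Note that the hard-coded env section above is what matters for the crash described below: it injects AIRFLOW__CORE__SQL_ALCHEMY_CONN (and AIRFLOW_CONN_AIRFLOW_DB) from the airflow-metadata secret into every worker pod, so workers talk to the Postgres metadata DB instead of falling back to Airflow's default SQLite database.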
Thanks for the great guide!
I'm currently following the guide with an Airflow 2.0 image.
I'm running into an issue where my worker pods crash with the error
[2021-06-22 22:54:17,516] {cli_action_loggers.py:105} WARNING - Failed to log action with (sqlite3.OperationalError) no such table: log [SQL: INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?, ?, ?, ?, ?, ?, ?)] [parameters: ('2021-06-22 22:54:17.511978', 'airflow_tutorial_v01', 'print_hello', 'cli_task_run', '2021-06-22 22:47:52.924823', 'airflow', '{"host_name": "airflowtutorialv01printhello.34fc0661f3f845f29464738aa150b18f", "full_command": "[\'/home/airflow/.local/bin/airflow\', \'tasks\', \'r ... (28 characters truncated) ... \', \'print_hello\', \'2021-06-22T22:47:52.924823+00:00\', \'--local\', \'--pool\', \'default_pool\', \'--subdir\', \'/opt/airflow/dags/hello.py\']"}')]
The scheduler can connect to the Postgres db, as verified by
$: airflow db check
[2021-06-22 22:56:15,550] {db.py:776} INFO - Connection successful.
and by the presence of the requisite tables in the Postgres db, which have been written to. I believe my metadata connection secret is OK:
echo -n "postgresql+psycopg2://airflow%40{hostname}:{pwd}@{hostname}.postgres.database.azure.com:5432/airflow" | base64
Any advice on how to proceed would be wonderful; I'm a bit lost here.
Thanks!