7yl4r opened this issue 1 year ago
Seeing jobs fail with the status "Not Yet started" in the Airflow web GUI, plus a strange error when trying to fetch the task log file.
Reset command:
docker container restart mbon-dashboard-server-airflow-worker-1 mbon-dashboard-server-airflow-webserver-1 mbon-dashboard-server-airflow-scheduler-1 mbon-dashboard-server-flower-1 mbon-dashboard-server-redis-1 mbon-dashboard-server-postgres-1
After running this, the jobs work again.
This is an ongoing issue. When trying to view a job log in the Airflow web GUI, I get:
*** Log file does not exist: /opt/airflow//logs/ts_ingest/ingest_sat_roi_fgb_MODA_chlor_a_SS1/2023-05-20T00:00:00+00:00/1.log
*** Fetching from: http://:8793/log/ts_ingest/ingest_sat_roi_fgb_MODA_chlor_a_SS1/2023-05-20T00:00:00+00:00/1.log
*** Failed to fetch log file from worker. The request to ':///' is missing either an 'http://' or 'https://' protocol.
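The `Fetching from:` line has nothing between `http://` and `:8793`. As far as I understand Airflow 2.x's file task handler, the webserver builds this URL from the hostname recorded on the task instance, so an empty host likely means no worker ever picked the task up, which would fit the "Not Yet started" status. A small helper for spotting this pattern when scanning logs (the function name is hypothetical):

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify an Airflow "Fetching from:" log line.
# Prints "empty-host" if the URL has no host (as in http://:8793/...),
# the host name if one is present, or "no-url" if the line has no such URL.
classify_log_fetch_line() {
  local m
  # Prefix the capture with "host=" so an empty host is distinguishable from no match.
  m="$(printf '%s\n' "$1" | sed -nE 's#.*Fetching from: https?://([^:/]*).*#host=\1#p')"
  case "$m" in
    "")      echo "no-url" ;;
    "host=") echo "empty-host" ;;
    *)       echo "${m#host=}" ;;
  esac
}
```

For example, `classify_log_fetch_line "$line"` on the second log line above would report `empty-host`.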
Seeing the same issue on the fknms board now.
Trying to restart one container at a time to narrow down where the issue is. After restarting each container I wait (~15 min), then clear a DAG and observe its tasks:
Container restarted | Time waited (hh:mm) | Status
---|---|---
mbon-dashboard-server-airflow-worker-1 | 00:15 | no change
mbon-dashboard-server-airflow-scheduler-1 | 04:00 | no change
mbon-dashboard-server-airflow-webserver-1 | 00:10 | no change
mbon-dashboard-server-redis-1 | 00:15 | working again
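The restart-wait-clear loop used for the table could be scripted roughly as below. This is a sketch under assumptions: the Airflow CLI is reachable inside the scheduler container, `ts_ingest` (taken from the log path above) is the DAG to clear, and the function and variable names are hypothetical. Note that `airflow tasks clear` without `-s`/`-e` date bounds clears every run of the DAG.

```shell
#!/usr/bin/env bash
# Hypothetical helper for the one-container-at-a-time test.
# DOCKER and WAIT_SECONDS can be overridden, e.g. DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"
WAIT_SECONDS="${WAIT_SECONDS:-900}"  # ~15 min, as in the manual test
# Assumed: the Airflow CLI is available in the scheduler container.
SCHEDULER_CONTAINER="${SCHEDULER_CONTAINER:-mbon-dashboard-server-airflow-scheduler-1}"

# Restart one container, wait, then clear all task instances of a DAG so the
# scheduler re-queues them; the tasks can then be watched in the web GUI.
restart_and_clear() {
  local container="$1" dag_id="$2"
  "$DOCKER" container restart "$container" || return 1
  sleep "$WAIT_SECONDS"
  # -y skips the interactive confirmation prompt.
  "$DOCKER" exec "$SCHEDULER_CONTAINER" airflow tasks clear -y "$dag_id"
}
```

Usage: `restart_and_clear mbon-dashboard-server-redis-1 ts_ingest`, then check the task states in the GUI.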
From the docker logs of the redis container:
* Connecting to MASTER 194.38.20.196:8886
* MASTER <-> REPLICA sync started
# Error condition on socket for SYNC: Connection refused
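These lines show redis trying to replicate from an external MASTER. One way to check whether the redis container is currently configured as a replica is to read the `role` and `master_host` fields from `INFO replication`. A minimal sketch, assuming `redis-cli` is available inside the container (it ships with the official redis images); the function name is hypothetical:

```shell
#!/usr/bin/env bash
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: print the replication role of a redis container,
# plus the master host if one is configured (e.g. "slave (master_host=...)").
redis_replication_role() {
  local container="$1"
  # INFO output uses CRLF line endings, so strip the carriage returns first.
  "$DOCKER" exec "$container" redis-cli INFO replication \
    | tr -d '\r' \
    | awk -F: '
        $1 == "role" { role = $2 }
        $1 == "master_host" { host = $2 }
        END { if (host != "") print role " (master_host=" host ")"; else print role }'
}
```

Usage: `redis_replication_role mbon-dashboard-server-redis-1`. A healthy standalone instance should report `master`.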
Restarting the fknms board from scratch to check whether 9c8910b actually fixed it:
tylarmurray@fknms-dashboard-04:~/mbon-dashboard-server$ docker compose down --volumes --rmi all && docker compose up airflow-init && sudo chmod -R 777 airflow/ influxdb/ grafana/ postgres/ && docker compose up airflow-init && docker compose up --build -d
Doing the same for fgbnms:
tylarmurray@fgbnms-dashboard-02:~/mbon-dashboard-server$ docker compose down --volumes --rmi all && docker compose up airflow-init && sudo chmod -R 777 airflow/ influxdb/ grafana/ postgres/ && docker compose up airflow-init && docker compose up --build -d
I saw this issue in the docker logs.
I brought down all Airflow-related containers (leaving Grafana and InfluxDB up so the existing data isn't affected), then brought them back up with:
docker compose up --build -d
Jobs appear to be completing now. Will check on the data tomorrow.
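The partial restart (Airflow-related services only) could be sketched as follows. It assumes the Compose service names can be read off the container names in the reset command (`mbon-dashboard-server-<service>-1`); the real names in `docker-compose.yml` may differ, and the function name is hypothetical. Unlike the plain `docker compose up --build -d` used here, it also passes the service list to `up`, so only those services (and their dependencies) are started.

```shell
#!/usr/bin/env bash
DOCKER="${DOCKER:-docker}"
# Assumed Compose service names, matching the containers in the reset command.
AIRFLOW_SERVICES="${AIRFLOW_SERVICES:-airflow-worker airflow-webserver airflow-scheduler flower redis postgres}"

# Stop and remove only the Airflow-related containers, leaving grafana and
# influxdb running, then rebuild and start those services again.
recreate_airflow_services() {
  # $AIRFLOW_SERVICES is intentionally unquoted so it splits into one argument per service.
  "$DOCKER" compose rm --stop --force $AIRFLOW_SERVICES || return 1
  "$DOCKER" compose up --build -d $AIRFLOW_SERVICES
}
```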