hikaya-io / connectors

A flexible data integration tool to help nonprofits connect to their data collection tools and ERP systems

LWF's SurveyCTO DAG fails #58

Open · amosnjoroge opened this issue 2 years ago

amosnjoroge commented 2 years ago

Current behavior: the SurveyCTO DAG fails with the error below:

*** Log file does not exist: /opt/airflow/logs/dots_survey_cto_data_pipeline/Save_data_to_DB/2022-01-23T00:00:00+00:00/2.log
*** Fetching from: http://airflow-worker-0.airflow-worker.default.svc.cluster.local:8793/log/dots_survey_cto_data_pipeline/Save_data_to_DB/2022-01-23T00:00:00+00:00/2.log

[2022-01-24 00:05:06,084] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: dots_survey_cto_data_pipeline.Save_data_to_DB 2022-01-23T00:00:00+00:00 [queued]>
[2022-01-24 00:05:06,107] {taskinstance.py:896} INFO - Dependencies all met for <TaskInstance: dots_survey_cto_data_pipeline.Save_data_to_DB 2022-01-23T00:00:00+00:00 [queued]>
[2022-01-24 00:05:06,108] {taskinstance.py:1087} INFO - 
--------------------------------------------------------------------------------
[2022-01-24 00:05:06,108] {taskinstance.py:1088} INFO - Starting attempt 2 of 3
[2022-01-24 00:05:06,108] {taskinstance.py:1089} INFO - 
--------------------------------------------------------------------------------
[2022-01-24 00:05:06,158] {taskinstance.py:1107} INFO - Executing <Task(PythonOperator): Save_data_to_DB> on 2022-01-23T00:00:00+00:00
[2022-01-24 00:05:06,166] {standard_task_runner.py:52} INFO - Started process 1230 to run task
[2022-01-24 00:05:06,172] {standard_task_runner.py:76} INFO - Running: ['airflow', 'tasks', 'run', 'dots_survey_cto_data_pipeline', 'Save_data_to_DB', '2022-01-23T00:00:00+00:00', '--job-id', '160', '--pool', 'default_pool', '--raw', '--subdir', 'DAGS_FOLDER/pull_survey_cto_data.py', '--cfg-path', '/tmp/tmpbl7aom4g', '--error-file', '/tmp/tmpb0vzypdc']
[2022-01-24 00:05:06,174] {standard_task_runner.py:77} INFO - Job 160: Subtask Save_data_to_DB
[2022-01-24 00:05:06,468] {logging_mixin.py:104} INFO - Running <TaskInstance: dots_survey_cto_data_pipeline.Save_data_to_DB 2022-01-23T00:00:00+00:00 [running]> on host airflow-worker-0.airflow-worker.default.svc.cluster.local
[2022-01-24 00:05:06,593] {taskinstance.py:1300} INFO - Exporting the following env vars:
AIRFLOW_CTX_DAG_OWNER=Hikaya-Dots
AIRFLOW_CTX_DAG_ID=dots_survey_cto_data_pipeline
AIRFLOW_CTX_TASK_ID=Save_data_to_DB
AIRFLOW_CTX_EXECUTION_DATE=2022-01-23T00:00:00+00:00
AIRFLOW_CTX_DAG_RUN_ID=scheduled__2022-01-23T00:00:00+00:00
[2022-01-24 00:05:06,594] {pull_survey_cto_data.py:44} INFO - Loading data from SurveyCTO server lutheranworld...
[2022-01-24 00:05:06,847] {surveycto.py:69} ERROR - Error getting list of SurveyCTO forms
[2022-01-24 00:05:06,848] {surveycto.py:70} ERROR - 401 Client Error: 401 for url: https://lutheranworld.surveycto.com/console/forms-groups-datasets/get
[2022-01-24 00:05:06,848] {taskinstance.py:1501} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1157, in _run_raw_task
    self._prepare_and_execute_task_with_callbacks(context, task)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1331, in _prepare_and_execute_task_with_callbacks
    result = self._execute_task(context, task_copy)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 1361, in _execute_task
    result = task_copy.execute(context=context)
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 150, in execute
    return_value = self.execute_callable()
  File "/home/airflow/.local/lib/python3.8/site-packages/airflow/operators/python.py", line 161, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/opt/airflow/dags/repo/connectors/DAGs/pull_survey_cto_data.py", line 54, in import_forms_and_submissions
    forms = scto_client.get_all_forms()
  File "/opt/airflow/dags/repo/connectors/DAGs/helpers/surveycto.py", line 71, in get_all_forms
    raise e
  File "/opt/airflow/dags/repo/connectors/DAGs/helpers/surveycto.py", line 61, in get_all_forms
    forms_request = self.session.get(
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 555, in get
    return self.request('GET', url, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/sessions.py", line 662, in send
    r = dispatch_hook('response', hooks, r, **kwargs)
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/hooks.py", line 31, in dispatch_hook
    _hook_data = hook(hook_data, **kwargs)
  File "/opt/airflow/dags/repo/connectors/DAGs/helpers/requests.py", line 58, in <lambda>
    assert_status_hook = lambda response, *args, **kwargs: response.raise_for_status()
  File "/home/airflow/.local/lib/python3.8/site-packages/requests/models.py", line 943, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 401 Client Error: 401 for url: https://lutheranworld.surveycto.com/console/forms-groups-datasets/get
[2022-01-24 00:05:06,852] {taskinstance.py:1544} INFO - Marking task as UP_FOR_RETRY. dag_id=dots_survey_cto_data_pipeline, task_id=Save_data_to_DB, execution_date=20220123T000000, start_date=20220124T000506, end_date=20220124T000506
[2022-01-24 00:05:06,907] {local_task_job.py:149} INFO - Task exited with return code 1
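For context on the traceback: the 401 is raised by a response hook registered on the `requests` session in `helpers/requests.py`, which converts any non-2xx status into an `HTTPError`. A minimal sketch of that pattern (names are illustrative, not the repository's actual code):

```python
import requests

def assert_status_hook(response, *args, **kwargs):
    # Raise requests.exceptions.HTTPError for any 4xx/5xx response,
    # mirroring the lambda hook visible in the traceback above.
    response.raise_for_status()

session = requests.Session()
session.hooks["response"].append(assert_status_hook)

# Every request made through this session now fails loudly on an error
# status, so a 401 from an unauthenticated SurveyCTO endpoint surfaces
# as an exception before the caller ever inspects the body.
```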
andrewtpham commented 2 years ago

@amosnjoroge @TAnas0 do you think this has anything to do with our SurveyCTO credentials being revoked from the LWF SurveyCTO account?

amosnjoroge commented 2 years ago

Hi @andrewtpham, that is most likely the issue: the failing GET request requires authentication.
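One quick way to confirm is to call the same endpoint from the log directly with the stored credentials. A hypothetical helper for that check (not part of the repository; `check_credentials` and the placeholder credentials are assumptions):

```python
import requests
from requests.auth import HTTPBasicAuth

def check_credentials(url, username, password):
    """Return the HTTP status code of a direct authenticated GET."""
    response = requests.get(url, auth=HTTPBasicAuth(username, password))
    return response.status_code

# Example (not executed here): a 401 from
# https://lutheranworld.surveycto.com/console/forms-groups-datasets/get
# would confirm the credentials were revoked or changed; a 200 would
# point the problem somewhere else in the DAG.
```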