sassoftware / sas-airflow-provider

Apache Airflow Provider for creating tasks in Airflow to execute SAS Studio Flows and Jobs.
Apache License 2.0

session.verify got overwritten by env var #35

Open rizhansas opened 3 months ago

rizhansas commented 3 months ago

Problem:

In our Airflow worker pod, we set the environment variable REQUESTS_CA_BUNDLE. As a result, the SAS Studio Flow operator fails to honor the Airflow Connection extra field {"ssl_certificate_verification": false}, which is meant to skip certificate verification.

As the log below shows, the hook confirms that TLS verification is turned off and even obtains the access token from SAS Logon (the "Get oauth token" step). It then fails when calling the SAS Studio REST endpoint.

[2024-06-26, 17:08:58 UTC] {sas.py:52} INFO - TLS verification is turned off
[2024-06-26, 17:08:58 UTC] {sas.py:62} INFO - Creating session for connection named sas_default to host https://d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com/
[2024-06-26, 17:08:58 UTC] {sas.py:82} INFO - Get oauth token (see README if this crashes)
[2024-06-26, 17:08:59 UTC] {sas_studioflow.py:90} INFO - Generate code for Studio Flow: /Users/miadmin/TestFlow.flw
[2024-06-26, 17:08:59 UTC] {logging_mixin.py:188} INFO - Code Generation for Studio Flow without Compute session
[2024-06-26, 17:08:59 UTC] {taskinstance.py:441} ▼ Post task execution logs
[2024-06-26, 17:08:59 UTC] {taskinstance.py:2905} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 715, in urlopen
    httplib_response = self._make_request(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 404, in _make_request
    self._validate_conn(conn)
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1060, in _validate_conn
    conn.connect()
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connection.py", line 419, in connect
    self.sock = ssl_wrap_socket(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/lib64/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/lib64/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/lib64/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
ssl.SSLCertVerificationError: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/requests/adapters.py", line 564, in send
    resp = conn.urlopen(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 801, in urlopen
    retries = retries.increment(
  File "/home/sas/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 594, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studioflow.py", line 91, in execute
    code = _generate_flow_code(
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studioflow.py", line 199, in _generate_flow_code
    response = session.post(uri, json=req)
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/hooks/sas.py", line 112, in <lambda>
    session.post = lambda *args, **kwargs: requests.Session.post(  # type: ignore
  File "/home/sas/.local/lib/python3.8/site-packages/requests/sessions.py", line 637, in post
    return self.request("POST", url, data=data, json=json, **kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/requests/sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/requests/sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/requests/adapters.py", line 595, in send
    raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/sas/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 465, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/airflow/models/taskinstance.py", line 432, in _execute_callable
    return execute_callable(context=context, **execute_callable_kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/airflow/models/baseoperator.py", line 400, in wrapper
    return func(self, *args, **kwargs)
  File "/home/sas/.local/lib/python3.8/site-packages/sas_airflow_provider/operators/sas_studioflow.py", line 124, in execute
    raise AirflowException(f"SASStudioFlowOperator error: {str(e)}")
airflow.exceptions.AirflowException: SASStudioFlowOperator error: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)')))
[2024-06-26, 17:08:59 UTC] {taskinstance.py:1206} INFO - Marking task as FAILED. dag_id=MySASStudioFlowOperatorDAG, task_id=sas_studio_test_flow, run_id=manual__2024-06-26T17:08:55.695486+00:00, execution_date=20240626T170855, start_date=20240626T170858, end_date=20240626T170859
[2024-06-26, 17:08:59 UTC] {standard_task_runner.py:110} ERROR - Failed to execute job 6 for task sas_studio_test_flow (SASStudioFlowOperator error: HTTPSConnectionPool(host='d21670.ingress-nginx.miadmin-01-m1.irm.sashq-d.openstack.sas.com', port=443): Max retries exceeded with url: /studioDevelopment/code (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: self signed certificate in certificate chain (_ssl.c:1131)'))); 14161)
[2024-06-26, 17:08:59 UTC] {local_task_job_runner.py:240} INFO - Task exited with return code 1
[2024-06-26, 17:08:59 UTC] {taskinstance.py:3498} INFO - 0 downstream tasks scheduled from follow-on schedule check
[2024-06-26, 17:08:59 UTC] {local_task_job_runner.py:222} ▲▲▲ Log group end

Root Cause

In the first REST call, the hook explicitly passes the boolean verify value to the requests.post function. This works as expected.

https://github.com/sassoftware/sas-airflow-provider/blob/b5527629e4592b0ba85abf6b5f77da2f058b4d06/src/sas_airflow_provider/hooks/sas.py#L83-L89

In the second REST call, verify is not passed to the requests.* function; it is set only on Session.verify.

https://github.com/sassoftware/sas-airflow-provider/blob/b5527629e4592b0ba85abf6b5f77da2f058b4d06/src/sas_airflow_provider/hooks/sas.py#L103-L121

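This has been a known problem in the Python requests library for years: when an environment variable such as REQUESTS_CA_BUNDLE is set, a Session.verify value of False is overridden unless verify=False is passed on the individual call. People still lose hours to this override. It would be better to fix it in our code, or at least to make the two REST calls behave consistently (either both fail or both succeed).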
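
One possible shape for such a fix, sketched under assumptions: wrap the session's request methods so that every call carries an explicit verify argument, the same way the first REST call does. The helper name force_verify and the FakeSession stand-in are hypothetical and are not part of sas-airflow-provider or requests; the stand-in only lets the sketch run without a network connection.

```python
import functools


def force_verify(request_func, verify):
    """Wrap a requests-style callable so each call carries an explicit verify value.

    Hypothetical helper for illustration; not part of requests or the provider.
    """
    @functools.wraps(request_func)
    def wrapper(*args, **kwargs):
        # Only fill in verify when the caller did not pass one, so an
        # explicit per-call value still wins.
        kwargs.setdefault("verify", verify)
        return request_func(*args, **kwargs)
    return wrapper


class FakeSession:
    """Stand-in for requests.Session that echoes back the keyword arguments."""

    def post(self, url, **kwargs):
        return {"url": url, **kwargs}


session = FakeSession()
# With a real requests.Session, the same wrapping could be applied to
# get, put, delete, and so on.
session.post = force_verify(session.post, verify=False)

result = session.post("https://example.invalid/studioDevelopment/code", json={})
override = session.post("https://example.invalid/studioDevelopment/code", verify=True)
```

Another option that may be worth considering is setting trust_env = False on the session, though that also stops the session from reading proxy settings from the environment, so it may not suit every deployment.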