open-metadata / OpenMetadata

OpenMetadata is a unified metadata platform for data discovery, data observability, and data governance powered by a central metadata repository, in-depth column level lineage, and seamless team collaboration.
https://open-metadata.org
Apache License 2.0

Airflow Ingestion pipeline always fails with could not deserialize key data using bot jwt token and authProvider: "openmetadata" in dag_generated_configs directory #16877

Open malcolm-smith-mck opened 4 months ago

malcolm-smith-mck commented 4 months ago

Affected module: Ingestion Framework. All profiling jobs fail immediately with this error.

Describe the bug: When a data quality profiler job is created and executed, the Airflow DAG fails immediately with this error.

To Reproduce: Create a profiler job and execute it. It fails immediately; with debug logging enabled, the following stack trace appears:

`Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 65, in validate_private_key
    serialization.load_pem_private_key(private_key.encode(), password=None)
  File "/home/airflow/.local/lib/python3.10/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 494, in _handle_key_loading_error
    raise ValueError(
ValueError: (\'Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).\', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>])

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/api/step.py", line 109, in run
    result: Either = self._run(record)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/processor/test_case_runner.py", line 105, in _run
    ).get_data_quality_runner()
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/runner/base_test_suite_source.py", line 119, in get_data_quality_runner
    return DataTestsRunner(self.create_data_quality_interface())
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/runner/base_test_suite_source.py", line 103, in create_data_quality_interface
    test_suite_interface_factory.create(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/interface/test_suite_interface_factory.py", line 112, in create
    return interface(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/interface/sqlalchemy/sqa_test_suite_interface.py", line 61, in __init__
    self.create_session()
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/interface/sqlalchemy/sqa_test_suite_interface.py", line 75, in create_session
    get_connection(self.service_connection_config)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/connections.py", line 49, in get_connection
    return get_connection_fn(connection)(connection)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/bigquery/connection.py", line 86, in get_connection
    set_google_credentials(gcp_credentials=connection.credentials)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 156, in set_google_credentials
    credentials_dict = build_google_credentials_dict(gcp_credentials.gcpConfig)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 104, in build_google_credentials_dict
    validate_private_key(private_key_str)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 68, in validate_private_key
    raise InvalidPrivateKeyException(msg) from err
metadata.utils.credentials.InvalidPrivateKeyException: Cannot serialise key: (\'Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).\', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>])`
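Reading the trace from the bottom up, the failing call sits on the BigQuery connection path (`get_connection` → `set_google_credentials` → `build_google_credentials_dict` → `validate_private_key`), so the string being parsed appears to be the private key from the service's GCP credentials config rather than the bot JWT. For anyone trying to narrow this down, here is a rough, standard-library-only sketch for spotting common formatting problems in a PEM private key string. It is not what `validate_private_key` does internally; `diagnose_pem` and `PEM_PATTERN` are hypothetical names, and whether any of these problems applies to this issue is an assumption that would need checking against the stored key.

```python
import base64
import re

# Matches a PEM block such as "-----BEGIN PRIVATE KEY-----" or
# "-----BEGIN RSA PRIVATE KEY-----" with a matching END marker.
PEM_PATTERN = re.compile(
    r"-----BEGIN (?P<label>[A-Z ]*PRIVATE KEY)-----\s*"
    r"(?P<body>.*?)\s*-----END (?P=label)-----",
    re.DOTALL,
)


def diagnose_pem(private_key: str) -> list[str]:
    """Return a list of likely formatting problems (empty if none are found).

    This only checks the text layout of the key; it does not parse the key
    material, so an empty result does not prove the key is loadable.
    """
    problems = []

    # Keys copied out of JSON often carry a literal backslash followed by "n"
    # where a real newline is expected.
    if "\\n" in private_key:
        problems.append("contains literal backslash-n sequences instead of newlines")
        private_key = private_key.replace("\\n", "\n")

    match = PEM_PATTERN.search(private_key.strip())
    if match is None:
        problems.append("missing or mismatched BEGIN/END PRIVATE KEY markers")
        return problems

    # The body between the markers must be base64 once whitespace is removed.
    body = "".join(match.group("body").split())
    try:
        base64.b64decode(body, validate=True)
    except ValueError:
        problems.append("body between the markers is not valid base64")

    return problems
```

An empty list only means the text layout looks plausible; the `reason_text=unsupported` in the OpenSSL error could still point at a key type or encryption that the library does not accept.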

Screenshots or steps to reproduce

Expected behavior: Profiling DAGs should run without credential failures.
Version:

Additional context: The stack trace suggests that set_google_credentials() was called, but I would have expected the openmetadata auth provider to be used instead. Likewise, I am not sure why load_pem_private_key would be called at all.

I verified that the JWT token included in the ingestion DAG config JSON is the same value as the one configured for the Ingestion-bot bot user in the OpenMetadata settings UI. I also confirmed that it is a valid base64-encoded JWT, using https://www.jstoolset.com/jwt to validate it. In that view I do not see any SSL keys included, just the following values, which look basically OK. There is a "kid": "Gb389a-9f76-gdjs-a92j-0242bk94356" value in there, but I expect this refers to the public key on the OpenMetadata API service container, so that a token presented there can be validated.

![image](https://github.com/open-metadata/OpenMetadata/assets/80744940/964d54c3-fc20-48e3-a09f-c4c1e19a5194)
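The same inspection can be done offline instead of pasting a bot token into a website. The following is a minimal sketch using only the Python standard library; `decode_jwt_segment` and `inspect_jwt` are hypothetical helper names, and the code only decodes the header and payload without verifying the signature.

```python
import base64
import json


def decode_jwt_segment(segment: str) -> dict:
    """Base64url-decode one JWT segment and parse it as JSON."""
    # JWT segments drop the base64 padding, so restore it before decoding.
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def inspect_jwt(token: str) -> tuple[dict, dict]:
    """Return (header, payload) of a JWT. The signature is NOT verified."""
    header_b64, payload_b64, _signature = token.split(".")
    return decode_jwt_segment(header_b64), decode_jwt_segment(payload_b64)


if __name__ == "__main__":
    # Made-up, unsigned token for illustration only.
    demo_header = base64.urlsafe_b64encode(b'{"alg":"none"}').decode().rstrip("=")
    demo_payload = base64.urlsafe_b64encode(b'{"sub":"demo"}').decode().rstrip("=")
    print(inspect_jwt(f"{demo_header}.{demo_payload}.signature"))
```

Calling `inspect_jwt` on the `jwtToken` value from the workflow config should show the same `kid` in the header. Neither segment of a JWT carries a PEM private key, which is consistent with nothing key-like showing up in the decoded view.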

The full configuration of the ingestion workflow is here:

{
  "id": "e1dc34f4-493e-42bd-837a-d38bb6724907",
  "name": "ec4e4176-182a-4d8d-adcd-56c389c0d63c",
  "displayName": "dim_address_TestSuite",
  "description": null,
  "pipelineType": "TestSuite",
  "owner": null,
  "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_address.testSuite.ec4e4176-182a-4d8d-adcd-56c389c0d63c",
  "sourceConfig": {
    "config": {
      "type": "TestSuite",
      "entityFullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_address",
      "profileSample": null,
      "profileSampleType": "PERCENTAGE"
    }
  },
  "openMetadataServerConnection": {
    "clusterName": "openmetadata",
    "type": "OpenMetadata",
    "hostPort": "http://openmetadata-server:8585/api",
    "authProvider": "openmetadata",
    "verifySSL": "no-ssl",
    "sslConfig": null,
    "securityConfig": {
      "jwtToken": "eyJraWQiOiJHYjM4OWEtOWY3Ni1nZGpzLWE5MmotMDI0MmJrOTQzNTYiLCJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJvcGVuLW1ldGFkYXRhLm9yZyIsInN1YiI6ImluZ2VzdGlvbi1ib3QiLCJyb2xlcyI6WyJJbmdlc3Rpb25Cb3RSb2xlIl0sImVtYWlsIjoiaW5nZXN0aW9uLWJvdEBvcGVubWV0YWRhdGEub3JnIiwiaXNCb3QiOnRydWUsInRva2VuVHlwZSI6IkJPVCIsImlhdCI6MTcxOTg4ODcxNCwiZXhwIjpudWxsfQ.OzOWoGvJmWc_zBMMXQ1X6zottCATrAEUddsCNHzGiSahThnLdenmyOSDj8lkyoQt6s7uMRXBse6NF5iiS5mb126NgZ0TkiHWb_lo_N-8KtxDFcpq_mjXNLxxkdalgoB8w0pwTzVf2CFMAwg2d-TI3AlJ9-K3jydfei8DNftc8HEV9-DLCTqEz4VW9qm75F9-jGnh09T-72f441jhd25dqnA5fYm0YEN9wFhfbi_Lo6AeHdro-sV788vqOwiXPk6RvF6uGh5Lp5QL0ZUkxoyYuyYKzmanLY1of1Y_AAYkZE2_xyQ31fNkyV0UVU5hztrCwJHxPaQsfF-h4sdVuBF8Wg"
    },
    "secretsManagerProvider": "db",
    "secretsManagerLoader": "noop",
    "apiVersion": "v1",
    "includeTopics": true,
    "includeTables": true,
    "includeDashboards": true,
    "includePipelines": true,
    "includeMlModels": true,
    "includeUsers": true,
    "includeTeams": true,
    "includeGlossaryTerms": true,
    "includeTags": true,
    "includePolicy": true,
    "includeMessagingServices": true,
    "enableVersionValidation": true,
    "includeDatabaseServices": true,
    "includePipelineServices": true,
    "limitRecords": 1000,
    "forceEntityOverwriting": false,
    "storeServiceConnection": true,
    "elasticsSearch": null,
    "supportsDataInsightExtraction": true,
    "supportsElasticSearchReindexingExtraction": true,
    "extraHeaders": null
  },
  "airflowConfig": {
    "pausePipeline": false,
    "concurrency": 1,
    "startDate": null,
    "endDate": null,
    "pipelineTimezone": "UTC",
    "retries": 0,
    "retryDelay": 300,
    "pipelineCatchup": false,
    "scheduleInterval": "0 * * * *",
    "maxActiveRuns": 1,
    "workflowTimeout": null,
    "workflowDefaultView": "tree",
    "workflowDefaultViewOrientation": "LR",
    "email": null
  },
  "service": {
    "id": "9803eb48-0458-4715-a439-48ea7f88018f",
    "type": "testSuite",
    "name": "sample_data.ecommerce_db.shopify.dim_address.testSuite",
    "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_address.testSuite",
    "description": "This is an executable test suite linked to an entity",
    "displayName": "sample_data.ecommerce_db.shopify.dim_address.testSuite",
    "deleted": false,
    "inherited": null,
    "href": null
  },
  "pipelineStatuses": null,
  "loggerLevel": "DEBUG",
  "deployed": true,
  "enabled": true,
  "href": "http://localhost:8585/api/v1/services/ingestionPipelines/e1dc34f4-493e-42bd-837a-d38bb6724907",
  "version": 0.2,
  "updatedAt": 1719889271979,
  "updatedBy": "admin",
  "changeDescription": {
    "fieldsAdded": [],
    "fieldsUpdated": [
      { "name": "loggerLevel", "oldValue": "INFO", "newValue": "DEBUG" },
      { "name": "deployed", "oldValue": false, "newValue": true }
    ],
    "fieldsDeleted": [],
    "previousVersion": 0.1
  },
  "deleted": false,
  "provider": "user"
}

harshach commented 4 months ago

@malcolm-smith-mck, can you please ask this question in our Slack before opening an issue? https://slack.open-metadata.org