Airflow ingestion pipeline always fails with "could not deserialize key data" using bot JWT token and authProvider: "openmetadata" in dag_generated_configs directory #16877
Affected module
Ingestion Framework: all profiling jobs immediately fail with this error.
Describe the bug
When a data quality profiler job is created and executed, the Airflow DAG immediately fails with the error below.
To Reproduce
Create a profiler job and execute it; it fails immediately. With debug logging enabled, the following stack trace appears:
```
Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 65, in validate_private_key
    serialization.load_pem_private_key(private_key.encode(), password=None)
  File "/home/airflow/.local/lib/python3.10/site-packages/cryptography/hazmat/backends/openssl/backend.py", line 494, in _handle_key_loading_error
    raise ValueError(
ValueError: ('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>])

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/api/step.py", line 109, in run
    result: Either = self._run(record)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/processor/test_case_runner.py", line 105, in _run
    ).get_data_quality_runner()
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/runner/base_test_suite_source.py", line 119, in get_data_quality_runner
    return DataTestsRunner(self.create_data_quality_interface())
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/runner/base_test_suite_source.py", line 103, in create_data_quality_interface
    test_suite_interface_factory.create(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/interface/test_suite_interface_factory.py", line 112, in create
    return interface(
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/interface/sqlalchemy/sqa_test_suite_interface.py", line 61, in __init__
    self.create_session()
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/data_quality/interface/sqlalchemy/sqa_test_suite_interface.py", line 75, in create_session
    get_connection(self.service_connection_config)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/connections.py", line 49, in get_connection
    return get_connection_fn(connection)(connection)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/ingestion/source/database/bigquery/connection.py", line 86, in get_connection
    set_google_credentials(gcp_credentials=connection.credentials)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 156, in set_google_credentials
    credentials_dict = build_google_credentials_dict(gcp_credentials.gcpConfig)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 104, in build_google_credentials_dict
    validate_private_key(private_key_str)
  File "/home/airflow/.local/lib/python3.10/site-packages/metadata/utils/credentials.py", line 68, in validate_private_key
    raise InvalidPrivateKeyException(msg) from err
metadata.utils.credentials.InvalidPrivateKeyException: Cannot serialise key: ('Could not deserialize key data. The data may be in an incorrect format, it may be encrypted with an unsupported algorithm, or it may be an unsupported key type (e.g. EC curves with explicit parameters).', [<OpenSSLError(code=503841036, lib=60, reason=524556, reason_text=unsupported)>])
```
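The first traceback can be reproduced in isolation: `load_pem_private_key` rejects any input that is not a PEM-encoded key, which is consistent with a JWT (or any other non-PEM text) reaching `validate_private_key`. A minimal sketch, assuming the `cryptography` package that ships with the ingestion image; the JWT fragment used here is just an illustrative placeholder:

```python
# Minimal repro sketch: feeding a non-PEM string (such as a JWT) to
# load_pem_private_key raises the same ValueError as in the traceback.
from cryptography.hazmat.primitives import serialization

not_a_pem_key = "eyJraWQiOiJHYjM4OWEt"  # a JWT fragment, not a PEM private key

try:
    serialization.load_pem_private_key(not_a_pem_key.encode(), password=None)
except ValueError as err:
    # "Could not deserialize key data. The data may be in an incorrect format..."
    print(f"rejected as expected: {err}")
```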
Expected behavior
Profiling DAGs should run without credential failures.
Additional context
The stack trace suggests that set_google_credentials() was called, but I would have expected the openmetadata auth provider to be used instead. Likewise, I am not sure why load_pem_private_key would be called at all.
I verified that the JWT token included in the ingestion DAG config JSON is the same value as the one for the ingestion-bot user configured in the OpenMetadata settings UI.
I also confirmed that it is a valid base64-encoded JWT, using https://www.jstoolset.com/jwt to validate it. In that view I do not see any SSL keys included, just the values below, which look basically OK. There is a "kid": "Gb389a-9f76-gdjs-a92j-0242bk94356" value in there, but I expect this refers to the public key on the OpenMetadata API service container, so that a token presented there can be validated.
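For reference, the token can also be decoded locally with only the standard library rather than a website. A JWT is three base64url segments (header.payload.signature), so no private-key material can hide in it. A small sketch using the header and payload segments of the bot token from the config below:

```python
# Decode the two readable segments of the ingestion-bot JWT locally.
# A JWT is header.payload.signature, each base64url-encoded; only the
# signature is binary, and no SSL/private keys are embedded in it.
import base64
import json

def decode_jwt_segment(segment: str) -> dict:
    # restore the "=" padding that JWTs strip off
    padded = segment + "=" * (-len(segment) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))

header_b64 = "eyJraWQiOiJHYjM4OWEtOWY3Ni1nZGpzLWE5MmotMDI0MmJrOTQzNTYiLCJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9"
payload_b64 = "eyJpc3MiOiJvcGVuLW1ldGFkYXRhLm9yZyIsInN1YiI6ImluZ2VzdGlvbi1ib3QiLCJyb2xlcyI6WyJJbmdlc3Rpb25Cb3RSb2xlIl0sImVtYWlsIjoiaW5nZXN0aW9uLWJvdEBvcGVubWV0YWRhdGEub3JnIiwiaXNCb3QiOnRydWUsInRva2VuVHlwZSI6IkJPVCIsImlhdCI6MTcxOTg4ODcxNCwiZXhwIjpudWxsfQ"

print(decode_jwt_segment(header_b64)["kid"])   # Gb389a-9f76-gdjs-a92j-0242bk94356
print(decode_jwt_segment(payload_b64)["sub"])  # ingestion-bot
```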
Version:
- OS: Docker for Mac
- OpenMetadata version: 1.4.1, the latest release of the Docker-packaged OpenMetadata containers (`curl -sL -o docker-compose-postgres.yml https://github.com/open-metadata/OpenMetadata/releases/download/1.4.1-release/docker-compose-postgres.yml`)
- OpenMetadata Ingestion package version: `openmetadata-ingestion[docker]==1.4.1`
![image](https://github.com/open-metadata/OpenMetadata/assets/80744940/964d54c3-fc20-48e3-a09f-c4c1e19a5194)
The full configuration of the ingestion workflow is here:
```json
{
  "id": "e1dc34f4-493e-42bd-837a-d38bb6724907",
  "name": "ec4e4176-182a-4d8d-adcd-56c389c0d63c",
  "displayName": "dim_address_TestSuite",
  "description": null,
  "pipelineType": "TestSuite",
  "owner": null,
  "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_address.testSuite.ec4e4176-182a-4d8d-adcd-56c389c0d63c",
  "sourceConfig": {
    "config": {
      "type": "TestSuite",
      "entityFullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_address",
      "profileSample": null,
      "profileSampleType": "PERCENTAGE"
    }
  },
  "openMetadataServerConnection": {
    "clusterName": "openmetadata",
    "type": "OpenMetadata",
    "hostPort": "http://openmetadata-server:8585/api",
    "authProvider": "openmetadata",
    "verifySSL": "no-ssl",
    "sslConfig": null,
    "securityConfig": {
      "jwtToken": "eyJraWQiOiJHYjM4OWEtOWY3Ni1nZGpzLWE5MmotMDI0MmJrOTQzNTYiLCJhbGciOiJSUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJvcGVuLW1ldGFkYXRhLm9yZyIsInN1YiI6ImluZ2VzdGlvbi1ib3QiLCJyb2xlcyI6WyJJbmdlc3Rpb25Cb3RSb2xlIl0sImVtYWlsIjoiaW5nZXN0aW9uLWJvdEBvcGVubWV0YWRhdGEub3JnIiwiaXNCb3QiOnRydWUsInRva2VuVHlwZSI6IkJPVCIsImlhdCI6MTcxOTg4ODcxNCwiZXhwIjpudWxsfQ.OzOWoGvJmWc_zBMMXQ1X6zottCATrAEUddsCNHzGiSahThnLdenmyOSDj8lkyoQt6s7uMRXBse6NF5iiS5mb126NgZ0TkiHWb_lo_N-8KtxDFcpq_mjXNLxxkdalgoB8w0pwTzVf2CFMAwg2d-TI3AlJ9-K3jydfei8DNftc8HEV9-DLCTqEz4VW9qm75F9-jGnh09T-72f441jhd25dqnA5fYm0YEN9wFhfbi_Lo6AeHdro-sV788vqOwiXPk6RvF6uGh5Lp5QL0ZUkxoyYuyYKzmanLY1of1Y_AAYkZE2_xyQ31fNkyV0UVU5hztrCwJHxPaQsfF-h4sdVuBF8Wg"
    },
    "secretsManagerProvider": "db",
    "secretsManagerLoader": "noop",
    "apiVersion": "v1",
    "includeTopics": true,
    "includeTables": true,
    "includeDashboards": true,
    "includePipelines": true,
    "includeMlModels": true,
    "includeUsers": true,
    "includeTeams": true,
    "includeGlossaryTerms": true,
    "includeTags": true,
    "includePolicy": true,
    "includeMessagingServices": true,
    "enableVersionValidation": true,
    "includeDatabaseServices": true,
    "includePipelineServices": true,
    "limitRecords": 1000,
    "forceEntityOverwriting": false,
    "storeServiceConnection": true,
    "elasticsSearch": null,
    "supportsDataInsightExtraction": true,
    "supportsElasticSearchReindexingExtraction": true,
    "extraHeaders": null
  },
  "airflowConfig": {
    "pausePipeline": false,
    "concurrency": 1,
    "startDate": null,
    "endDate": null,
    "pipelineTimezone": "UTC",
    "retries": 0,
    "retryDelay": 300,
    "pipelineCatchup": false,
    "scheduleInterval": "0 * * * *",
    "maxActiveRuns": 1,
    "workflowTimeout": null,
    "workflowDefaultView": "tree",
    "workflowDefaultViewOrientation": "LR",
    "email": null
  },
  "service": {
    "id": "9803eb48-0458-4715-a439-48ea7f88018f",
    "type": "testSuite",
    "name": "sample_data.ecommerce_db.shopify.dim_address.testSuite",
    "fullyQualifiedName": "sample_data.ecommerce_db.shopify.dim_address.testSuite",
    "description": "This is an executable test suite linked to an entity",
    "displayName": "sample_data.ecommerce_db.shopify.dim_address.testSuite",
    "deleted": false,
    "inherited": null,
    "href": null
  },
  "pipelineStatuses": null,
  "loggerLevel": "DEBUG",
  "deployed": true,
  "enabled": true,
  "href": "http://localhost:8585/api/v1/services/ingestionPipelines/e1dc34f4-493e-42bd-837a-d38bb6724907",
  "version": 0.2,
  "updatedAt": 1719889271979,
  "updatedBy": "admin",
  "changeDescription": {
    "fieldsAdded": [],
    "fieldsUpdated": [
      { "name": "loggerLevel", "oldValue": "INFO", "newValue": "DEBUG" },
      { "name": "deployed", "oldValue": false, "newValue": true }
    ],
    "fieldsDeleted": [],
    "previousVersion": 0.1
  },
  "deleted": false,
  "provider": "user"
}
```
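As a quick sanity check on the point above, the generated config carries only the bot JWT under openMetadataServerConnection.securityConfig and no GCP private key anywhere. A sketch against a trimmed copy of that JSON (field names taken from the config; the JWT is truncated for readability):

```python
import json

# Trimmed copy of the generated ingestion config above; only the
# authentication-related fields are kept, and the JWT is truncated.
config_json = """
{
  "pipelineType": "TestSuite",
  "openMetadataServerConnection": {
    "type": "OpenMetadata",
    "hostPort": "http://openmetadata-server:8585/api",
    "authProvider": "openmetadata",
    "securityConfig": {"jwtToken": "eyJraWQiOiJHYjM4OWEt..."}
  }
}
"""

config = json.loads(config_json)
conn = config["openMetadataServerConnection"]
print(conn["authProvider"])         # openmetadata
print("privateKey" in config_json)  # False: no PEM/private key in the config
```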