great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

URL Masking not working for azure blob storage connection string #10587

Open hemalrajput18 opened 3 days ago

hemalrajput18 commented 3 days ago

Describe the bug

I constantly get warnings about SQLAlchemy not being installed when mask_db_url is applied to the following config_variables.yml entry:

AZURE_STORAGE_CONNECTION_STRING: "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;AccountName=;AccountKey="

I have removed the account name and account key for this example.

To Reproduce

Include the above entry in config_variables.yml.

Here is my great_expectations.yml config:

config_variables_file_path: uncommitted/config_variables.yml
plugins_directory: plugins/
stores:
# Stores are configurable places to store things like Expectations, Validations
# Data Docs, and more. These are for advanced users only - most users can simply
# leave this section alone.
  expectations_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: expectations/

  validation_results_store:
    class_name: ValidationResultsStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/validations/

  checkpoint_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      suppress_store_backend_id: true
      base_directory: checkpoints/

  validation_definition_store:
    class_name: ValidationDefinitionStore
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: validation_definitions/

expectations_store_name: expectations_store
validation_results_store_name: validation_results_store
checkpoint_store_name: checkpoint_store

data_docs_sites:
  local_site:
    class_name: SiteBuilder
    show_how_to_buttons: true
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: uncommitted/data_docs/local_site/
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

  azure_docs_site:
    class_name: SiteBuilder
    store_backend:
      class_name: TupleAzureBlobStoreBackend
      container: \$web
      connection_string: ${AZURE_STORAGE_CONNECTION_STRING}
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

  my_data_docs_site:
    class_name: SiteBuilder
    site_index_builder:
      class_name: DefaultSiteIndexBuilder
    store_backend:
      class_name: TupleFilesystemStoreBackend
      base_directory: /dbfs/mnt/web/
fluent_datasources:
  spark_generic_technical_columns_template_suite:
    type: spark
    id: 1b8640f5-1461-4acc-b48f-509b50487332
    assets:
      dataframe_asset_generic_technical_columns_template_suite:
        type: dataframe
        id: 65b363e5-4f4c-4ec1-b989-951e1c3f24ce
        batch_metadata: {}
        batch_definitions:
          generic_technical_columns_template_batch_definition:
            id: 136d1a85-e2d4-4543-b239-9afecec2859b
            partitioner:
  spark_sap_s4_sap_s4_x8_openhub_sap_bw_sap_bw_commitments_management_line_items_suite:
    type: spark
    id: 1a329ac9-da93-43ad-8588-46519e410315
    assets:
      dataframe_asset_sap_s4_sap_s4_x8_openhub_sap_bw_sap_bw_commitments_management_line_items_suite:
        type: dataframe
        id: 64cde889-e875-48ad-ad00-8da951924121
        batch_metadata: {}
        batch_definitions:
          sap_s4_sap_s4_x8_openhub_sap_bw_sap_bw_commitments_management_line_items_batch_definition:
            id: b532cafb-d79a-4652-9b0f-dea16541c701
            partitioner:
analytics_enabled:
data_context_id: 9b7e3388-812e-4325-a3bc-7f48736d52e7

I get this warning message on most operations that use the GE framework. The simplest example is:

ge_context = ge.get_context(context_root_dir="/dbfs/mnt/config/gx/")

I get the output:

/local_disk0/.ephemeral_nfs/cluster_libraries/python/lib/python3.10/site-packages/great_expectations/data_context/util.py:196: UserWarning: SQLAlchemy is not installed, using urlparse to mask database url password which ignores **kwargs.
  warnings.warn(
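Until the masking itself is fixed, the noise can probably be reduced with Python's standard warnings filter. The sketch below is only a stopgap: it hides the message and does not change how the connection string is masked. It assumes the warning text starts with "SQLAlchemy is not installed", as in the output above. `quiet_sqlalchemy_mask_warning` and `emit_like_gx` are hypothetical names, and `emit_like_gx` is just a stand-in for the library call that raises the warning.

```python
import warnings


def quiet_sqlalchemy_mask_warning() -> None:
    """Hypothetical helper: ignore only the SQLAlchemy masking UserWarning."""
    # `message` is a regex matched against the start of the warning text.
    warnings.filterwarnings(
        "ignore",
        message=r"SQLAlchemy is not installed",
        category=UserWarning,
    )


def emit_like_gx() -> None:
    """Stand-in for the library code path that emits the warning."""
    warnings.warn(
        "SQLAlchemy is not installed, using urlparse to mask database url "
        "password which ignores **kwargs.",
        UserWarning,
    )


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")   # record every warning by default
    quiet_sqlalchemy_mask_warning()   # then ignore the one noisy message
    emit_like_gx()                    # filtered out
    warnings.warn("some other warning", UserWarning)  # still recorded

remaining = [str(w.message) for w in caught]
print(remaining)
```

In a notebook, calling `quiet_sqlalchemy_mask_warning()` once before `ge.get_context(...)` should be enough, since the filter stays active for the rest of the session.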

Expected behavior

Looking at the source code for the mask_db_url function, I see that it includes a special case for Azure Blob Storage connection strings, _obfuscate_azure_blobstore_connection_string. However, the check if url.startswith("DefaultEndpointsProtocol") seems to fail for some reason, so the special case is apparently never reached.
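For illustration, here is a minimal standalone sketch of the prefix check and masking behavior I would expect. It is not the Great Expectations implementation: `mask_azure_connection_string` and `mask_value` are hypothetical names, and the account name and key are placeholders.

```python
import re

# Prefix that the issue says mask_db_url checks for.
AZURE_PREFIX = "DefaultEndpointsProtocol"


def mask_azure_connection_string(conn: str, mask: str = "***") -> str:
    """Hypothetical helper: hide the AccountKey value in a connection string."""
    # Replace everything after "AccountKey=" up to the next ";" or end of string.
    return re.sub(r"(AccountKey=)[^;]*", lambda m: m.group(1) + mask, conn)


def mask_value(value: str) -> str:
    """Hypothetical dispatcher: mask Azure strings, pass anything else through."""
    if value.startswith(AZURE_PREFIX):
        return mask_azure_connection_string(value)
    return value


example = (
    "DefaultEndpointsProtocol=https;EndpointSuffix=core.windows.net;"
    "AccountName=myaccount;AccountKey=c2VjcmV0"
)

masked = mask_value(example)
print(masked)

# A leading space (or a stray quote) is enough to make the prefix check fail,
# in which case this sketch returns the value unchanged.
unmasked = mask_value(" " + example)
print(unmasked == " " + example)
```

With the entry from config_variables.yml above, the prefix check in this sketch passes, so one thing worth checking is whether the value that reaches mask_db_url has extra leading characters by the time it is inspected.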

Environment (please complete the following information):

adeola-ak commented 8 hours ago

Hi there! Thanks for bringing this to our attention. I've shared it with the team, so please keep an eye out for updates on this issue.