great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

AWS Athena return 0 rows #8503

Open isiuni01 opened 11 months ago

isiuni01 commented 11 months ago

Describe the bug: Hi, I am trying to use GX in an AWS Glue job with AWS Athena as the data source, but Athena returns 0 rows. If I re-run the same query in the Athena query editor, it works fine.

To Reproduce: my config.yml

config_version: 3.0
datasources: {}

config_variables_file_path: great_expectations/uncommitted/config_variables.yml

plugins_directory: great_expectations/plugins/

stores:
  expectations_S3_store:
    class_name: ExpectationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: 'my_bucket'
      prefix: 'great_expectations/expectations/'

  validations_S3_store:
    class_name: ValidationsStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my_bucket
      prefix: 'great_expectations/uncommitted/validations/'

  evaluation_parameter_store:
    class_name: EvaluationParameterStore

  checkpoint_S3_store:
    class_name: CheckpointStore
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: my_bucket
      prefix: 'great_expectations/checkpoints/'

expectations_store_name: expectations_S3_store
validations_store_name: validations_S3_store
evaluation_parameter_store_name: evaluation_parameter_store
checkpoint_store_name: checkpoint_S3_store

data_docs_sites:
  s3_site:
    class_name: SiteBuilder
    show_how_to_buttons: false
    store_backend:
      class_name: TupleS3StoreBackend
      bucket: gx-output
      prefix: 'site/'
    site_index_builder:
      class_name: DefaultSiteIndexBuilder

And this is the code in my AWS Glue job:


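`import sys

import boto3
import great_expectations as gx
from awsglue.context import GlueContext
from awsglue.utils import getResolvedOptions
from great_expectations.core.yaml_handler import YAMLHandler
from great_expectations.data_context.types.base import (
    DataContextConfig,
    S3StoreBackendDefaults,
)
from pyspark.context import SparkContext

yaml = YAMLHandler()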

sc = SparkContext.getOrCreate()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
s3_client = boto3.client("s3")

gx_bucket_name = getResolvedOptions(sys.argv, ['gxBucket'])["gxBucket"]
gxPath = getResolvedOptions(sys.argv, ['gxPath'])["gxPath"]

response = s3_client.get_object(
    Bucket=gx_bucket_name, Key=gxPath
)
config_file = yaml.load(response["Body"])

config = DataContextConfig(
    config_version=config_file["config_version"],
    datasources=config_file["datasources"],
    expectations_store_name=config_file["expectations_store_name"],
    validations_store_name=config_file["validations_store_name"],
    evaluation_parameter_store_name=config_file["evaluation_parameter_store_name"],
    plugins_directory="/great_expectations/plugins",
    validation_operators=None,  # config_file["validation_operators"]
    stores=config_file["stores"],
    data_docs_sites=config_file["data_docs_sites"],
    config_variables_file_path=config_file["config_variables_file_path"],
    checkpoint_store_name=config_file["checkpoint_store_name"],
    store_backend_defaults=S3StoreBackendDefaults(
        default_bucket_name=config_file["data_docs_sites"]["s3_site"]["store_backend"][
            "bucket"
        ]
    ),
)
context = gx.get_context(project_config=config)

database = getResolvedOptions(sys.argv, ['database'])["database"]

connection_string = r"awsathena+rest://@athena.eu-central-1.amazonaws.com/" + database + "?s3_staging_dir=dir"

datasource = context.sources.add_sql(
    name="my_datasource", connection_string=connection_string
)

#table = getResolvedOptions(sys.argv, ['table'])["table"]
query = getResolvedOptions(sys.argv, ['query'])["query"]

table_asset = datasource.add_query_asset(
    name="my_asset",
    query=query,
)

batch_request = table_asset.build_batch_request()

exclude_column_names = []

data_assistant_result = context.assistants.onboarding.run(
    batch_request=batch_request,
    exclude_column_names=exclude_column_names,
)

expectation_suite_name = getResolvedOptions(sys.argv, ['expectations'])["expectations"]

expectation_suite = data_assistant_result.get_expectation_suite(
    expectation_suite_name=expectation_suite_name
)

context.add_or_update_expectation_suite(expectation_suite=expectation_suite)` 
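One thing that may be worth ruling out (it is not a confirmed cause of this issue) is the encoding of the connection string. PyAthena's SQLAlchemy URLs expect `s3_staging_dir` to be URL-quoted, and the script above concatenates it by hand. The sketch below builds the same kind of URL with `urllib.parse.quote_plus`. The helper name `build_athena_connection_string`, the database name, and the staging path are hypothetical placeholders, not values from this issue.

```python
from urllib.parse import quote_plus


def build_athena_connection_string(region, database, s3_staging_dir):
    """Build a PyAthena SQLAlchemy URL (hypothetical helper).

    The credentials part is left empty, as in the script above, so the
    default AWS credential chain (e.g. the Glue job role) is used.
    """
    return (
        f"awsathena+rest://@athena.{region}.amazonaws.com/"
        f"{database}?s3_staging_dir={quote_plus(s3_staging_dir)}"
    )


# Placeholder values for illustration only.
connection_string = build_athena_connection_string(
    region="eu-central-1",
    database="my_database",
    s3_staging_dir="s3://my_bucket/athena-results/",
)
print(connection_string)
```

The resulting string can be passed to `context.sources.add_sql(...)` in place of the hand-built one.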

Expected behavior: I expect Athena to return some rows. Right now the job runs without errors, but my expectation suite is empty. I think that is consistent, because the assistant does not receive any data.
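To check the assumption that no data reaches GX, one option is to count the rows the query returns through the same connection string, outside of GX. This is only a diagnostic sketch: it assumes SQLAlchemy and the PyAthena dialect are installed in the Glue job, and the names `wrap_in_count`, `count_rows`, and `gx_probe` are made up for illustration.

```python
def wrap_in_count(query):
    """Wrap a SELECT statement so that only its row count is returned."""
    # Drop surrounding whitespace and a trailing semicolon, which would be
    # invalid inside a subquery.
    inner = query.strip().rstrip(";").strip()
    return f"SELECT COUNT(*) AS row_count FROM ({inner}) AS gx_probe"


def count_rows(connection_string, query):
    """Run the wrapped query through SQLAlchemy and return the row count."""
    # Imported lazily so the helper above can be used without SQLAlchemy.
    from sqlalchemy import create_engine, text

    engine = create_engine(connection_string)
    with engine.connect() as connection:
        return connection.execute(text(wrap_in_count(query))).scalar()


# Placeholder query for illustration only.
print(wrap_in_count("SELECT * FROM my_table WHERE dt = '2023-07-01';"))
```

In the Glue job, calling `count_rows(connection_string, query)` before building the batch request would show whether the 0 rows come from the Athena connection itself (for example the database, role permissions, or staging directory) or only appear once the query goes through GX.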

Environment (please complete the following information):

HaebichanGX commented 11 months ago

Hey @isiuni01, unfortunately we haven't tested GX with Athena and Glue thoroughly enough for our engineering team to fully fix the issue. GX also doesn't support Windows. Our support will therefore be limited, as per our policy. We will put this in the backlog just in case, however. Thanks!

isiuni01 commented 11 months ago

Hi, thank you. So for now I can just wait, right?