great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

Getting error: RuntimeBatchRequests must specify exactly one corresponding BatchDefinition #3863

Closed pmuntyanu closed 2 years ago

pmuntyanu commented 2 years ago

Describe the bug: Hello GE. I am trying to run a test against data in S3 and I am getting this error: ValueError: RuntimeBatchRequests must specify exactly one corresponding BatchDefinition. The reason is that in great_expectations/datasource/data_connector/util.py, the comparison batch_request.data_asset_name != batch_definition.data_asset_name evaluates to True on line 69, so no batch definition matches the request.
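To make the failure mode concrete, here is a minimal sketch of the kind of check that produces this error. It is not the library's code: `BatchDefinitionSketch` and `select_exactly_one` are hypothetical names, and the candidate asset name is an assumption chosen for illustration. The idea is only that candidates are filtered by `data_asset_name`, and anything other than exactly one survivor is rejected.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class BatchDefinitionSketch:
    """Hypothetical stand-in for a batch definition (not the library class)."""
    data_asset_name: str
    path: str


def select_exactly_one(
    requested_asset: str, candidates: List[BatchDefinitionSketch]
) -> BatchDefinitionSketch:
    # Keep only the candidates whose asset name equals the requested one.
    matches = [c for c in candidates if c.data_asset_name == requested_asset]
    # Zero matches (name mismatch) or several matches are both rejected.
    if len(matches) != 1:
        raise ValueError(
            f"expected exactly one match for {requested_asset!r}, "
            f"found {len(matches)}"
        )
    return matches[0]


# Assumed for illustration: the connector labelled both files with one name.
candidates = [
    BatchDefinitionSketch("DEFAULT_ASSET_NAME", "pm/ge/some_file.csv"),
    BatchDefinitionSketch("DEFAULT_ASSET_NAME", "pm/ge/other_file.csv"),
]
```

Under these assumptions, requesting an asset called `some_file` filters out every candidate and raises `ValueError`, which mirrors the symptom described above.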

To Reproduce Steps to reproduce the behavior:

  1. Upload the attached CSV files to the S3 bucket under the path pm/ge/ so that you have two S3 files: s3://bucket/pm/ge/some_file.csv and s3://bucket/pm/ge/other_file.csv.
  2. Execute the attached code.

Expected behavior: the code runs without raising an exception.

Additional context: @talagluck has a video of the reproduction, but in that video I ran the code using the YAML config and the CLI.

Screenshot 2021-12-16 at 11 06 38

s3_test.py.txt requirements.frozen.txt

Traceback

    /Users/pavlo.muntianu/Library/Caches/pypoetry/virtualenvs/de-data-test-suite-tMW2hfNX-py3.9/bin/python /opt/ta/de-data-test-suite/use_cases/s3_test.py
    Traceback (most recent call last):
      File "/opt/ta/de-data-test-suite/use_cases/s3_test.py", line 82, in <module>
        main()
      File "/opt/ta/de-data-test-suite/use_cases/s3_test.py", line 74, in main
        validator = context.get_validator(
      File "/Users/pavlo.muntianu/Library/Caches/pypoetry/virtualenvs/de-data-test-suite-tMW2hfNX-py3.9/lib/python3.9/site-packages/great_expectations/data_context/data_context.py", line 1756, in get_validator
        self.get_batch_list(
      File "/Users/pavlo.muntianu/Library/Caches/pypoetry/virtualenvs/de-data-test-suite-tMW2hfNX-py3.9/lib/python3.9/site-packages/great_expectations/core/usage_statistics/usage_statistics.py", line 295, in usage_statistics_wrapped_method
        result = func(*args, **kwargs)
      File "/Users/pavlo.muntianu/Library/Caches/pypoetry/virtualenvs/de-data-test-suite-tMW2hfNX-py3.9/lib/python3.9/site-packages/great_expectations/data_context/data_context.py", line 1675, in get_batch_list
        return datasource.get_batch_list_from_batch_request(batch_request=batch_request)
      File "/Users/pavlo.muntianu/Library/Caches/pypoetry/virtualenvs/de-data-test-suite-tMW2hfNX-py3.9/lib/python3.9/site-packages/great_expectations/datasource/new_datasource.py", line 150, in get_batch_list_from_batch_request
        raise ValueError(
    ValueError: RuntimeBatchRequests must specify exactly one corresponding BatchDefinition

    Process finished with exit code 1

pmuntyanu commented 2 years ago

Screenshot 2021-12-20 at 11 25 08

I believe we simply omit the data asset name at line 111: https://github.com/great-expectations/great_expectations/blame/develop/great_expectations/datasource/data_connector/inferred_asset_file_path_data_connector.py#L111

pmuntyanu commented 2 years ago

When I compose the batch request as shown below, everything works.


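    from great_expectations.core.batch import BatchRequest

    # Request the asset under the default name instead of a file-derived name.
    batch_request = BatchRequest(
        datasource_name="my_s3_datasource",
        data_connector_name="inferred_connector",
        data_asset_name="DEFAULT_ASSET_NAME",
    )

talagluck commented 2 years ago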

Thanks for the follow-up, @pmuntyanu! I believe this is likely to be an issue with configuration, but I'm trying to understand where the configuration issue is happening.

I am also surprised that this works with a data_asset_name set to DEFAULT_ASSET_NAME, especially for an InferredAssetDataConnector. Could you please share the BatchRequest that didn't work? What is the name of your asset in this case?

Also, we are unable to download the s3_test.py file above - would you please paste the contents into a gist and link that here? Thank you!

pmuntyanu commented 2 years ago

Hi @talagluck, I found out that I had misconfigured it because I did not understand the purpose of data_asset_name. Now everything is configured and works well (I used this documentation). We can close the issue. Thanks a lot for looking into it.
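For anyone who hits the same confusion, here is a rough sketch of how an inferred-asset connector might derive data_asset_name from a regex over the S3 keys. It is a simplified model, not the library's implementation: `infer_asset_name`, `FALLBACK_ASSET_NAME`, the pattern, and the group names are all assumptions, and the fallback to the literal DEFAULT_ASSET_NAME is inferred from the working request earlier in this thread.

```python
import re
from typing import List, Optional

# Assumed fallback when no regex group is named "data_asset_name".
FALLBACK_ASSET_NAME = "DEFAULT_ASSET_NAME"


def infer_asset_name(key: str, pattern: str, group_names: List[str]) -> Optional[str]:
    """Map an object key to an asset name using positional regex groups."""
    match = re.fullmatch(pattern, key)
    if match is None:
        return None  # the key is not covered by this connector's pattern
    # Pair each capture group with its configured name, in order.
    groups = dict(zip(group_names, match.groups()))
    return groups.get("data_asset_name", FALLBACK_ASSET_NAME)


keys = ["pm/ge/some_file.csv", "pm/ge/other_file.csv"]

# Group named "data_asset_name": each file becomes its own asset.
named = [infer_asset_name(k, r"pm/ge/(.*)\.csv", ["data_asset_name"]) for k in keys]

# Group named something else: every file falls under the fallback name.
unnamed = [infer_asset_name(k, r"pm/ge/(.*)\.csv", ["file_name"]) for k in keys]
```

In this model, a batch request has to use whichever name the regex configuration actually produces: `some_file` or `other_file` in the first case, `DEFAULT_ASSET_NAME` in the second.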

talagluck commented 2 years ago

Great! Thanks so much for the follow-up.