greatexpectationslabs / ge_tutorials

Learn how to add data validation and documentation to a data pipeline built with dbt and Airflow.
167 stars 84 forks

validation without CLI example #12

Closed jdimatteo closed 3 years ago

jdimatteo commented 3 years ago

Provide an example that runs validation without using the CLI, based on https://docs.greatexpectations.io/en/latest/guides/workflows_patterns/deployment_hosted_environments.html

jdimatteo commented 3 years ago

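When I run the example.py added in this PR, the queries associated with the expectations are run twice: once in Step 2.2, when the expectation is added, and again in Step 3, when validation runs. The SQLAlchemy INFO-level logging below shows this. How can I prevent the queries from running twice?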

python example.py
...
┌──────────────────────────┐
│Step 2.2: Add Expectations│
└──────────────────────────┘
INFO:sqlalchemy.engine.base.Engine:SELECT count(*) AS element_count, sum(CASE WHEN (%(param_1)s = %(param_2)s) THEN %(param_3)s ELSE %(param_4)s END) AS null_count, sum(CASE WHEN (fare_amount IS NULL AND NOT (%(param_1)s = %(param_2)s)) THEN %(param_5)s ELSE %(param_6)s END) AS unexpected_count 
FROM public.yellow_tripdata_sample_2019_01
INFO:sqlalchemy.engine.base.Engine:{'param_1': False, 'param_2': True, 'param_3': 1, 'param_4': 0, 'param_5': 1, 'param_6': 0}
INFO:sqlalchemy.engine.base.Engine:SELECT fare_amount 
FROM public.yellow_tripdata_sample_2019_01 
WHERE fare_amount IS NULL AND NOT (%(param_1)s = %(param_2)s) 
 LIMIT %(param_3)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': False, 'param_2': True, 'param_3': 20}
┌──────────────────────┐
│Step 3: Run validation│
└──────────────────────┘
INFO:sqlalchemy.engine.base.Engine:SELECT count(*) AS element_count, sum(CASE WHEN (%(param_1)s = %(param_2)s) THEN %(param_3)s ELSE %(param_4)s END) AS null_count, sum(CASE WHEN (fare_amount IS NULL AND NOT (%(param_1)s = %(param_2)s)) THEN %(param_5)s ELSE %(param_6)s END) AS unexpected_count 
FROM public.yellow_tripdata_sample_2019_01
INFO:sqlalchemy.engine.base.Engine:{'param_1': False, 'param_2': True, 'param_3': 1, 'param_4': 0, 'param_5': 1, 'param_6': 0}
INFO:sqlalchemy.engine.base.Engine:SELECT fare_amount 
FROM public.yellow_tripdata_sample_2019_01 
WHERE fare_amount IS NULL AND NOT (%(param_1)s = %(param_2)s) 
 LIMIT %(param_3)s
INFO:sqlalchemy.engine.base.Engine:{'param_1': False, 'param_2': True, 'param_3': 20}
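
One plausible reading of the log above is that the expectation is evaluated once when it is added (Step 2.2) and again when the suite is validated (Step 3). The sketch below is a toy model of that pattern using only the standard library's sqlite3. `ToyBatch`, `evaluate_on_add`, and the `trips` table are hypothetical names made up for illustration and are not part of the Great Expectations API. Whether the library offers an equivalent switch is not confirmed in this thread.

```python
import sqlite3


class ToyBatch:
    """Toy stand-in for a validation batch. NOT the Great Expectations API."""

    def __init__(self, conn, table, evaluate_on_add=True):
        self.conn = conn
        self.table = table
        self.evaluate_on_add = evaluate_on_add  # hypothetical switch
        self.suite = []       # columns that must not contain NULLs
        self.query_count = 0  # number of SQL statements issued so far

    def _null_count(self, column):
        # Identifiers are hard-coded by the caller in this sketch,
        # so plain string formatting is acceptable here.
        self.query_count += 1
        sql = f"SELECT count(*) FROM {self.table} WHERE {column} IS NULL"
        return self.conn.execute(sql).fetchone()[0]

    def expect_column_values_to_not_be_null(self, column):
        # Always record the expectation; only query when eager evaluation is on.
        self.suite.append(column)
        if self.evaluate_on_add:
            return self._null_count(column) == 0
        return None

    def validate(self):
        # Validation re-runs every stored expectation against the table.
        return {column: self._null_count(column) == 0 for column in self.suite}


def run(evaluate_on_add):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trips (fare_amount REAL)")
    conn.executemany("INSERT INTO trips VALUES (?)", [(5.0,), (12.5,)])
    batch = ToyBatch(conn, "trips", evaluate_on_add)
    batch.expect_column_values_to_not_be_null("fare_amount")
    results = batch.validate()
    return batch.query_count, results
```

In this toy model, `run(True)` issues the same query twice (once on add, once on validate), while `run(False)` issues it once, because the expectation is only recorded when it is added and evaluated later during validation.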

spbail commented 3 years ago

Hi @jdimatteo! Apologies for the late response; we didn't have notifications turned on for this repository. I'll get to this soon. Thanks for the submission!

jdimatteo commented 3 years ago

This example is no longer useful: it is so outdated that it references Postgres example data that is no longer documented.