great-expectations / great_expectations

Always know what to expect from your data.
https://docs.greatexpectations.io/
Apache License 2.0

`row_condition` with SQLAlchemy not working as documented #8848

Open nenkie76 opened 9 months ago

nenkie76 commented 9 months ago

Describe the bug

The documentation states that row conditions for SQL should be specified like this: row_condition='col("foo") != "a-b"'

So, in a Jupyter notebook, I ran the following expectation:

validator.expect_column_values_to_not_be_null(
    column='network_id', 
    condition_parser='great_expectations__experimental__', 
    row_condition='col("source_id") != 1'
)

but it fails with the error: unable to parse condition. I also tried the following variants, and each one fails with the same error:

unable to parse condition: col("source_id").not_in([1])
unable to parse condition: col("source_id") != 1
unable to parse condition: col("source_id") <> 1
unable to parse condition: ~col("source_id") == 1

Something similar was recently raised by @matthiasgomolka.

Environment (please complete the following information):

HaebichanGX commented 9 months ago

Hi @nenkie76, thank you for letting us know. This was raised in the previous issue, https://github.com/great-expectations/great_expectations/issues/8847, which includes a solution. Please take a look. We'll put this in the backlog in the meantime.

nenkie76 commented 9 months ago

@HaebichanGX, the parallel thread is about Spark and how to write the expression, whereas this one is only about the != operator, which might not be supported. As I understand it, the error comes from _parse_great_expectations_condition() here, but I have not yet had time to debug the root cause.
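For anyone trying to narrow this down, it may help to compare against a toy parser for the documented col("name") <op> value form. The sketch below is hypothetical and is not Great Expectations' implementation: the regex grammar and the names parse_condition and row_matches are made up for illustration. It only shows which of the variants from the original post a strict grammar of this shape would accept.

```python
import operator
import re

# Hypothetical toy grammar: col("<name>") <op> <integer or quoted string>.
# This is NOT the grammar Great Expectations uses; it is only an illustration.
_CONDITION = re.compile(
    r'''^\s*col\(\s*["'](?P<column>[^"']+)["']\s*\)\s*'''
    r'''(?P<op>==|!=|<=|>=|<|>)\s*'''
    r'''(?P<value>-?\d+|"[^"]*"|'[^']*')\s*$'''
)

_OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def parse_condition(row_condition):
    """Split a condition string into (column, operator, value)."""
    match = _CONDITION.match(row_condition)
    if match is None:
        raise ValueError(f"unable to parse condition: {row_condition}")
    raw = match.group("value")
    # Quoted values stay strings (quotes stripped); everything else is an integer.
    value = raw[1:-1] if raw[0] in "\"'" else int(raw)
    return match.group("column"), match.group("op"), value


def row_matches(row, row_condition):
    """Evaluate the parsed condition against a dict-like row."""
    column, op, value = parse_condition(row_condition)
    return _OPS[op](row[column], value)


print(parse_condition('col("source_id") != 1'))
```

Under this toy grammar, col("source_id") != 1 parses, while the .not_in([1]), <> and leading ~ variants raise an error, so a failure on != would point at something other than the basic shape of the expression.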

kujaska commented 3 weeks ago

PLEEEEASE get rid of this row_condition='col("foo").notNull()'

and allow simple SQL syntax passthru: row_condition = 'fld1=5 OR fld2<>7 AND fld3 <9'
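To make the passthrough request concrete, here is a rough sketch using Python's built-in sqlite3 module rather than Great Expectations. The table t, its rows, and the count_matching helper are all made up for illustration, and nothing here is an existing Great Expectations API. It simply drops the condition string into a WHERE clause unchanged.

```python
import sqlite3


def count_matching(conn, table, row_condition):
    """Hypothetical helper: pass the condition string through to SQL verbatim."""
    # Interpolating raw SQL is only acceptable for trusted input (e.g. your own config).
    query = f"SELECT COUNT(*) FROM {table} WHERE {row_condition}"
    return conn.execute(query).fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (fld1 INTEGER, fld2 INTEGER, fld3 INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        (5, 7, 10),  # matches: fld1 = 5
        (1, 8, 3),   # matches: fld2 <> 7 AND fld3 < 9
        (1, 7, 3),   # no match: fld1 is not 5 and fld2 = 7
        (1, 8, 9),   # no match: fld1 is not 5 and fld3 is not < 9
    ],
)
conn.commit()

# AND binds tighter than OR, so this reads as: fld1=5 OR (fld2<>7 AND fld3<9)
matched = count_matching(conn, "t", "fld1=5 OR fld2<>7 AND fld3 <9")
print(matched)
```

One thing a passthrough would inherit is SQL operator precedence: without parentheses the AND is evaluated before the OR, which is easy to get wrong in a one-line condition.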