sodadata / soda-core

:zap: Data quality testing for the modern data stack (SQL, Spark, and Pandas) https://www.soda.io
https://go.soda.io/core-docs
Apache License 2.0
1.81k stars 192 forks source link

Add ability to specify custom hostname for Snowflake connection #2109

Open whummer opened 4 days ago

whummer commented 4 days ago

Add ability to specify custom host/port for Snowflake connections.

Motivation: At LocalStack, we have recently started building a Snowflake emulator that allows running SF queries entirely on the local machine: https://blog.localstack.cloud/2024-05-22-introducing-localstack-for-snowflake/ . As part of building out a repo with sample applications (see here), we are putting together an illustrative Soda application that connects to the local Snowflake emulator (running on localhost) and performs various data quality checks.

We have been able to get it working already by manually patching the soda-core library in the local Pyhon path.

$ cat configuration.yml
 data_source local_snowflake:
   type: snowflake
   connection:
     host: snowflake.localhost.localstack.cloud
     port: 4566
     username: test
     password: test
     ...

... and then:

$ soda scan --verbose -d local_snowflake -c configuration.yml checks.yml
...
[16:25:11] Scan summary:
[16:25:11] 2/2 queries OK
[16:25:11]   local_snowflake.test_table.aggregation[0] [OK] 0:00:00.034070
[16:25:11]   local_snowflake.test_table.schema[test_table] [OK] 0:00:00.040392
[16:25:11] 1/3 checks PASSED:
[16:25:11]     test_table in local_snowflake
[16:25:11]       No NULL values [checks.yml] [PASSED]
[16:25:11]         check_value: 0
[16:25:11] 1/3 checks FAILED:
[16:25:11]     test_table in local_snowflake
[16:25:11]       Dataset is unreasonably small [checks.yml] [FAILED]
[16:25:11]         check_value: 2
[16:25:11] 1/3 checks NOT EVALUATED:
[16:25:11]     test_table in local_snowflake
[16:25:11]       No changes to schema [checks.yml] [NOT EVALUATED]
...

It would be awesome if this integration was provided by Soda-core out of the box! Looking forward to getting your feedback.

Please let me know where is the best place to add a description for these new attributes (happy to add a short note in the docs), or if this change should be covered by a unit test. Thanks 🙌

/cc @m1n0

CLAassistant commented 4 days ago

CLA assistant check
All committers have signed the CLA.

m1n0 commented 3 days ago

hi @whummer, thanks for this, it's an interesting use case! It is a minor change and I don't feel like we need to cover it by test, but I see one issue - I would only pass the host/port if value is present, we do not want to force everyone to start specifying host with this change.

whummer commented 3 days ago

Good point, thanks @m1n0 (although I think the Snowflake Python connector would also be able to handle None values for host/port, i.e., if they are not specified by the user).

Updated the PR to pass in keyword arguments (kwargs) which are only set if values are specified by the user. 👍