soda_scan_execute improvements: Accepting Environment Variables, Dict-based scan results

fakhavan commented 1 year ago

Updating the soda_scan_execute task with options to pass environmental variables to the shell task running the Soda command and to save and return the dictionary-based scan results.

Summary

As discussed at https://github.com/sodadata/prefect-soda-core/issues/16, this PR :

Gives the user the option to receive more detailed scan results of the soda scan tasks.
The same option, would result in the same scan results being saved on the disk. The user can (but does not have to) pass a custom path to store the JSON file.
Gives the user the option to pass environment variables to be passed to the shell task. This could be helpful to pass database credentials to the task. In my flow, I retrieve a standard snowflake-credential block, build a dictionary of database credentials and pass it to this task.
Handles the RuntimeError exception that is raised when the check runs successfully but does not pass. It is not the cleanest way to handle this but in my tests, this was the only way to handle this exception when the soda_scan_execute using map. However, I'm open to suggestions if you have a better way of handling this issue.

Relevant Issue(s)

https://github.com/sodadata/prefect-soda-core/issues/16

Checklist

[x] Summarized PR's changes in CHANGELOG.md

AlessandroLollo commented 1 year ago

Hey @fakhavan 👋 Would it make sense to return the content of the scan result file as a JSON/Dict for the soda_scan_execute task?

@vijaykiran would love to know your opinion too 🙂

AlessandroLollo commented 1 year ago

@fakhavan with regard to the Static analysis failure, you should be able to fix it by changing isort version to 5.12.0 in .pre-commit-config.yaml

fakhavan commented 1 year ago

Hey @AlessandroLollo I pushed some more changes including the option to retrieve the dictionary-based scan results. I have updated the PR description if you want some visibility on what has changed. Most of these changes are the result of making the task work in my flow.

Drilling down on the exception issue, when the task is used in mapped mode, one task (scan's) failure would cause the whole flow (including other scans) to fail. I tried return_state=True as it's the main way of handling failures but seems like this option does not work in the mapped mode. I also could not find a way to handle this with the underlying shell task. So I resorted to using manual exception catching. However, I might have missed something so I'm looking forward to your suggestions.

Cheers

AlessandroLollo commented 1 year ago

Hey @AlessandroLollo I pushed some more changes including the option to retrieve the dictionary-based scan results. I have updated the PR description if you want some visibility on what has changed. Most of these changes are the result of making the task work in my flow.

Drilling down on the exception issue, when the task is used in mapped mode, one task (scan's) failure would cause the whole flow (including other scans) to fail. I tried return_state=True as it's the main way of handling failures but seems like this option does not work in the mapped mode. I also could not find a way to handle this with the underlying shell task. So I resorted to using manual exception catching. However, I might have missed something so I'm looking forward to your suggestions.

Cheers

Hey @fakhavan 👋 With regards to the issues with mapped mode, I suggest you to ask in the Prefect community!

AlessandroLollo commented 1 year ago

Hey @fakhavan 👋 In order to fix the static analyses failure, you have to run isort and flake8. They should be triggered automatically using git hooks and pre-commit though...

sodadata / prefect-soda-core