unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Add support for `PANDERA_VALIDATION_ENABLED` for pandas #1345

Open noklam opened 1 year ago

noklam commented 1 year ago

Is your feature request related to a problem? Please describe.
In 0.16.0, `PANDERA_VALIDATION_ENABLED` was added to disable runtime checks. I want the flag to apply to pandas DataFrames as well.

Describe the solution you'd like
The decorator style of validation is convenient, but there is no easy way to turn it off, and it adds runtime cost. The feature already exists for PySpark, and I want it for pandas DataFrames as well.
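
To illustrate the request, here is a minimal, stdlib-only sketch of a decorator that validates a function's return value unless an environment flag switches validation off. It does not use pandera itself: `_flag_enabled` and `validate_output` are hypothetical names, and treating `"false"`, `"0"`, `"no"` and `"off"` as "disabled" is an assumption, not pandera's documented parsing rule.

```python
import functools
import os
from typing import Any, Callable

# Assumed set of strings that mean "disabled" (hypothetical, not pandera's rules).
_FALSY = {"0", "false", "no", "off"}


def _flag_enabled(name: str = "PANDERA_VALIDATION_ENABLED") -> bool:
    """Hypothetical helper: validation stays on unless the variable is falsy."""
    return os.environ.get(name, "true").strip().lower() not in _FALSY


def validate_output(check: Callable[[Any], bool]) -> Callable:
    """Hypothetical decorator: run `check` on the return value unless disabled."""

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # The flag is read on every call, so it can be toggled at runtime.
            if _flag_enabled() and not check(result):
                raise ValueError(f"{func.__name__} returned invalid output: {result!r}")
            return result

        return wrapper

    return decorator


@validate_output(lambda rows: all(r >= 0 for r in rows))
def make_rows(n: int) -> list:
    # Deliberately produces -1 as its first element so the check fails.
    return [i - 1 for i in range(n)]
```

With the variable unset, `make_rows(3)` raises `ValueError`; with `PANDERA_VALIDATION_ENABLED=False` it returns `[-1, 0, 1]` without running the check. A real implementation might instead read the flag once at import time, which is cheaper per call but cannot be toggled afterwards.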

Currently, only the PySpark API respects this configuration: https://github.com/unionai-oss/pandera/blob/5a15cb1e7508743608a181d5a0f35949e200c2ff/pandera/api/pyspark/container.py#L327-L339

The same logic could potentially be added to the pandas API here: https://github.com/unionai-oss/pandera/blob/5a15cb1e7508743608a181d5a0f35949e200c2ff/pandera/api/pandas/container.py#L284-L286
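
The proposed change amounts to an early return at the top of the pandas `validate` method, mirroring the linked PySpark code. The sketch below shows the pattern on a toy schema under stated assumptions: `ToySchema` and `validation_enabled` are hypothetical, a plain dict of lists stands in for a `pd.DataFrame`, and pandera itself presumably reads the flag through its own configuration module rather than calling `os.environ` inside `validate`.

```python
import os
from dataclasses import dataclass, field
from typing import Callable, Dict, List


def validation_enabled() -> bool:
    # Hypothetical stand-in for pandera's global configuration lookup.
    flag = os.environ.get("PANDERA_VALIDATION_ENABLED", "True")
    return flag.lower() not in {"false", "0"}


@dataclass
class ToySchema:
    """Hypothetical column-oriented schema; a dict of lists stands in for a DataFrame."""

    checks: Dict[str, Callable[[List[float]], bool]] = field(default_factory=dict)

    def validate(self, data: Dict[str, List[float]]) -> Dict[str, List[float]]:
        # Early return: when validation is disabled, hand the object back
        # untouched before doing any (potentially expensive) work.
        if not validation_enabled():
            return data
        for column, check in self.checks.items():
            if column not in data:
                raise KeyError(f"missing column {column!r}")
            if not check(data[column]):
                raise ValueError(f"check failed for column {column!r}")
        return data
```

Returning the input unchanged preserves the method's contract (it still returns the validated object), so method chaining and decorator-based validation keep working when checks are switched off.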

noklam commented 1 year ago

I haven't made a PR to pandera before. If this direction is correct, I could try to make one, but I would like to get some feedback first. Please advise which tests are needed and where I should add them.

cosmicBboy commented 1 year ago

The description and approach are good! Basically, we need to:

  1. Add the early return to the pandas API schemas and schema components.
  2. Add tests similar to the existing ones in the PySpark tests.
  3. Update the docs, probably with a new page dedicated to configuration (if you can write the content, I can help with the structure and formatting).
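
Tests for step 2 could toggle the environment variable and assert that invalid data passes through untouched when validation is off. This is a hedged, stdlib-only sketch using `unittest.mock.patch.dict`; `validate_positive` is a hypothetical stand-in for a pandas schema's `validate`, and pandera's actual PySpark tests may be structured differently. If the flag is cached at import time, a real test would need to patch the config object or reload the module rather than only patching `os.environ`.

```python
import os
import unittest
from unittest import mock


def validate_positive(values: list) -> list:
    """Hypothetical validator with the proposed early return."""
    flag = os.environ.get("PANDERA_VALIDATION_ENABLED", "True")
    if flag.lower() in {"false", "0"}:
        return values  # validation disabled: skip all checks
    if any(v <= 0 for v in values):
        raise ValueError("non-positive value found")
    return values


class ValidationFlagTest(unittest.TestCase):
    @mock.patch.dict(os.environ, {"PANDERA_VALIDATION_ENABLED": "True"})
    def test_invalid_data_raises_when_enabled(self):
        with self.assertRaises(ValueError):
            validate_positive([1, -1])

    @mock.patch.dict(os.environ, {"PANDERA_VALIDATION_ENABLED": "False"})
    def test_invalid_data_passes_when_disabled(self):
        data = [1, -1]
        self.assertIs(validate_positive(data), data)

    @mock.patch.dict(os.environ, {})
    def test_default_is_enabled(self):
        # patch.dict restores os.environ afterwards, so popping here is safe.
        os.environ.pop("PANDERA_VALIDATION_ENABLED", None)
        with self.assertRaises(ValueError):
            validate_positive([0])
```

The class can be collected by `python -m unittest` or by pytest, which also runs `unittest.TestCase` subclasses.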

noklam commented 1 year ago

Sounds good! I will try to finish it this week; if not, I will be back in mid-October.

noklam commented 1 year ago

I just had a quick look. Does pandera have a setup for a cloud development environment (CDE) such as Gitpod or GitHub Codespaces? If not, I can create a separate PR to add Gitpod support and perhaps mention it in the contribution guide as an alternative to building locally.

They have an open-source program: https://www.gitpod.io/discover/opensource

cosmicBboy commented 1 year ago

I think GitHub Codespaces should just work out of the box, though I'm not sure how it installs the virtual environment.