unionai-oss / pandera

A light-weight, flexible, and expressive statistical data testing library
https://www.union.ai/pandera
MIT License

Optional import hypotheses doesn't install hypothesis #1649

Open rmetcalfe-msp opened 2 months ago

rmetcalfe-msp commented 2 months ago

Location of the documentation

https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#usage-in-unit-tests
https://pandera.readthedocs.io/en/stable/index.html#extras

Documentation problem

When reading about how to create example dataframes from schemas, I saw that this page says you need the hypothesis library. Skipping to the index page/README to check whether that's an optional dependency seems to indicate that it is, though there's some confusion over the spelling hypothesis vs. hypotheses.

This led me to get ModuleNotFoundError: No module named 'hypothesis' errors until I dug further and discovered that running pip install pandera[hypotheses] only installs scipy, and that you have to install pandera[strategies] to get hypothesis.
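Until the docs are updated, code that relies on an optional dependency can check for it up front and point to the right extra. This is a minimal sketch, assuming the mapping described in this thread (hypothesis comes from pandera[strategies], scipy from pandera[hypotheses]); require_optional and EXTRA_FOR_MODULE are hypothetical names, not part of pandera's API.

```python
import importlib.util

# Hypothetical mapping based on this thread, not read from pandera itself:
# the "strategies" extra provides hypothesis, the "hypotheses" extra provides scipy.
EXTRA_FOR_MODULE = {
    "hypothesis": "strategies",  # data synthesis strategies (example dataframes)
    "scipy": "hypotheses",       # Hypothesis checks (statistical tests)
}


def require_optional(module: str) -> None:
    """Raise ModuleNotFoundError naming the pandera extra that provides `module`.

    Hypothetical helper: returns silently if the top-level module is importable.
    """
    if importlib.util.find_spec(module) is not None:
        return
    extra = EXTRA_FOR_MODULE.get(module)
    # Quote the extra so shells such as zsh don't treat the brackets as a glob.
    hint = f"pip install 'pandera[{extra}]'" if extra else f"pip install {module}"
    raise ModuleNotFoundError(
        f"optional dependency {module!r} is missing; try: {hint}", name=module
    )
```

For example, calling require_optional("hypothesis") before generating example data would fail with a message suggesting pip install 'pandera[strategies]' rather than only reporting No module named 'hypothesis'.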

Suggested fix for documentation

I would suggest either including the hypothesis library in the pandera[hypotheses] optional install and fixing the discrepancy in spelling, or stating on the strategies page that you need to install pandera[strategies] for this functionality.

cosmicBboy commented 2 months ago

This is an unfortunate naming collision.

When I added Hypothesis checks to the codebase I didn't anticipate ever using the hypothesis library for data synthesis.

pip install pandera[hypotheses] unlocks hypothesis checks.

pip install pandera[strategies] unlocks data synthesis strategies, which uses hypothesis.
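To confirm what each extra actually pulls in for the installed version, the requirement metadata of the pandera distribution can be read with the standard library. A minimal sketch, assuming pandera declares its extras with the usual extra == "..." environment markers; group_by_extra and pandera_extras are hypothetical helper names.

```python
from __future__ import annotations

import re
from importlib.metadata import PackageNotFoundError, requires

# Matches markers such as: extra == "strategies"  or  extra == 'hypotheses'
_EXTRA_MARKER = re.compile(r"""extra\s*==\s*['"]([^'"]+)['"]""")
# Leading distribution name of a requirement string, e.g. "scipy" in "scipy>=1.0"
_NAME = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def group_by_extra(requirements: list[str]) -> dict[str, list[str]]:
    """Group requirement strings by the (first) extra named in their marker."""
    grouped: dict[str, list[str]] = {}
    for req in requirements:
        spec, _, marker = req.partition(";")
        match = _EXTRA_MARKER.search(marker)
        if match is None:
            continue  # core dependency, not tied to any extra
        name = _NAME.match(spec.strip())
        if name:
            grouped.setdefault(match.group(1), []).append(name.group(0))
    return grouped


def pandera_extras() -> dict[str, list[str]]:
    """Return {extra: [dependency names]} for the installed pandera, or {}."""
    try:
        reqs = requires("pandera") or []
    except PackageNotFoundError:
        return {}
    return group_by_extra(reqs)


if __name__ == "__main__":
    extras = pandera_extras()
    for extra in ("hypotheses", "strategies"):
        print(f"pandera[{extra}] ->", extras.get(extra, "not found"))
```

Reading the metadata rather than hard-coding names keeps the check accurate if the contents of either extra change between releases.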

To clarify this, it would make sense to add the appropriate pip install commands to the corresponding pages.

Would you be able to make a PR for the docs updates?

cosmicBboy commented 2 months ago

@rmetcalfe-msp any thoughts on the docs solution to clarify this behavior?

rmetcalfe-msp commented 2 months ago

@cosmicBboy thanks for the explanation, sounds reasonable. I'll try to address it in a PR when I have some time.