rmetcalfe-msp opened this issue 2 months ago
This is an unfortunate naming collision.
When I added Hypothesis checks to the codebase, I didn't anticipate ever using the hypothesis library for data synthesis.
pip install pandera[hypotheses] unlocks Hypothesis checks.
pip install pandera[strategies] unlocks data synthesis strategies, which use the hypothesis library.
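As a quick diagnostic (a sketch, not part of pandera), you can check which of the two extras is effectively installed by looking for the module each one provides. `EXTRA_TO_MODULE` and `installed_extras` are hypothetical names. The mapping reflects what this thread reports (scipy for `hypotheses`, hypothesis for `strategies`) and may not match future releases.

```python
import importlib.util
from typing import Dict

# Assumed mapping, taken from this thread: the "hypotheses" extra provides
# scipy (for statistical Hypothesis checks); the "strategies" extra provides
# the hypothesis library (for data synthesis). Extras may change over time.
EXTRA_TO_MODULE = {
    "hypotheses": "scipy",
    "strategies": "hypothesis",
}


def installed_extras() -> Dict[str, bool]:
    """Report, per extra, whether the module it provides is importable."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_TO_MODULE.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        status = "available" if ok else f"missing, try: pip install pandera[{extra}]"
        print(f"pandera[{extra}] -> {EXTRA_TO_MODULE[extra]}: {status}")
```

`find_spec` locates a module without importing it, so the check stays cheap even when scipy is installed.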
To clarify this, it would make sense to add the appropriate pip install commands to the corresponding pages:
pip install pandera[strategies]
pip install pandera[hypotheses]
Would you be able to make a PR for the docs updates?
@rmetcalfe-msp any thoughts on the docs solution to clarify this behavior?
@cosmicBboy thanks for the explanation, sounds reasonable. I'll try to address it in a PR when I have some time.
Location of the documentation
https://pandera.readthedocs.io/en/stable/data_synthesis_strategies.html#usage-in-unit-tests
https://pandera.readthedocs.io/en/stable/index.html#extras
Documentation problem
When reading about how to create example dataframes from schemas, this page mentions that you need the hypothesis library. Skipping to the index page/README to check whether that's an optional dependency seems to indicate it is, though there's some confusion over the spelling: hypothesis vs hypotheses. This led me to get
ModuleNotFoundError: No module named 'hypothesis'
errors until I dug further and discovered that pip install pandera[hypotheses] actually installs only scipy, and you have to install pandera[strategies] to get hypothesis.
Suggested fix for documentation
I would suggest either including the hypothesis library in the pandera[hypotheses] optional install and fixing the spelling discrepancy, or stating on the strategies page that you need to install pandera[strategies] for this functionality.
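Until the docs spell this out, a test suite could fail early with a clearer message than a bare `ModuleNotFoundError`. The sketch below assumes only that `schema.example(size=...)` needs the hypothesis library. `example_dataframe` is a hypothetical helper, not a pandera API.

```python
import importlib.util


def example_dataframe(schema, size: int = 3):
    """Generate example data from a pandera schema, failing with a clear hint.

    Assumption: schema.example() relies on the hypothesis library, which
    ships with the pandera[strategies] extra (not pandera[hypotheses]).
    """
    if importlib.util.find_spec("hypothesis") is None:
        # Point users at the extra that actually provides hypothesis.
        raise ModuleNotFoundError(
            "schema.example() needs the hypothesis library; "
            "install it with: pip install pandera[strategies]"
        )
    return schema.example(size=size)
```

For instance, with `schema = pa.DataFrameSchema({"x": pa.Column(int)})`, calling `example_dataframe(schema)` should return a small generated DataFrame when hypothesis is installed, and otherwise raise an error that names `pandera[strategies]`.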