Open kernelpernel opened 1 month ago
Thanks for bringing this up @kernelpernel, would it be possible to provide some screenshots and a minimal reproducible example? I don't really understand what you mean by docs being injected.
No screenshots due to possible IP conflicts, but I put together this quick example:
For example, if I write this class:
class ExampleSchema(pa.SchemaModel):
    """Schema to demonstrate doc injection."""
    Column1: sc.Integer = sc.IntegerF()
    Column2: sc.Str = sc.StrF()
I get this output for the sphinx-generated docs:
class jane_dev.options.utils.doc_testing.ExampleSchema(*args, **kwargs)
Bases: "pandera.api.pandas.model.DataFrameModel"
Schema to demonstrate doc injection.
Check if all columns in a dataframe have a column in the Schema.
Parameters:
* **check_obj** (*pd.DataFrame*) -- the dataframe to be
validated.
* **head** -- validate the first n rows. Rows overlapping with
"tail" or "sample" are de-duplicated.
* **tail** -- validate the last n rows. Rows overlapping with
"head" or "sample" are de-duplicated.
* **sample** -- validate a random sample of n rows. Rows
overlapping with "head" or "tail" are de-duplicated.
* **random_state** -- random seed for the "sample" argument.
* **lazy** -- if True, lazily evaluates dataframe against all
validation checks and raises a "SchemaErrors". Otherwise,
raise "SchemaError" as soon as one occurs.
* **inplace** -- if True, applies coercion to the object of
validation, otherwise creates a copy of the data.
Returns:
validated "DataFrame"
Raises:
**SchemaError** -- when "DataFrame" violates built-in or custom
checks.
Example:
Calling "schema.validate" returns the dataframe.
>>> import pandas as pd
>>> import pandera as pa
>>>
>>> df = pd.DataFrame({
...     "probability": [0.1, 0.4, 0.52, 0.23, 0.8, 0.76],
...     "category": ["dog", "dog", "cat", "duck", "dog", "dog"]
... })
>>>
>>> schema_withchecks = pa.DataFrameSchema({
...     "probability": pa.Column(
...         float, pa.Check(lambda s: (s >= 0) & (s <= 1))),
...
...     # check that the "category" column contains a few discrete
...     # values, and the majority of the entries are dogs.
...     "category": pa.Column(
...         str, [
...             pa.Check(lambda s: s.isin(["dog", "cat", "duck"])),
...             pa.Check(lambda s: (s == "dog").mean() > 0.5),
...         ]),
... })
>>>
>>> schema_withchecks.validate(df)[["probability", "category"]]
   probability category
0         0.10      dog
1         0.40      dog
2         0.52      cat
3         0.23     duck
4         0.80      dog
5         0.76      dog
Column1: pandera.typing.pandas.Series[pandas.core.arrays.integer.Int64Dtype] = 'Column1'
Column2: pandera.typing.pandas.Series[str] = 'Column2'
class Config
Bases: "pandera.api.pandas.model_config.BaseConfig"
name: str | None = 'ExampleSchema'
name of schema
Where I would expect to only see this:
class jane_dev.options.utils.doc_testing.ExampleSchema(*args, **kwargs)
Bases: "pandera.api.pandas.model.DataFrameModel"
Schema to demonstrate doc injection.
Column1: pandera.typing.pandas.Series[pandas.core.arrays.integer.Int64Dtype] = 'Column1'
Column2: pandera.typing.pandas.Series[str] = 'Column2'
And the docs appear to be the same as those from here: Pandera Docs
Thanks for the quick response @cosmicBboy!
It's probably because of the __new__ method: https://github.com/unionai-oss/pandera/blob/main/pandera/api/dataframe/model.py#L127-L132
Can you try overriding that method and see whether the extra docs still show up?
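A minimal sketch of what such an override could look like, using a plain-Python stand-in rather than pandera itself. `LibraryModel`, `LeakySchema`, and `CleanSchema` are hypothetical names, and the assumption is that Sphinx picks up the `__new__` docstring when it documents the class, so a subclass with its own short `__new__` docstring would no longer expose the library one.

```python
class LibraryModel:
    """Stand-in for a library base class (hypothetical, not pandera)."""

    def __new__(cls, *args, **kwargs):
        """Long library docstring that leaks into subclass docs."""
        return super().__new__(cls)


class LeakySchema(LibraryModel):
    """Schema to demonstrate doc injection."""
    # No __new__ here, so LeakySchema.__new__ is LibraryModel.__new__,
    # docstring included.


class CleanSchema(LibraryModel):
    """Schema to demonstrate doc injection."""

    def __new__(cls, *args, **kwargs):
        """Create the schema."""
        # Delegate to the base class so runtime behaviour is unchanged;
        # only the docstring seen on this class differs.
        return super().__new__(cls, *args, **kwargs)


print(LeakySchema.__new__.__doc__)
print(CleanSchema.__new__.__doc__)
```

With a real pandera model the override would need the same pass-through to the parent `__new__`, since that method does more than carry a docstring.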
@kernelpernel any updates on this issue?
Question about pandera
We use pandera where I work for our dataframe schemas. We also use Sphinx to generate docs for our Python libraries. Unfortunately, the documentation for pandera.api.pandas.container.DataFrameSchema keeps getting injected into our Sphinx-generated documentation. As a workaround, we have made most of our schema classes private to keep them out of the generated docs.
We have also tried to write decorators for our own classes to sanitize the docs, but this has been challenging as well. Looking at the whole inheritance chain of a class that inherits from pa.DataFrameSchema, most of the __doc__ attributes appear empty. When we try to scrub the docs that come from pandera modules, we lose all of our own documentation and are left with only the cat, dog, duck example from pa.DataFrameSchema.
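For anyone trying the same kind of inspection, here is a rough sketch of a helper that walks a class's MRO and reports where a docstring is actually defined, including on `__init__` and `__new__`, which a look at class-level `__doc__` alone would miss. `doc_sources`, `LibraryBase`, and `UserSchema` are hypothetical names, and the classes are plain-Python stand-ins rather than pandera classes.

```python
def doc_sources(cls, names=("__doc__", "__init__", "__new__")):
    """List (class name, attribute, first docstring line) for every
    docstring defined directly on a class in cls's MRO."""
    found = []
    for base in cls.__mro__:
        if base is object:
            continue
        for name in names:
            # Only look at attributes defined on this class itself.
            if name not in vars(base):
                continue
            value = getattr(base, name)
            doc = value if name == "__doc__" else getattr(value, "__doc__", None)
            if doc:
                found.append((base.__name__, name, doc.strip().splitlines()[0]))
    return found


class LibraryBase:
    def __new__(cls, *args, **kwargs):
        """Library validate docs."""
        return super().__new__(cls)


class UserSchema(LibraryBase):
    """User docs."""


for entry in doc_sources(UserSchema):
    print(entry)
```

In this stand-in the class-level `__doc__` of the base is empty, yet its `__new__` still carries a docstring, which is the kind of place the injected text could be coming from.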
Is this a pandera bug? If not, is there a way that we could suppress the doc injection without removing our own documentation?
TL;DR: pandera is injecting documentation into our own documentation (especially from pandera.api.pandas.container.DataFrameSchema). Is there a way to prevent this from happening?