[BUG] pandera model wrongly detected as pydantic and pytkdocs tries to read non existent attributes

camold commented 1 year ago

First of all, thanks for developing pytkdocs!

I do not use pytkdocs directly, but through mkdocs and mkdocstrings, which call pytkdocs. Here is a minimal example to reproduce the problem:

import pandera as pa
from pandera.typing import DataFrame
from pandera.typing import Series

class Foo(pa.DataFrameModel):
    """
    Some description
    """
    bar: Series[int]

cause_error = DataFrame[Foo]({"bar": [1,2,3]})

Without any code that actually instantiates the pandera models, it runs just fine. But as soon as I use the models somewhere, for example in precomputed module-level objects, pytkdocs fails with the following error:

ERROR    -  mkdocstrings: 'tuple' object has no attribute 'required'
            Traceback (most recent call last):
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/cli.py", line 205, in main
                output = json.dumps(process_json(line))
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/cli.py", line 114, in process_json
                return process_config(json.loads(json_input))
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/cli.py", line 91, in process_config
                obj = loader.get_object_documentation(path, members)
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/loader.py", line 358, in get_object_documentation
                root_object = self.get_module_documentation(leaf, members)
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/loader.py", line 426, in get_module_documentation
                root_object.add_child(self.get_class_documentation(child_node))
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/loader.py", line 544, in get_class_documentation
                self.add_fields(
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/loader.py", line 612, in add_fields
                root_object.add_child(add_method(child_node))
              File "/usr/local/lib/python3.10/dist-packages/pytkdocs/loader.py", line 712, in get_pydantic_field_documentation
                if prop.required:
            AttributeError: 'tuple' object has no attribute 'required'

It seems these pandera models are detected as pydantic, but they do not have the same attributes. For proper pydantic classes we have this:

from pydantic import BaseModel
class Test(BaseModel):
    i: int
Test.__fields__["i"].required
# yields True

For pandera models we seem to have this:

import pandera as pa
from pandera.typing import Series
from pandera.typing import DataFrame

class Foo(pa.DataFrameModel):
    bar: Series[int]

Foo.__fields__
# is {}
foo = DataFrame[Foo]({"bar": [1,2,3]})
# after instantiation the entry exists, which is probably why pytkdocs runs into errors
Foo.__fields__["bar"]
# (<pandera.typing.common.AnnotationInfo at 0x7f...>, <pandera.api.pandas.model_components.FieldInfo("bar") object at 0x7f1...>)

pytkdocs 0.16.1
python 3.10.7
Linux
pydantic 1.10.11 with pydantic_core 2.1.2 (pandera imposes a restriction of pydantic <2)
pandera 0.15.2
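One possible way to avoid the AttributeError, sketched under assumptions: check that each entry of `__fields__` actually exposes a `required` attribute before treating it as a pydantic field. The function `describe_fields` and the stand-in objects below are hypothetical and do not reproduce pytkdocs' actual code. The `SimpleNamespace` objects mimic pydantic v1 fields, and the tuple mimics what pandera stores after instantiation.

```python
from types import SimpleNamespace


def describe_fields(fields):
    """Map field names to "required"/"optional", skipping non-pydantic entries.

    Hypothetical helper: an entry counts as pydantic-like only if it has a
    `required` attribute. Anything else (e.g. a pandera tuple) is skipped.
    """
    described = {}
    for name, prop in fields.items():
        required = getattr(prop, "required", None)
        if required is None:
            # Not a pydantic-style field, so do not touch its attributes.
            continue
        described[name] = "required" if required else "optional"
    return described


# Stand-ins for the two shapes of `__fields__` shown in the report.
pydantic_like = {
    "i": SimpleNamespace(required=True),
    "j": SimpleNamespace(required=False),
}
pandera_like = {"bar": ("annotation-info", "field-info")}

print(describe_fields(pydantic_like))
print(describe_fields(pandera_like))
```

A stricter alternative would be to test whether the class is a subclass of pydantic's `BaseModel` instead of relying on the presence of `__fields__`, at the cost of importing pydantic in the loader.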

pawamoy commented 1 year ago

Hello, thanks for the report.

Do you use the legacy handler by necessity? Out of curiosity, is there something preventing you from using the new handler?

camold commented 1 year ago

Hi @pawamoy, thanks for pointing out that we are using a legacy handler. I guess there was a phase where our docs were not supported yet, so I kept working with mkdocstrings[python-legacy]. After upgrading to griffe, things break as well: I get runtime errors (IndexError: list out of range) inside griffe, without any information that tells me what is wrong. I guess I will need to put together a minimal example and post it as an issue for griffe :(

pawamoy commented 1 year ago

It would indeed be great if you could report the issues you get. If your repo is public, I can also use it to investigate (this way you don't need to create a minimal example).

I bet the index errors come from how we parse the Returns section in docstrings. Try indenting continuation lines once more:

Returns:
    A long description
    of the return value.
    Blah blah blah.

->

Returns:
    A long description
        of the return value.
        Blah blah blah.

Unless you're not using Google docstrings?
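For illustration, here is the same layout inside a complete function. This is a minimal sketch: `load_scores` and its behavior are hypothetical, and only the indentation of the Returns section follows the suggestion above.

```python
def load_scores(path: str) -> dict:
    """Load scores from a file (hypothetical example function).

    Args:
        path: Location of the scores file.

    Returns:
        A long description
            of the return value.
            Continuation lines sit one level deeper than the first line.
    """
    # Placeholder body; the docstring layout is the point of the example.
    return {"path": path}


print(load_scores.__doc__)
```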

camold commented 1 year ago

That worked indeed. At least there are no runtime errors anymore. I did notice a change though from pytkdocs to griffe. Before, if I had submodules, pytkdocs would list them in the documentation. Now it really only lists the main module, and not even any objects that I import from submodules. So I guess I will have to add pages individually for these submodules or get the automatic reference creation to work (is this still the best approach: https://mkdocstrings.github.io/recipes/ ?)

pawamoy commented 1 year ago

Still the best approach, yes. And you can use show_submodules: true to render every submodule (see https://mkdocstrings.github.io/python/usage/configuration/members/#show_submodules). We changed the default from true to false between the legacy and new handler.
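As a sketch, the option could be set globally in mkdocs.yml like this, assuming the usual mkdocstrings plugin layout with the Python handler. Any other plugins or options already in the file are omitted here.

```yaml
plugins:
  - mkdocstrings:
      handlers:
        python:
          options:
            # Render submodules as well (the new handler defaults to false).
            show_submodules: true
```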

camold commented 1 year ago

Great. Thanks for pointing that out. It worked just fine. However, a submodule that contains a function with the same name as the submodule is not processed (it does not show up in the documentation). So, for example, if package.foo is a submodule that has a function foo in it, the entire package.foo submodule is excluded. The same happens if the package itself is called foo and has a foo submodule (foo/foo.py) in it: the package is not rendered.

pawamoy commented 1 year ago

Yes, these are known issues and we plan to alleviate them. Note that wildcard imports make the situation worse, so I recommend avoiding them.

mkdocstrings/pytkdocs issue #148