pdoc3 / pdoc

:snake: :arrow_right: :scroll: Auto-generate API documentation for Python projects
https://pdoc3.github.io/pdoc/
GNU Affero General Public License v3.0

Submodules not showing up for (native) extension modules #319

Open robamler opened 3 years ago

robamler commented 3 years ago

When running pdoc on an extension module (i.e., a native "C" extension), the extension module's submodules don't show up in the documentation, even though <TAB> autocompletion in a Python REPL can find them. This seems to be because pdoc searches for submodules by inspecting the source directory, which isn't available for extension modules.
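The difference between the two discovery strategies can be sketched without a compiled extension. This is a minimal illustration under stated assumptions, not pdoc's actual implementation: it builds a stand-in module in memory (the names `fakeext_scan` and `sub` are made up), then compares a filesystem-based search via `pkgutil.iter_modules` with attribute introspection via `inspect.getmembers`.

```python
import inspect
import pkgutil
import types

# Stand-in for an extension module (hypothetical names): the submodule exists
# only as an attribute of the parent object, with no directory behind it.
parent = types.ModuleType("fakeext_scan")
child = types.ModuleType("fakeext_scan.sub")
child.variable = 42
parent.sub = child


def submodules_from_filesystem(module):
    """Names found by scanning the module's __path__ (empty if it has none)."""
    search_path = getattr(module, "__path__", [])
    return [info.name for info in pkgutil.iter_modules(search_path)]


def submodules_from_attributes(module):
    """Names of attributes that are themselves module objects."""
    return [name for name, _ in inspect.getmembers(module, inspect.ismodule)]


print(submodules_from_filesystem(parent))  # nothing on disk to scan
print(submodules_from_attributes(parent))  # sees the in-memory submodule
```

Attribute introspection also picks up modules that were merely imported into the parent's namespace, so a real implementation would likely need an extra filter.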

I've proposed PR #318 to fix this issue. The proposed solution works, but I'm not sure whether it is safe to remove the old "source directory traversal" method. I'd appreciate guidance on completing the PR.

Expected Behavior

Running pdoc on a native extension module should generate documentation for the entire extension module, including its submodules.

Actual Behavior

Steps to Reproduce

The following steps generate a minimal native extension module in Rust that exhibits the problem. The language shouldn't matter, though.

  1. Install a Rust toolchain, see https://rustup.rs
  2. Create the following directory structure:
    pyext/
    ├── Cargo.toml
    └── src/
        └── lib.rs

    with the following file contents:

    • Cargo.toml:

      [package]
      authors = ["Name <em@i.l>"]
      edition = "2018"
      name = "pyext"
      version = "0.1.0"

      [lib]
      crate-type = ["cdylib"]

      [dependencies]
      pyo3 = {version = "0.13.2", features = ["extension-module"]}

    • src/lib.rs:

      use pyo3::{prelude::*, wrap_pymodule};

      /// Docstring of main module.
      #[pymodule(pyext)]
      fn init_main_module(_py: Python<'_>, module: &PyModule) -> PyResult<()> {
          module.add_wrapped(wrap_pymodule!(submodule))?;
          Ok(())
      }

      /// Docstring of submodule
      #[pymodule(submodule)]
      fn init_submodule(_py: Python<'_>, submodule: &PyModule) -> PyResult<()> {
          submodule.add("variable", 42)?;
          Ok(())
      }

  3. Compile the extension module: cargo build
  4. Create a properly named symlink to the object file:
    • on Linux: ln -s target/debug/libpyext.so pyext.so
    • on Mac: ln -s target/debug/libpyext.dylib pyext.so
    • on Windows: rename target\debug\libpyext.dll to pyext.pyd
  5. Start a Python REPL from the directory containing the pyext.so file and verify that the submodule exists and can be found by tab completion:
    $ python
    Python 3.6.10 (default, May 22 2020, 17:59:48)
    [GCC 9.2.1 20191008] on linux
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import pyext
    >>> pyext.<TAB>  --> autocompletes to "pyext.submodule", proving that the submodule can be found
    >>> pyext.submodule.variable
    42
  6. Run pdoc --html pyext from the same directory.
    • With the version at current master, the generated documentation leaves out the submodule.
    • With the version proposed in #318, the generated documentation includes the submodule.
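The renaming in the symlink step matters because the interpreter only recognizes extension modules with certain file suffixes. As a small aid (not part of the original report), the following lists the file names the running interpreter would accept for a module called `pyext`, using the standard `importlib.machinery.EXTENSION_SUFFIXES`:

```python
import importlib.machinery

MODULE_NAME = "pyext"  # module name used in the steps above

# File names the running interpreter accepts for this extension module.
candidates = [
    MODULE_NAME + suffix for suffix in importlib.machinery.EXTENSION_SUFFIXES
]

for file_name in candidates:
    print(file_name)
```

The exact list depends on platform and Python version, so no specific output is shown here.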

Additional info

kernc commented 3 years ago

Thanks for an exemplary bug report!

Just to clarify: in step 5, where we import pyext, could we just as well have done:

>>> import pyext.submodule

# or

>>> from pyext.submodule import variable

Does this run?

robamler commented 3 years ago

Just tested it:

$ python
Python 3.8.2 (default, Mar  2 2021, 23:57:34) 
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import pyext.submodule
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package
>>> from pyext.submodule import variable
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package
>>> from pyext import submodule
>>> submodule.variable
42

I think this is because the extension module pyext is compiled into a single binary file that cannot be loaded "in parts" (unlike regular packages, whose implementation is typically spread across several source files). The Python interpreter doesn't know about the submodules until it actually loads the pyext module, which (I think) it only does when you explicitly say either import pyext or from pyext import xxx.

In other words, I think from A import B actually loads A (but only brings A.B into scope, as B), so from pyext.submodule import variable would try to load pyext.submodule, which doesn't exist in the file system because it only gets generated "in memory" when you load pyext.
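This behavior can be reproduced in pure Python, which suggests it is not specific to Rust or PyO3. The sketch below uses a hypothetical in-memory stand-in (`fakeext_import`) that, like the compiled module, has no `__path__` and carries its submodule only as an attribute:

```python
import importlib
import sys
import types

# Hypothetical stand-in for a single-file extension module: no __path__,
# and the submodule exists only as an attribute of the parent.
root = types.ModuleType("fakeext_import")
root.sub = types.ModuleType("fakeext_import.sub")
root.sub.variable = 42
sys.modules["fakeext_import"] = root

# A dotted import asks the import system to locate the submodule itself.
try:
    importlib.import_module("fakeext_import.sub")
    dotted_import_error = None
except ModuleNotFoundError as exc:
    dotted_import_error = str(exc)

print(dotted_import_error)

# Importing the parent and reading the attribute works.
from fakeext_import import sub

print(sub.variable)
```

The dotted import fails with the same "is not a package" message as in the transcript above, while `from fakeext_import import sub` succeeds because it only needs attribute access on the already loaded parent.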

kernc commented 3 years ago

That's exactly why I asked: I remembered resolving a similar issue as wontfix just recently. See my thoughts in https://github.com/pdoc3/pdoc/pull/252#issuecomment-698361252. The simple fact is:

ModuleNotFoundError: No module named 'pyext.submodule'; 'pyext' is not a package

pyext.submodule is not a module to have stuff imported from, so I'm hesitant to make pdoc list it as such.

Can you investigate whether you can set the .__package__ and .__path__ attributes (or whatever is necessary for Python to treat a module as a package) on the relevant package/module objects, and whether that maybe automatically does something?

robamler commented 3 years ago

Thank you for the explanation! Unfortunately, setting .__package__ and .__path__ in the extension module doesn't help.
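A possible explanation, sketched with an in-memory stand-in rather than the real extension (so whether it matches the PyO3 case exactly is an assumption): once `__path__` is set, the import system treats the parent as a package, but it still has to ask its finders for the submodule, and none of them knows where to look.

```python
import importlib
import sys
import types

# Hypothetical stand-in; all names are made up for illustration.
root = types.ModuleType("fakeext_path")
root.sub = types.ModuleType("fakeext_path.sub")
sys.modules["fakeext_path"] = root

# Mark the parent as a package, with nowhere on disk to search.
root.__package__ = "fakeext_path"
root.__path__ = []

try:
    importlib.import_module("fakeext_path.sub")
    error_message = None
except ModuleNotFoundError as exc:
    error_message = str(exc)

print(error_message)
```

The import still fails, only without the "is not a package" suffix.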

I respect your decision if you don't want to address this. I'd just like to raise two counterarguments for your consideration. First, I think this issue will probably affect a lot of people (probably all authors of native extension modules who don't find some sort of workaround). Second, I interpret the ModuleNotFoundError differently. In fact, I get the same kind of error when I try to import, e.g., pdoc:

>>> import pdoc
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'pdoc'

The reason is that I'm in a Python environment where pdoc isn't installed, so the Python interpreter can't find it and raises a ModuleNotFoundError. So, even though pdoc definitely is a module (it's even a package), it just can't be found at the moment. But as soon as you add it to your sys.path, it can be found. I'd argue that the situation for pyext.submodule is quite similar: it is a module, it just can't be found at the moment. But as soon as you import pyext (which is the module on which I want to run pdoc anyway), pyext.submodule can be found (and is recognized as a module):

>>> import pyext
>>> type(pyext.submodule)
<class 'module'>

I agree that pyext.submodule is not a package (e.g., it doesn't have a .__path__ set), but I think that shouldn't make a difference.
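To make the sys.path half of the analogy concrete, here is a small sketch with a throwaway module (`not_on_path_demo` is a made-up name): the file exists on disk the whole time, but the import only succeeds once its directory is on `sys.path`.

```python
import importlib
import pathlib
import sys
import tempfile

MODULE_NAME = "not_on_path_demo"  # hypothetical module name

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, MODULE_NAME + ".py").write_text("variable = 42\n")

    # The file exists, but its directory is not on sys.path yet.
    try:
        importlib.import_module(MODULE_NAME)
        found_before = True
    except ModuleNotFoundError:
        found_before = False

    # Make the directory visible to the import system and retry.
    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        module = importlib.import_module(MODULE_NAME)
        found_after = True
    finally:
        sys.path.remove(tmp)

print(found_before, found_after, module.variable)
```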

kernc commented 3 years ago

> it just can't be found at the moment

That's correct. That's why Python has sys.path_hooks (to maybe provide a suitable finder/loader for a given package) and sys.meta_path, which is a list of already registered default finders.

Following related upstream issues:

I think PyO3 might wish to provide a finder akin to the one removed in https://github.com/PyO3/pyo3/pull/1269/commits/8d14568f7d3077924a23e3f15392d13180cbc828 (briefly discussed in https://github.com/PyO3/pyo3/pull/1269#discussion_r520807688), and add it to sys.meta_path upon loading the top-level extension module. This way, both pdoc pyext and >>> import pyext.submodule would work flawlessly, and it would be justified to call pyext and pyext.submodule a package and its submodule (instead of pyext.submodule being merely a variable pointing to a module object, such as with >>> import re as my_re).
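What such a finder could look like can be sketched in pure Python. This is a hypothetical illustration, not the finder removed from PyO3: the class name `AttributeSubmoduleFinder` and the stand-in module `fakeext_finder` are made up, and the parent additionally needs a `__path__` so that the import system consults the finders at all.

```python
import importlib
import importlib.abc
import importlib.machinery
import sys
import types


class AttributeSubmoduleFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical finder: resolves 'root.child' to an attribute of 'root'."""

    def __init__(self, root_name):
        self.prefix = root_name + "."

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None
        parent_name, _, child_name = fullname.rpartition(".")
        candidate = getattr(sys.modules.get(parent_name), child_name, None)
        if not isinstance(candidate, types.ModuleType):
            return None
        # is_package=True gives the submodule a __path__ for nested lookups.
        return importlib.machinery.ModuleSpec(fullname, self, is_package=True)

    def create_module(self, spec):
        # Reuse the module object that already lives on the parent.
        parent_name, _, child_name = spec.name.rpartition(".")
        return getattr(sys.modules[parent_name], child_name)

    def exec_module(self, module):
        pass  # already initialized when the parent was loaded


# Stand-in for the loaded top-level extension module.
root = types.ModuleType("fakeext_finder")
root.sub = types.ModuleType("fakeext_finder.sub")
root.sub.variable = 42
root.__path__ = []  # lets the import system treat the parent as a package
sys.modules["fakeext_finder"] = root
sys.meta_path.append(AttributeSubmoduleFinder("fakeext_finder"))

from fakeext_finder.sub import variable

print(variable)
```

With the finder registered, the dotted import resolves to the very module object that the parent already exposes as an attribute.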

I'd just hate to have pdoc deviate from Python's interpretation of things.

Then again, we do check in https://github.com/pdoc3/pdoc/pull/318 that the object is present in __all__, so the intent is visible, and there's little utility in documenting modules containing further objects as mere variables. The end-user will be confused that they can't:

from your_pyext.submodule.nested import Something

But that's not really our problem ... :thinking:

kernc commented 3 years ago

There's apparently a workaround described in https://github.com/PyO3/pyo3/issues/1517#issuecomment-808664021.
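The content of the linked comment isn't reproduced here, so the following is only a sketch of one widely used approach, which may or may not match it: registering the submodule under its dotted name in `sys.modules`, something an extension's initialization code could do. The stand-in names are hypothetical.

```python
import importlib
import sys
import types

# Hypothetical stand-in for the loaded extension module and its submodule.
root = types.ModuleType("fakeext_registry")
root.sub = types.ModuleType("fakeext_registry.sub")
root.sub.variable = 42
sys.modules["fakeext_registry"] = root

# Register the submodule under its dotted name so dotted imports resolve
# from the module cache instead of going through the finders.
sys.modules["fakeext_registry.sub"] = root.sub

from fakeext_registry.sub import variable

print(variable)
```

Because the import system checks `sys.modules` first, no finder and no `__path__` on the parent are needed for this to work.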