quickwit-oss / tantivy-py

Python bindings for Tantivy
MIT License
293 stars 63 forks source link

Add API reference to documentation site #174

Open mocobeta opened 11 months ago

mocobeta commented 11 months ago

Now we have .pyi file (#167), we can generate API docs by pdoc.

For example, the following commands generate static HTML files in apidoc dir if you have tantivy module in your python path.

pip install pdoc
pdoc tantivy -o ./apidocs

Generated HTML looks very neat because PyO3 exposes comments in .rs files as docstrings of tantivy module, then pdoc collects all of them throughout the stubs and type hints!

tantivy_apidoc

Perhaps we can publish the API doc to https://tantivy-py.readthedocs.io/en/latest/. I'm not familiar with the CI pipeline to publish the docs site. Is there any suggestions about that?

cjrh commented 11 months ago

The current workflow is I log into readthedocs on my own personal account and click "sync". I'm not happy with this, but I don't currently have enough permissions on this github repo to get the webhook working. Once the webhook is setup, docs deployment will be automatic.

fmassot commented 11 months ago

@cjrh do you know which permission you need?

cjrh commented 11 months ago

@fmassot I think either I need "write" access, or I can add you or another admin to the readthedocs site and you can generate the integration there. Long term that is probably best anyway, so that I am not the only one with access to the readthedocs site.

This is the page where the integration can be performed:

image

When I click "resync" it fails:

image

If you create a login on readthedocs (make sure to use your github OAuth login), I can add your account to the tantivy-py project on readthedocs, and then you will be able to successfully click that Resync webhook button, which should work because it'll use the permissions on your github OAuth token. Then I won't need any permissions change on tantivy-py.

Just let me know your login name on readthedocs and I'll add it to the project.

cjrh commented 11 months ago

@mocobeta This example at mkdocs shows integration of pdoc into a mkdocs build: https://github.com/mitmproxy/pdoc/tree/main/examples/mkdocs/

There is a comment that says:

mkdocstrings is a great alternative to pdoc if you are in the mkdocs ecosystem anyways.

Do you have personal experience that supports sticking with pdoc over mkdocstrings?

cjrh commented 11 months ago

I tried a few things, and it seems that mkdocstrings doesn't do a good job of pulling out the docstrings from the rust source. So I guess that means pdoc is better.

mocobeta commented 11 months ago

I didn't know mkdocstrings, but it seems to parse the Python code with Griffle parser (https://mkdocstrings.github.io/griffe/docstrings/) and try to directly extract the docstring from the source instead of pulling it from __doc__. So unfortunatelly this wouldn't be a help for our purposes.

pawamoy commented 10 months ago

Chiming in: it's actually possible to use dynamic analysis with Griffe, to use __doc__ and the like instead of reading sources. It's actually done by default for compiled modules which do not have Python sources :slightly_smiling_face: The edges might still be a bit rough around these use-cases, happy to get features requests / feedback :smile:

cjrh commented 10 months ago

@pawamoy I guess the mkdocstrings is not using that feature then. It would likely be a high value improvement if you could help them to make it work with dynamic docstrings :)

pawamoy commented 10 months ago

I've ran some experiments and identified a couple issues in Griffe that will be fixed in the next version. This will allow you to collect API docs from your Python module compiled from Rust. However there's another issue that I cannot fix on my side: classes like DocAddress and others say their module is tantivy, which is incorrect, as they are actually imported from tantivy.tantivy. I'm not familiar with Pyo3 but the following seems to work:

#[pyclass(frozen, module = "tantivy.tantivy")]

In other words: your compiled module ends up as tantivy/tantivy.cpython-blabla.so, so it is important that classes defined within it report their parent module as tantivy.tantivy, not just tantivy, even if in the end they are publicly exposed from there. Their __module__ attribute must be the canonical location, not the public one.

Also some other classes (Rust structs IIUC) don't even declare their parent module, like SchemaBuilder or Query. They must declare it for Griffe to correctly understand them.

My local diff ```diff diff --git a/docs/reference.md b/docs/reference.md index 8ca4294..d372b22 100644 --- a/docs/reference.md +++ b/docs/reference.md @@ -36,3 +36,16 @@ best_doc = searcher.doc(best_doc_address) Note: for integer search, the integer field should be indexed. For more possible query formats and possible query options, see [Tantivy Query Parser Docs.](https://docs.rs/tantivy/latest/tantivy/query/struct.QueryParser.html) + +::: tantivy.Schema +::: tantivy.SchemaBuilder +::: tantivy.Facet +::: tantivy.Document +::: tantivy.Query +::: tantivy.Order +::: tantivy.DocAddress +::: tantivy.SearchResult +::: tantivy.Searcher +::: tantivy.Index +::: tantivy.Snippet +::: tantivy.SnippetGenerator diff --git a/mkdocs.yml b/mkdocs.yml index 2640f63..5427b17 100644 --- a/mkdocs.yml +++ b/mkdocs.yml @@ -13,3 +13,24 @@ theme: readthedocs # - 'User Guide': # - 'Writing your docs': 'writing-your-docs.md' # - 'Styling your docs': 'styling-your-docs.md' + +plugins: +- search +- mkdocstrings: + handlers: + python: + options: + docstring_options: + ignore_init_summary: true + docstring_section_style: list + heading_level: 2 + inherited_members: true + merge_init_into_class: true + separate_signature: true + show_root_heading: true + show_root_full_path: false + show_source: false + show_signature_annotations: true + signature_crossrefs: true + summary: true + filters: ["!^_"] diff --git a/src/document.rs b/src/document.rs index 8768463..b09e754 100644 --- a/src/document.rs +++ b/src/document.rs @@ -463,7 +463,7 @@ impl<'a> From<&'a Value> for BorrowedSerdeValue<'a> { /// ... {"unsigned": 1000, "signed": -5, "float": 0.4}, /// ... schema, /// ... ) -#[pyclass(module = "tantivy")] +#[pyclass(module = "tantivy.tantivy")] #[derive(Clone, Default, PartialEq)] pub(crate) struct Document { pub(crate) field_values: BTreeMap>, diff --git a/src/facet.rs b/src/facet.rs index 2983fe2..74fb598 100644 --- a/src/facet.rs +++ b/src/facet.rs @@ -16,7 +16,7 @@ use tantivy::schema; /// implicitely imply that a document belonging to a facet also belongs to the /// ancestor of its facet. In the example above, /electronics/tv_and_video/ /// and /electronics. -#[pyclass(frozen, module = "tantivy")] +#[pyclass(frozen, module = "tantivy.tantivy")] #[derive(Clone, Deserialize, PartialEq, Serialize)] pub(crate) struct Facet { pub(crate) inner: schema::Facet, diff --git a/src/index.rs b/src/index.rs index 3eef34f..4636d24 100644 --- a/src/index.rs +++ b/src/index.rs @@ -27,7 +27,7 @@ const RELOAD_POLICY: &str = "commit"; /// /// To create an IndexWriter first create an Index and call the writer() method /// on the index object. -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] pub(crate) struct IndexWriter { inner_index_writer: Option, schema: tv::schema::Schema, @@ -200,7 +200,7 @@ impl IndexWriter { /// /// If an index already exists it will be opened and reused. Raises OSError /// if there was a problem during the opening or creation of the index. -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] pub(crate) struct Index { pub(crate) index: tv::Index, reader: tv::IndexReader, diff --git a/src/parser_error.rs b/src/parser_error.rs index 724d17a..6421215 100644 --- a/src/parser_error.rs +++ b/src/parser_error.rs @@ -799,7 +799,7 @@ impl TryFrom for UnknownTokenizerError { /// The query contains a range query with a phrase as one of the bounds. Only terms can be used as /// bounds. -#[pyclass(frozen)] +#[pyclass(frozen, module = "tantivy.tantivy")] pub(crate) struct RangeMustNotHavePhraseError; #[pymethods] diff --git a/src/query.rs b/src/query.rs index 53da089..962d30d 100644 --- a/src/query.rs +++ b/src/query.rs @@ -3,7 +3,7 @@ use pyo3::{exceptions, prelude::*, types::PyAny}; use tantivy as tv; /// Tantivy's Query -#[pyclass(frozen)] +#[pyclass(frozen, module = "tantivy.tantivy")] pub(crate) struct Query { pub(crate) inner: Box, } diff --git a/src/schema.rs b/src/schema.rs index ba0c740..122a971 100644 --- a/src/schema.rs +++ b/src/schema.rs @@ -7,7 +7,7 @@ use tantivy as tv; /// /// The schema is very strict. To build the schema the `SchemaBuilder` class is /// provided. -#[pyclass(frozen, module = "tantivy")] +#[pyclass(frozen, module = "tantivy.tantivy")] #[derive(Deserialize, PartialEq, Serialize)] pub(crate) struct Schema { pub(crate) inner: tv::schema::Schema, diff --git a/src/schemabuilder.rs b/src/schemabuilder.rs index c59e539..060c394 100644 --- a/src/schemabuilder.rs +++ b/src/schemabuilder.rs @@ -23,7 +23,7 @@ use tantivy::schema::{ /// >>> body = builder.add_text_field("body") /// /// >>> schema = builder.build() -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] #[derive(Clone)] pub(crate) struct SchemaBuilder { pub(crate) builder: Arc>>, diff --git a/src/searcher.rs b/src/searcher.rs index 932e5c9..c202bfd 100644 --- a/src/searcher.rs +++ b/src/searcher.rs @@ -9,7 +9,7 @@ use tantivy::collector::{Count, MultiCollector, TopDocs}; /// Tantivy's Searcher class /// /// A Searcher is used to search the index given a prepared Query. -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] pub(crate) struct Searcher { pub(crate) inner: tv::Searcher, } @@ -40,7 +40,7 @@ impl ToPyObject for Fruit { } } -#[pyclass(frozen, module = "tantivy")] +#[pyclass(frozen, module = "tantivy.tantivy")] #[derive(Clone, Copy, Deserialize, PartialEq, Serialize)] /// Enum representing the direction in which something should be sorted. pub(crate) enum Order { @@ -60,7 +60,7 @@ impl From for tv::Order { } } -#[pyclass(frozen, module = "tantivy")] +#[pyclass(frozen, module = "tantivy.tantivy")] #[derive(Clone, Default, Deserialize, PartialEq, Serialize)] /// Object holding a results successful search. pub(crate) struct SearchResult { @@ -269,7 +269,7 @@ impl Searcher { /// It consists in an id identifying its segment, and its segment-local DocId. /// The id used for the segment is actually an ordinal in the list of segment /// hold by a Searcher. -#[pyclass(frozen, module = "tantivy")] +#[pyclass(frozen, module = "tantivy.tantivy")] #[derive(Clone, Debug, Deserialize, PartialEq, Serialize)] pub(crate) struct DocAddress { pub(crate) segment_ord: tv::SegmentOrdinal, diff --git a/src/snippet.rs b/src/snippet.rs index e74b3ce..e5decc1 100644 --- a/src/snippet.rs +++ b/src/snippet.rs @@ -6,12 +6,12 @@ use tantivy as tv; /// /// The schema is very strict. To build the schema the `SchemaBuilder` class is /// provided. -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] pub(crate) struct Snippet { pub(crate) inner: tv::Snippet, } -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] pub(crate) struct Range { #[pyo3(get)] start: usize, @@ -38,7 +38,7 @@ impl Snippet { } } -#[pyclass] +#[pyclass(module = "tantivy.tantivy")] pub(crate) struct SnippetGenerator { pub(crate) field_name: String, pub(crate) inner: tv::SnippetGenerator, ```

Then you can get something like this generated by mkdocstings:

tantivy

Note that the module has to be built and installed to build the docs, since Griffe actually imports it.

Feel free to mark as off-topic and move my comment in another issue :slightly_smiling_face:

cjrh commented 10 months ago

That's good feedback, thank you for looking into this 😄 . This is the correct issue IMO. I'll try to find some time to fix up the module references next weekend.

cjrh commented 10 months ago

I added a PR here to fix only the module declarations: https://github.com/quickwit-oss/tantivy-py/pull/190. After that is merged I will return to make a second PR using your example showing the mkdocstrings implementation.