quickwit-oss / tantivy

Tantivy is a full-text search engine library inspired by Apache Lucene and written in Rust
MIT License

Documentation #653

Open jeffsmith82 opened 5 years ago

jeffsmith82 commented 5 years ago

There are currently a few places that the documentation lives for tantivy:

https://docs.rs/tantivy/0.10.1/tantivy/index.html I would consider these the API docs for tantivy. They are mostly complete but missing a lot of examples.

There are multiple examples documented in this repo: https://github.com/tantivy-search/tantivy-search.github.io. There is no index.html for them, though, and the only one I can find linked from anywhere is this one (but I might be wrong): https://tantivy-search.github.io/examples/basic_search.html

There is an incomplete book here https://github.com/tantivy-search/tantivy/tree/master/doc

Lastly, there is https://github.com/tantivy-search/tantivy-cli, the CLI app. It is working example code people can copy, so I would probably consider it a form of documentation.

What I'm proposing is that we tidy this all up by removing the examples from tantivy-search.github.io and merging the example code into a new tantivy book that lives in https://github.com/tantivy-search/tantivy/tree/master/doc. I'm volunteering myself (probably foolishly) to start it.

I bought tantivy.rs and sent the details to @fulmicoton, so this might be a good place to put the book, under documentation, a bit like https://actix.rs/ does.

What do people think?

fulmicoton commented 5 years ago

I'd add, even if they are not really documentation, a couple of blog posts and a video taken at a Mercari meetup.

fulmicoton commented 5 years ago

I think there are a lot of high-level concepts, as well as super low-level concepts that could find their place in a book.

The high-level concepts help users understand the contract, which should of course also be documented in the reference docs. But knowing the reason for this contract, as well as the "spirit" of tantivy's design, makes it much easier for users to make the right choices when designing their application, and can save a lot of time. Once you know what tantivy is, the contracts are fairly obvious.

Knowing a little more about the low-level stuff, on the other hand, is important for contributors AND for users with some constraints. It's good for people to understand where the abstraction will leak, or to understand the performance or memory cost of an extra feature in tantivy.

In my experience, Lucene and tantivy are often counterintuitive in that regard, while being extremely easy to predict from first principles. Except most users do not bother really understanding how their search index works. I hope a book could bridge this gap.

jeffsmith82 commented 5 years ago

I was thinking of something like this for the layout of the book. The introduction covers what the project is. The quickstart is basically https://tantivy-search.github.io/examples/basic_search.html, for people who want to copy something to get up and running. Then "basic concepts" covers the high-level concepts of tantivy before going into more depth on each one. This should be split into two: the first section is for users of the library, and the second is for people who want to develop tantivy, so that is where we explain the lower-level stuff.

It might then be nice to have some kind of project that walks through building something like the CLI app, but probably a bit simpler. I'm still learning tantivy and Rust, so if I have missed important concepts please shout.

The other question is what we should write this in: mdbook, gitbook, or should we embed it into the website? I really like what https://actix.rs/ did by embedding it into their website under docs using Hugo. Any objections to flat out copying that and using it to generate the tantivy.rs site with style and image changes?

petr-tik commented 5 years ago

Awesome ideas - thanks you two.

I just found your blog posts again and wanted to share how helpful I find diagrams/graphs/visualisation in documentation.

We should visualise as many steps as possible inside the data structures and the methods that act on them.

E.g. building a term dictionary from a sorted list of words added to an FST.
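To make that example concrete, here is a minimal Rust sketch of the kind of step-by-step structure a diagram could accompany. It is not tantivy's code. It assumes a hypothetical `TermDict` type and a hypothetical `build_term_dict` helper, and it swaps in a plain prefix-sharing trie for the FST (a real FST also shares suffixes and uses a compact encoding). What it does keep is the constraint that terms are inserted in sorted order, and it maps each term to its ordinal.

```rust
use std::collections::BTreeMap;

/// One node of a prefix-sharing trie. This is a stand-in for an FST node:
/// a real FST would also share suffixes and be stored compactly.
#[derive(Default, Debug)]
struct Node {
    children: BTreeMap<u8, Node>,
    /// Term ordinal, set only if a term ends at this node.
    ordinal: Option<u64>,
}

/// Hypothetical term dictionary mapping each term to its rank in sorted order.
#[derive(Default, Debug)]
struct TermDict {
    root: Node,
    len: u64,
    last: Option<Vec<u8>>,
}

impl TermDict {
    /// Inserts a term and returns its ordinal.
    /// Terms must arrive in strictly increasing byte order.
    fn insert(&mut self, term: &str) -> Result<u64, String> {
        let bytes = term.as_bytes();
        if let Some(prev) = &self.last {
            if prev.as_slice() >= bytes {
                return Err(format!("term {:?} is out of order", term));
            }
        }
        // Walk down the trie, creating nodes only where the prefix is new.
        let mut node = &mut self.root;
        for &b in bytes {
            node = node.children.entry(b).or_default();
        }
        let ord = self.len;
        node.ordinal = Some(ord);
        self.len += 1;
        self.last = Some(bytes.to_vec());
        Ok(ord)
    }

    /// Looks up the ordinal of a term, if the term was inserted.
    fn get(&self, term: &str) -> Option<u64> {
        let mut node = &self.root;
        for b in term.as_bytes() {
            node = node.children.get(b)?;
        }
        node.ordinal
    }
}

/// Builds the dictionary from an already sorted list of words.
fn build_term_dict(sorted_terms: &[&str]) -> Result<TermDict, String> {
    let mut dict = TermDict::default();
    for term in sorted_terms {
        dict.insert(term)?;
    }
    Ok(dict)
}
```

With the input `["apple", "apply", "banana"]`, the first two terms share the nodes for `a`, `p`, `p`, `l`, which is exactly the kind of step a picture would show well. Passing an unsorted list returns an error instead of silently building a wrong dictionary.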

I also think we should put an FAQ towards the front of the docs and actively add questions from issues/gitter to it. This centralises all the information, makes it greppable for library users, and shows that we care about answering questions.

fulmicoton commented 4 years ago

Any objections to flat out copying that and using it to generate the tantivy.rs site with style and image changes?

@jeffsmith82

No objection as long as their license allows it and the tooling is not too horrible. Let's make sure to put credit where it is needed.