voxel51 / fiftyone-docs-search

Search docs.voxel51.com with an LLM!
Apache License 2.0

Search the FiftyOne Docs with an LLM

This repository contains the code to enable semantic search on the Voxel51 documentation from Python or the command line. The search is powered by FiftyOne, OpenAI's text-embedding-ada-002 model, and Qdrant vector search.
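To give a rough picture of what vector search means here, the sketch below ranks a few entries by cosine similarity to a query vector. It is an illustration only, not this repository's code: the page names and three-dimensional vectors are made up (real text-embedding-ada-002 embeddings have 1536 dimensions), cosine similarity is assumed as the metric, and in the actual pipeline Qdrant performs the nearest-neighbor lookup.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, index, k=2):
    """Return the k (name, score) pairs most similar to query_vec."""
    scored = [(name, cosine_similarity(query_vec, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy index: hypothetical page names mapped to made-up 3-d "embeddings".
TOY_INDEX = {
    "loading-datasets": [0.9, 0.1, 0.0],
    "exporting-datasets": [0.6, 0.4, 0.1],
    "plotting": [0.0, 0.2, 0.9],
}

if __name__ == "__main__":
    query = [1.0, 0.0, 0.0]  # stands in for an embedded search query
    for name, score in top_k(query, TOY_INDEX):
        print(f"{name}: {score:.3f}")
```

The highest-scoring entries are the ones whose vectors point in nearly the same direction as the query, which is the sense in which the search is "semantic" rather than keyword-based.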

Installation

  1. Clone the repository:
git clone https://github.com/voxel51/fiftyone-docs-search
cd fiftyone-docs-search
  2. Install the package:
pip install -e .
  3. Register your OpenAI API key (create one):
export OPENAI_API_KEY=XXXXXXXX
  4. Launch a Qdrant server:
docker pull qdrant/qdrant
docker run -d -p 6333:6333 qdrant/qdrant
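
Once the steps above are done, a quick preflight check can confirm that the API key is exported and that something answers on the Qdrant port. This is a minimal sketch and not part of the package: the helper names are hypothetical, and it assumes the Qdrant container answers plain HTTP requests at http://localhost:6333, matching the port mapping in the docker command.

```python
import os
import urllib.error
import urllib.request


def openai_key_is_set(env=None):
    """True if OPENAI_API_KEY is present and non-empty."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


def qdrant_is_reachable(url="http://localhost:6333", timeout=2.0):
    """True if an HTTP server answers with status 200 at the given URL."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, HTTP error status, etc.
        return False


if __name__ == "__main__":
    print("OPENAI_API_KEY set:", openai_key_is_set())
    print("Qdrant reachable:", qdrant_is_reachable())
```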

Usage

Command line

The fiftyone-docs-search package provides a command line interface for searching the Voxel51 documentation. To use it, run:

fiftyone-docs-search query <query>

where <query> is the search query. For example:

fiftyone-docs-search query "how to load a dataset"

Flags give you control over the search behavior. Use the --help flag to see all available options:

fiftyone-docs-search --help
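
If you want to call the command line tool from a script, one option is to wrap it with subprocess. The sketch below is an assumption-laden convenience, not something the package ships: build_query_command and run_query are hypothetical helper names, and only the documented `fiftyone-docs-search query <query>` form is used, with no additional flags.

```python
import shutil
import subprocess

CLI_NAME = "fiftyone-docs-search"


def build_query_command(query):
    """Return the argv list for `fiftyone-docs-search query <query>`."""
    if not query.strip():
        raise ValueError("query must not be empty")
    return [CLI_NAME, "query", query]


def run_query(query):
    """Run the search and return whatever the CLI prints to stdout."""
    if shutil.which(CLI_NAME) is None:
        raise FileNotFoundError(f"{CLI_NAME} is not on PATH; is the package installed?")
    completed = subprocess.run(
        build_query_command(query), capture_output=True, text=True, check=True
    )
    return completed.stdout


if __name__ == "__main__":
    print(build_query_command("how to load a dataset"))
```

Passing the query as a single list element avoids shell quoting issues that come up when queries contain spaces or quotes.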

Aliasing the command

If you find fiftyone-docs-search query cumbersome, you can alias the command by adding the following to your ~/.bashrc or ~/.zshrc file:

alias fosearch='fiftyone-docs-search query'

Python

The fiftyone-docs-search package also provides a Python API for searching the Voxel51 documentation. To use it, run:

from fiftyone.docs_search import FiftyOneDocsSearch

fods = FiftyOneDocsSearch()
results = fods("how to load a dataset")

You can set defaults for the search behavior by passing arguments to the constructor:

fods = FiftyOneDocsSearch(
    num_results=5,
    open_url=True,
    score=True,
    doc_types=["tutorials", "api", "guides"],
)

For any individual search, you can override these defaults by passing arguments along with the query.
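
The pattern of constructor defaults with per-call overrides can be sketched without the package installed. The class below is a hypothetical stand-in, not FiftyOneDocsSearch itself, and it assumes that per-call keyword names mirror the constructor arguments shown above.

```python
class DefaultsDemo:
    """Hypothetical stand-in: stores defaults and merges per-call overrides."""

    def __init__(self, **defaults):
        self.defaults = dict(defaults)

    def resolve(self, **overrides):
        """Return the settings one call would use: defaults updated by overrides."""
        unknown = set(overrides) - set(self.defaults)
        if unknown:
            raise TypeError(f"unknown options: {sorted(unknown)}")
        return {**self.defaults, **overrides}


demo = DefaultsDemo(num_results=5, open_url=True, score=True)

# One call asks for fewer results; the other defaults are left untouched.
print(demo.resolve(num_results=3))
# prints {'num_results': 3, 'open_url': True, 'score': True}
```

With the real package, the equivalent would presumably look like fods("how to load a dataset", num_results=3), though the exact keyword names accepted per call should be checked against the package itself.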

Versioning

The fiftyone-docs-search package is versioned to match the version of the Voxel51 FiftyOne documentation that it searches. For example, v0.20.1 of the fiftyone-docs-search package is designed to search v0.20.1 of the FiftyOne documentation.
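
One way to see whether your installed versions line up is to compare package metadata. This is a sketch under assumptions: the distribution names "fiftyone-docs-search" and "fiftyone" are guesses that should be checked against what pip list reports, and comparing against your installed FiftyOne version is only a proxy for the documentation version you care about.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


def versions_match(a, b):
    """True only when both versions are known and identical."""
    return a is not None and a == b


if __name__ == "__main__":
    # Distribution names are assumptions; adjust to what `pip list` shows.
    docs_search = installed_version("fiftyone-docs-search")
    fiftyone = installed_version("fiftyone")
    print(docs_search, fiftyone, versions_match(docs_search, fiftyone))
```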

Building the index from scratch

By default, if no Qdrant collection exists yet when you run a search, the fiftyone-docs-search package automatically downloads a JSON file containing a vector index of the latest version of the Voxel51 FiftyOne documentation.

If you would like, you can also build the index yourself from a local copy of the Voxel51 FiftyOne documentation. To do so, first clone the FiftyOne repo if you haven't already:

git clone https://github.com/voxel51/fiftyone

and install FiftyOne, as described in its detailed installation instructions.

Build a local version of the docs by running:

bash docs/generate_docs.bash

Then, set a FIFTYONE_DIR environment variable to the path to the local FiftyOne repo. For example, if you cloned the repo to ~/fiftyone, you would run:

export FIFTYONE_DIR=~/fiftyone

Finally, run the following command to build the index:

fiftyone-docs-search create

If you would like to save the Qdrant index to JSON, you can run:

fiftyone-docs-search save -o <path to JSON file>
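
Before building the index, a small check like the following can confirm that FIFTYONE_DIR points at a FiftyOne checkout. It is a minimal sketch with a hypothetical helper name; the only thing it relies on from the steps above is that docs/generate_docs.bash exists inside the repository.

```python
import os
from pathlib import Path


def resolve_fiftyone_dir(env=None):
    """Return FIFTYONE_DIR as a Path, or raise with a readable message."""
    env = os.environ if env is None else env
    raw = env.get("FIFTYONE_DIR")
    if not raw:
        raise RuntimeError("FIFTYONE_DIR is not set")
    repo = Path(raw).expanduser()
    # The docs build script is the one file the build steps are known to use.
    if not (repo / "docs" / "generate_docs.bash").is_file():
        raise RuntimeError(f"{repo} does not look like a FiftyOne checkout")
    return repo


if __name__ == "__main__":
    try:
        print("Using FiftyOne repo at", resolve_fiftyone_dir())
    except RuntimeError as err:
        print("Cannot build index:", err)
```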

Contributing

Contributions are welcome!

About FiftyOne

If you've made it this far, we'd greatly appreciate it if you'd take a moment to check out FiftyOne and give us a star!

FiftyOne is an open source library for building high-quality datasets and computer vision models. It's the engine that powers this project.

Thanks for visiting! 😊

Join the Community

If you want to join a fast-growing community of engineers, researchers, and practitioners who love computer vision, join the FiftyOne Slack community! 🚀🚀🚀