marqo-ai / marqo

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai
https://www.marqo.ai/
Apache License 2.0

[ENHANCEMENT] LLM-based reranker support #244

Open jn2clark opened 1 year ago

jn2clark commented 1 year ago

Is your feature request related to a problem? Please describe. Add the ability to use an LLM as a reranker in Marqo, enabling augmented retrieval. Currently this is not supported. An LLM reranker would allow instruction-based ranking ("rank these based on X...") as well as summarisation that cites the original documents.

Describe the solution you'd like When searching, have the option to specify an LLM as the reranker. Initially it should support hosted endpoints for the LLMs.

client.index(index_name).search(
    "why is the sky blue",
    reranker={
        "name": "GPT3",
        "api_key": "###",
        "engine": "davinci-003",
        "prompt": "Given the query and the context, summarise and cite the references. QUERY:{}"
    }
)

It could allow for a free-text prompt, or prompts could be curated for predefined tasks, e.g. task="summarise".

Describe alternatives you've considered

Additional context An example is here: https://github.com/marqo-ai/marqo/tree/mainline/examples/GPT3NewsSummary. This feature would bring that functionality into Marqo itself.

jn2clark commented 1 year ago

Very early draft branch: https://github.com/marqo-ai/marqo/compare/mainline...gpt_reranker1

pandu-k commented 1 year ago

GPT3 reranking for Marqo

Overview

We can use GPT3 to rerank the results of a Marqo retrieval. This combines the fast retrieval of tensor and lexical search with the slower, but potentially more accurate, reranking ability of LLMs.

This will also pave the way for alternative text-based model rerankers and for second-stage text processing of retrieval results, such as text summarisation.

Release note blurb

Marqo-GPT3 integration: GPT3 is now available as a Marqo reranker. Retrieve your documents via tensor search, and re-order them with GPT3 for better retrieval.

Proposed design

API and Py-Marqo changes

A new optional parameter is added to the search method:

response = mq.index("my-test-index").search(
    "What plants have great heat resilience?", searchable_attributes=["Description", "Title"], 
    reranker="openai/GPT3/rerank",
    reranker_api_key="<...api key...>"
)
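
Within Marqo, one way to dispatch on the reranker argument is to split the identifier into provider, model, and task; a minimal sketch (this parsing scheme is an assumption, not part of the proposal):

# split the reranker identifier into its parts for dispatch
provider, model_name, task = "openai/GPT3/rerank".split("/")
# provider   -> backend module to call (here: openai)
# model_name -> model within that backend (here: GPT3)
# task       -> what to do with the results ("rerank" now; later e.g. "summarise")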

Marqo changes

Feature implementation

  1. A reranker_api_key parameter is added as an optional parameter to the search endpoint body and tensor_search.search().

  2. We create an elif branch checking for a GPT reranker in reranking.rerank_search_results().

  3. Create a file src/marqo/s2_inference/reranking/openai/gpt3.py which will interface with the GPT3 API (see the sketch after this list). Note that the format (k, v pair) of the reranker model in the cache can differ from that of the existing models. This can pave the way for a refactor of the cache, which is acceptable because reranker models and embedding models aren't used in the same way in the code. We do need to be careful about the model and device info endpoints; tests will need to be created to ensure these work as intended for the reranker models.

  4. This file will have an entrypoint function rerank(search_results: dict, query: str) -> dict.

  5. The prompt, also defined in this file, will have a structure like this:

f"""
    Background:
    {"\n".join["Source {i+1}) {content}" for i, content in enumerate(results)])}

    Query: {question}

    Instruction: rank the sources with respect to relevance of the query
"""

The results from GPT look like this:

1) Source 4: The 15 Best Heat Tolerant Plants (Because the Dog Days of Summer Are Here!)
2) Source 1: 20 Heat-Tolerant Plants That Will Survive (and Thrive) in the Summer
3) Source 3: Top 10 Heat-Tolerant Plants
4) Source 2: 7 Heat-Tolerant Plants that Love the Sun
5) Source 0: Plants That Like Full Sun and Heat

We parse the results to get the original indexes, which will be used to order the results.

  6. If we get an error, we raise it as a RerankerError and forward the message from GPT3.
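
A minimal sketch of what gpt3.py's entrypoint could look like, tying together steps 3 to 6 above. It assumes the legacy OpenAI Python client (pre-1.0) and the text-davinci-003 engine; the RerankerError definition, the choice of the Title field, and the parsing regex are illustrative assumptions, not settled design:

import re
import openai

class RerankerError(Exception):
    """Raised when reranking fails (name from step 6; definition assumed here)."""

def rerank(search_results: dict, query: str, api_key: str) -> dict:
    """Rerank Marqo search results with GPT3 (sketch)."""
    openai.api_key = api_key
    hits = search_results["hits"]
    # field choice for the source text is an assumption; any text field would do
    sources = "\n".join(f"Source {i}) {hit.get('Title', '')}" for i, hit in enumerate(hits))
    prompt = (
        f"Background:\n{sources}\n\n"
        f"Query: {query}\n\n"
        "Instruction: rank the sources with respect to their relevance to the query"
    )
    try:
        completion = openai.Completion.create(
            engine="text-davinci-003", prompt=prompt, max_tokens=256
        )
    except Exception as e:
        # step 6: forward the message from GPT3
        raise RerankerError(str(e)) from e
    text = completion["choices"][0]["text"]
    # parse lines like "1) Source 4: ..." back to the original hit indexes
    order = [int(m) for m in re.findall(r"Source (\d+)", text)]
    reranked = dict(search_results)
    reranked["hits"] = [hits[i] for i in order if i < len(hits)]
    return reranked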

Minor refactor to reduce statefulness

Make rerank_search_results() operate on a copy of tensor_search.search()'s results:

  1. tensor_search.search() returns the reranked copy of the results it receives from rerank_search_results()
  2. In rerank_search_results(), a copy of the search results is made at the top of the function. This copy is passed to the rerankers
  3. For OWL and text rerankers, this copy is returned (as it is mutated in place by the rerankers)
  4. For the GPT reranker, the output of GPT's rerank() function is returned directly
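
A rough sketch of this copy discipline; everything except the rerank_search_results() name and the gpt3 module path is illustrative:

import copy
from marqo.s2_inference.reranking.openai import gpt3  # module sketched above

def rerank_search_results(search_results: dict, query: str, reranker: str, **reranker_kwargs) -> dict:
    # 2. copy at the top of the function so rerankers never mutate the caller's results
    results_copy = copy.deepcopy(search_results)
    if reranker.startswith("openai/"):
        # 4. the output of gpt3.rerank() is returned directly
        return gpt3.rerank(results_copy, query, **reranker_kwargs)
    # 3. the existing OWL / text reranker code path runs here, mutating
    #    results_copy in place; the mutated copy is then returned
    return results_copy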

Caveats

Alternative API design: reranker properties dict

API and Py-Marqo changes

A new optional parameter is added to the search method:

response = mq.index("my-test-index").search(
    "What plants have great heat resilience?", searchable_attributes=["Description", "Title"], 
    reranker="openai/GPT3/rerank",
    reranker_properties={
        "api_key": <... your api key here...>
    }
)

An optional dict called reranker_properties replaces the reranker_api_key parameter in the search endpoint body, tensor_search.search(), and reranking.rerank_search_results().

The benefit of this approach is that we don't have to keep adding new parameters to the search endpoint body, tensor_search.search(), and py-marqo's search() signatures whenever extra reranking parameters are introduced. We would only need to handle the new parameters in reranking.rerank_search_results().

The API, however, becomes more nested.
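
A sketch of how rerank_search_results() could consume the dict; the prompt key is a hypothetical future extension, shown only to illustrate that no signature change would be needed:

def rerank_search_results(search_results: dict, query: str, reranker: str,
                          reranker_properties: dict = None) -> dict:
    props = reranker_properties or {}
    api_key = props.get("api_key")  # required for hosted LLM rerankers
    prompt = props.get("prompt")    # hypothetical future key: handled here,
                                    # with no new function parameters
    ...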

pandu-k commented 1 year ago

Can add reranker models to the cache in a different way from embedding models
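
For instance, assuming the cache is something like an available_models dict, an embedding model's cache value holds a loaded model, whereas a reranker entry could hold lightweight client config instead (value shape illustrative only):

# embedding models: cache key -> loaded model object
# reranker models: a different value shape, e.g. client config
available_models["openai/GPT3/rerank"] = {
    "type": "reranker",
    "engine": "text-davinci-003",
}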

pandu-k commented 1 year ago

Rather than re-ordering the existing documents, attach a new document (e.g. reranker_output) containing the LLM's output. The user can parse it on their end.
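
That is, the response would keep hits in their original order and gain one extra entry; a possible shape (placement and exact field name assumed):

response = {
    "hits": [...],  # original retrieval order, untouched
    "reranker_output": "1) Source 4: The 15 Best Heat Tolerant Plants ...",
}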

pandu-k commented 1 year ago

Create a mini DSL:

instruction = "summarise the following sources and cite them:
SOURCES:{}
SUMMARY:"
pandu-k commented 1 year ago

Move towards reranking on multiple fields