Pipeline description is unclear

stanford-oval / WikiChat

WikiChat is an improved RAG. It stops the hallucination of large language models by retrieving data from a corpus.

https://wikichat.genie.stanford.edu

Apache License 2.0

998 stars, 95 forks

Pipeline description is unclear #16

Closed: jwbth closed this issue 1 month ago

jwbth commented 3 months ago

I was passing by, and when trying to make sense of the pipeline, I'm stuck on the first step:

Query: cast of Oppenheimer Christopher Nolan film

How is the query extracted from the natural-language message? Does this involve standard NLP tasks for text mining, or even another message to the LLM? In any case, it seems that the query is not actually the first step, contrary to what is stated.

s-jse commented 3 months ago

Thank you for your interest in our project.

You can refer to Section 3.1.1 of our paper, https://arxiv.org/pdf/2305.14292:

"WikiChat generates a search query that captures the user’s interest with a prompt."

In other words, the query is generated by prompting the LLM.
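For readers who want a concrete picture, here is a minimal sketch of what "generating the query by prompting the LLM" could look like. It is an illustration only, not WikiChat's actual code: the prompt wording, the function names (`format_history`, `generate_query`), and the `fake_llm` stand-in are all assumptions made for this example, and the user message is hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical prompt wording; the real WikiChat prompt is not shown in this thread.
QUERY_PROMPT = (
    "Given the conversation below, write a short search query that captures "
    "what the user wants to know.\n\n"
    "{history}\n\n"
    "Search query:"
)


def format_history(turns: List[Tuple[str, str]]) -> str:
    """Render (speaker, text) turns as one 'Speaker: text' line per turn."""
    return "\n".join(f"{speaker}: {text}" for speaker, text in turns)


def generate_query(turns: List[Tuple[str, str]], llm: Callable[[str], str]) -> str:
    """Ask the LLM for a search query and return the first line of its reply."""
    prompt = QUERY_PROMPT.format(history=format_history(turns))
    reply_lines = llm(prompt).strip().splitlines()
    # Keep only the first line and drop surrounding whitespace and double quotes.
    return reply_lines[0].strip().strip('"') if reply_lines else ""


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs without an API key."""
    return ' "cast of Oppenheimer Christopher Nolan film"\n'


if __name__ == "__main__":
    # Hypothetical user message, chosen to match the query quoted in this issue.
    conversation = [("User", "Who stars in the new Oppenheimer movie?")]
    print(generate_query(conversation, fake_llm))
```

The point of the sketch is that the query shown in the pipeline description is itself the output of an LLM call, so the user's message is the real input to the first stage. In practice `fake_llm` would be replaced by a call to whichever model the pipeline uses.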

jwbth commented 3 months ago

Thanks for clarifying. I would be even happier if you made this clear to everyone in the README, but that's up to you.