Maybe you could set it up so that when a user inputs a story, the LLM generates tags for the key points in the story that you want to be queryable, and then immediately shows them to the user so they can verify that those tags make sense for the story? I'm not sure whether this would actually save time relative to the user just tagging manually, though.
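Concretely, that verification loop could look something like the sketch below. `generate_tags` is just a placeholder for whatever LLM call you end up making (Groq or otherwise), and the tag format is only a guess:

```python
def generate_tags(story_text: str) -> list[str]:
    """Placeholder for the LLM call: given a story, return short tags for the
    key points you want to be queryable (people, places, events, ...)."""
    raise NotImplementedError("swap in your Groq / other LLM call here")

def collect_verified_tags(story_text: str) -> list[str]:
    """Show the LLM-suggested tags to the user and keep only the ones they
    confirm, so the stored tags always reflect what the user actually meant."""
    verified = []
    for tag in generate_tags(story_text):
        if input(f"Keep tag '{tag}'? [y/n] ").strip().lower() == "y":
            verified.append(tag)
    return verified
```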
I would bet a lot of the querying that isn't about complicated ideas could just be done in SQL.
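For example, once the verified tags are stored in a couple of relational tables, "find all stories about X" never needs to touch the LLM. A minimal sqlite sketch (table and column names are just illustrative):

```python
import sqlite3

conn = sqlite3.connect("memories.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS stories (
        id   INTEGER PRIMARY KEY,
        text TEXT NOT NULL
    );
    CREATE TABLE IF NOT EXISTS tags (
        story_id INTEGER REFERENCES stories(id),
        tag      TEXT NOT NULL
    );
""")

def stories_with_tag(tag: str) -> list[str]:
    """Plain SQL lookup: return the text of every story carrying the given tag."""
    rows = conn.execute(
        "SELECT s.text FROM stories s JOIN tags t ON t.story_id = s.id WHERE t.tag = ?",
        (tag,),
    ).fetchall()
    return [text for (text,) in rows]
```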
One important point is that LLMs "cannot" be used for search. (Cannot is in scare quotes because I can imagine lots of ways they "could" be used, but these are all very roundabout and don't actually use any of the things that make LLMs powerful. Another way of phrasing this is that any search system that uses an LLM can be made more accurate and faster by removing the LLM from the system.)
So using an LLM just for search is not a good project. To make it a good project, you need to somehow incorporate the generative aspect of LLMs. Your original framing was to have the LLM generate stories, but the concerns about hallucination are valid, so generating stories is probably not appropriate. There is other information that could be generated, though. For example, I could imagine wanting to ask the following questions to an LLM about a deceased person (who I'll call John):
Where did John grow up?
What age did John get married?
The examples above are strictly factual pieces of information, and I would expect an LLM to be able to answer these well without a hallucination problem. The resulting answers would probably be only a single sentence.
I could also imagine more complicated questions/answers, like the following:
Who were the important people in John's life?
Why did John marry Jane?
Did John like his job?
These are all much more complicated questions. They can't fully be answered in a single sentence, but require at least a paragraph or so. I think an LLM would still be able to answer these types of questions well, however, because they all involve "summarizing" information that would be found in a memoir, and LLMs tend to be very good at summarizing.
It's important to observe that your original style of question was a "tell me a story" question, which is likely to cause the LLM to hallucinate, while this new style is "summarize something from the memoir," which is much less prone to hallucination. Getting this to work well would require some combination of prompt engineering, to keep the model in "summarizing" mode instead of "storytelling" mode, and effort from the user to ask questions that can easily be reframed as summarization questions.
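One way to keep the model in "summarizing" mode is to only ever hand it the retrieved story text together with an explicit instruction to summarize from that text and nothing else. A rough sketch of the prompt construction, assuming a standard chat-completions message format and that `retrieved_stories` comes out of your RAG retrieval step:

```python
def build_summarization_messages(question: str, retrieved_stories: list[str]) -> list[dict]:
    """Frame the user's question as a summarization task over the retrieved
    stories, rather than an open-ended storytelling request."""
    context = "\n\n---\n\n".join(retrieved_stories)
    system = (
        "You summarize information from a family memoir. "
        "Answer ONLY using the story excerpts provided. "
        "If the excerpts do not contain the answer, say so instead of guessing."
    )
    user = (
        f"Story excerpts:\n{context}\n\n"
        f"Question: {question}\n"
        "Summarize what the excerpts say that answers this question."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": user},
    ]
```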
There's other interesting work you could consider, like adding "guardrails" that check whether a generated answer is actually supported by the uploaded stories.
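A crude but illustrative version is a word-overlap check between the generated answer and the retrieved stories; in practice you'd probably want embeddings or an entailment model, and the threshold below is arbitrary:

```python
def is_grounded(answer: str, sources: list[str], threshold: float = 0.6) -> bool:
    """Return True if most of the answer's (non-trivial) words appear somewhere
    in the source stories -- a rough proxy for 'no new facts were invented'."""
    source_words = set(" ".join(sources).lower().split())
    answer_words = [w for w in answer.lower().split() if len(w) > 3]
    if not answer_words:
        return False
    overlap = sum(1 for w in answer_words if w in source_words)
    return overlap / len(answer_words) >= threshold

# Usage: only show the LLM's answer when it passes the check.
# if not is_grounded(answer, retrieved_stories):
#     answer = "I couldn't find that in the uploaded stories."
```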
Thanks for the feedback! I understand—using an LLM just for search misses the point. I'll look into using it for summarization-style queries, and maybe add some guardrails for reliability.
Posting for Prof Izbicki, but all feedback is welcome.
I'm working on a project that aims to preserve memories by allowing users to upload stories about loved ones who have passed away. The system will then allow users to chat with those stories (through a RAG system). Another idea was having GROQ tag and categorize these stories so they can be easily searched. It will also have a chat interface where users can prompt the system to retrieve specific stories based on keywords in their questions.
My Concerns:
Complexity - I'm worried this project might not be complex enough for a final project. Do you think adding features like user-curated tags or multimodal content (photos, audio) could make it more impressive? Are there any other suggestions for increasing its complexity while still aligning with class topics?
Authenticity and LLM Use - To preserve the authenticity of these stories, I’m concerned about using an LLM. My worry is that it could introduce hallucinations if trained on this type of text, which would undermine the purpose of maintaining real memories. Does anyone have thoughts on how to use an LLM solely for retrieval without compromising authenticity?