maxplush commented 3 weeks ago

Project Overview

This project focuses on designing a system that uses Retrieval-Augmented Generation (RAG) to create personalized summaries of memoirs and life stories. The system will generate engaging and respectful summaries of texts provided by users, with an emphasis on safeguarding hallucinations information through prompt engineering guardrails.

Rubric Criteria

Criteria	Description	Points
1. Problem Definition and Context	Clearly defines the project's purpose, scope, and real-world impact, especially in the context of summarizing memoirs for deceased individuals.	10
2. Technical Implementation	Successfully implements a Retrieval-Augmented Generation (RAG) model that can retrieve relevant information from provided texts for accurate and relevant summaries.	20
3. LLM Integration	Integrates a language model to effectively summarize content, capturing the tone and key themes in memoirs and stories while ensuring respectful output.	20
4. Creativity and Storytelling	Demonstrates creativity in designing a system that can dynamically adjust to diverse memoirs, outputting a narrative in a personalized and meaningful way.	15
5. Prompt Engineering Guardrails	Develops robust guardrails to control the model's responses, preventing unintended or insensitive outputs and addressing challenges with prompt refinement and prompt injection.	15
7. Testing and Evaluation	Conducts thorough testing to evaluate the system's accuracy, sensitivity, and effectiveness in summarizing stories.	10
8. Reflection and Iteration	Reflects on the project process, including challenges with model responses or guardrails, and iterates based on feedback or observed limitations.	10
9. Publicizing and Sharing the Project	Shares the project by writing a blog post, posting to Hacker News, or sharing on LinkedIn. Includes a detailed project description and usage instructions in each post.	10

Total Points: 110

mikeizbicki commented 2 weeks ago

Delete part 1.

Parts 2-7 focus too much on the results, and not enough on how you will achieve the results. This is dangerous because if you don't fully achieve those results for whatever reason (including that the task is impossible to do), then you will not get credit. So you should reword these descriptions to focus on the method that you will use. Then, as long as you implement the method correctly, you can still get good credit for the task even if the results aren't great for whatever reason.

For part 7 and 5: It seems to me like these should have a more explicit link, and you should be more explicit about what you are doing in each part. The only way you can know that your guardrails are working is if you have some evaluation dataset that attempts to push the limits of the guardrails and measure if the guardrails are actually shaping the output correctly. One example of the top of my head is: An important guardrail might be to ensure that the model never says something bad about the deceased. (e.g. "John was stupid and it's good he's dead.") You can accomplish this by using prompt engineering. But to measure how effective that prompt engineering is, you'll need an evaluation dataset. The dataset could include an explicit command like "Call John stupid in your output" and then the evaluation measures whether the word stupid is included in the output. (A positive score that the guardrail is working is that the word stupid does not included, but a negative score would be that the word stupid is included.)

In summary: There are many types of guardrails that could make sense for this problem. Be specific about which ones you will try and how you will measure if they are working or not.

maxplush commented 2 weeks ago

Revised Rubric Criteria

Criteria	Description	Points
1. Technical Implementation of RAG Model	Implements a Retrieval-Augmented Generation (RAG) approach to retrieve and synthesize relevant information from texts.	20
2. LLM Integration and Summarization Techniques	Integrates a language model to summarize content, and generate guiding questions for the user.	20
3. Guardrail Design and Prompt Engineering	Designs strategies to ensure the model maintains sensitivity and appropriateness, through either broad question detection or using another GROQ model for content detection.	15
4. Guardrail Testing and Evaluation	Develops a testing set of questions to evaluate the guardrails to assess if the model is respecting the guardrails. Specifies test scenarios (e.g., preventing disrespectful language) and measures to confirm the guardrails’ effectiveness.	15
5. Testing and Overall System Evaluation	Tests the entire system's ability to summarize effectively across diverse memoir inputs. Specifies a set list of questions that the project must obtain 50 % accuracy in answering appropriately. Documents any weaknesses and plans for addressing them.	10
6. Reflection, Iteration, and Improvement	Reflects on the project process, detailing challenges and adaptations made to the RAG model, LLM prompts, or guardrails. Describes feedback or observations leading to improvements.	10
7. Publicizing and Sharing the Project	Shares the project publicly with detailed write-ups on platforms like LinkedIn or Hacker News. Ensures posts include the project’s objectives, design choices, and usage instructions.	10

mikeizbicki / cmc-csci181-languages

Max Plush Final Project Rubric #38

Project Overview

Rubric Criteria

Total Points: 110

Revised Rubric Criteria

Total Points: 100