parth126 / IT550

Project Proposals for the IT-550 Course (Autumn 2024)

IntelliHelp: RAG-Powered Customer Assistance Using LLM #13

Open Ayush060201 opened 3 days ago

Ayush060201 commented 3 days ago

Title

IntelliHelp: RAG-Powered Customer Assistance Using LLM

Team Name

Run Time Errorists

Email

202311048@daiict.ac.in

Team Member 1 Name

Shyam Saktawat

Team Member 1 Id

202311048

Team Member 2 Name

Abhishek Choudhary

Team Member 2 Id

202311067

Team Member 3 Name

Ayush Kumar Sahu

Team Member 3 Id

202311066

Team Member 4 Name

NIL

Team Member 4 Id

NIL

Category

New Research Problem

Problem Statement

Current organizational chatbots struggle to resolve user queries autonomously because they fail to effectively combine retrieval-augmented generation (RAG) and large language models (LLMs) with proprietary data. IntelliHelp aims to offer personalized, real-time customer support by retrieving relevant information and generating dynamic, accurate responses without human intervention.
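The retrieve-then-generate flow this proposal describes could look roughly like the sketch below. It is a minimal illustration rather than the final design: the support snippets are invented placeholders, `all-MiniLM-L6-v2` is just one possible embedding model, and `call_llm()` stands in for whichever LLM endpoint the system ends up using.

```python
# Minimal retrieve-then-generate sketch. The knowledge-base snippets and the
# call_llm() stub are placeholders; any embedding model / LLM endpoint could be used.
from sentence_transformers import SentenceTransformer, util

# Proprietary support documents (illustrative placeholders).
documents = [
    "Refunds are processed within 5-7 business days after approval.",
    "Premium users can reach live support 24/7 via the in-app chat.",
    "Orders can be cancelled free of charge within 30 minutes of placement.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(documents, convert_to_tensor=True)

def retrieve(query: str, top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the query."""
    query_embedding = embedder.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_embedding, doc_embeddings, top_k=top_k)[0]
    return [documents[hit["corpus_id"]] for hit in hits]

def call_llm(prompt: str) -> str:
    """Placeholder for the actual LLM call (e.g., a hosted or local model API)."""
    raise NotImplementedError

def answer(query: str) -> str:
    # Ground the generation step in the retrieved context only.
    context = "\n".join(retrieve(query))
    prompt = (
        "Answer the customer question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return call_llm(prompt)
```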

Evaluation Strategy

  1. Response Time Metrics: Average response time per query.
  2. Engagement and Retention Metrics: User retention rate, average session duration, and number of returning users.
  3. User Satisfaction Metrics: Post-interaction surveys, Net Promoter Score (NPS), and Customer Satisfaction Score (CSAT).

Dataset

https://github.com/unicamp-dl/retailGPT/tree/main/retailGPT/datasets

Resources

[1] Lewis, Patrick et al. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arXiv, 2020.
[2] Xu, Ziwei; Jain, Sanjay; Kankanhalli, Mohan. Hallucination is Inevitable: An Innate Limitation of Large Language Models. arXiv, 2024.
[3] Wei, Jason; Wang, Xuezhi; Schuurmans, Dale et al. Chain-of-Thought Prompting Elicits Reasoning in Large Language Models. arXiv, 2022.

parth126 commented 2 days ago
Ayush060201 commented 4 hours ago

New Problem Statement

Title

Optimize InterrogateLLM for Enhanced Hallucination Detection in LLMs

Category

Optimization

Problem Statement

Large Language Models (LLMs) have shown remarkable capabilities in generating human-like text, but they often produce hallucinations: plausible-sounding but factually incorrect information. This poses significant challenges for their reliable use in real-world applications. While the InterrogateLLM method presents a novel approach to zero-resource hallucination detection, it still faces several limitations that impact its accuracy and efficiency.
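Very roughly, and as we understand [1], InterrogateLLM flags an answer by asking a model to reconstruct the original question from that answer several times and measuring how similar the reconstructions are to the real question. The sketch below is only our reading of that idea: the reconstruction prompt, the number of samples, the 0.8 threshold, the embedding model, and `call_llm()` are all assumptions rather than the paper's exact settings.

```python
# Rough sketch of the reconstruct-and-compare idea behind InterrogateLLM [1];
# prompts, threshold, sample count, and model calls are placeholders.
from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

def call_llm(prompt: str, temperature: float = 1.0) -> str:
    """Placeholder for an LLM completion call."""
    raise NotImplementedError

def is_hallucination(question: str, answer: str, k: int = 5, threshold: float = 0.8) -> bool:
    # Ask the model to reconstruct the question from the answer k times.
    reconstructions = [
        call_llm(f"Write the question that this answer responds to:\n{answer}",
                 temperature=1.0)
        for _ in range(k)
    ]
    # Compare reconstructed questions with the original question in embedding space.
    q_emb = embedder.encode(question, convert_to_tensor=True)
    r_embs = embedder.encode(reconstructions, convert_to_tensor=True)
    mean_similarity = util.cos_sim(q_emb, r_embs).mean().item()
    # Low similarity suggests the answer does not ground the original question.
    return mean_similarity < threshold
```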

Evaluation Strategy

Metrics:

  1. AUC (Area Under the Curve): Measures the overall performance of the binary classification.
  2. B-ACC (Balanced Accuracy): Accounts for imbalanced datasets by averaging the recall for each class (a computation sketch follows this list).
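Both metrics above can be computed with scikit-learn; the labels and scores in the sketch below are toy values purely for illustration, and the 0.5 decision threshold for B-ACC is an arbitrary placeholder.

```python
# Illustrative computation of AUC and balanced accuracy with scikit-learn;
# the labels/scores are toy values, not results.
from sklearn.metrics import roc_auc_score, balanced_accuracy_score

y_true = [1, 0, 1, 1, 0, 0]               # 1 = hallucination, 0 = faithful answer
scores = [0.9, 0.2, 0.7, 0.4, 0.1, 0.3]   # detector scores (higher = more likely hallucination)

auc = roc_auc_score(y_true, scores)

# B-ACC needs hard predictions, so threshold the scores (0.5 is an arbitrary choice here).
y_pred = [int(s >= 0.5) for s in scores]
b_acc = balanced_accuracy_score(y_true, y_pred)

print(f"AUC = {auc:.3f}, B-ACC = {b_acc:.3f}")
```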

IOU (Intersection over Union) Score:

  1. Used specifically for the Movies dataset.
  2. Answers with IOU scores below 80% are considered hallucinations (see the sketch below).
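For list-valued answers (e.g., a movie's cast), the IOU criterion above could be applied as in the following sketch; the names are only an illustrative example, and normalization (casing, whitespace) is omitted for brevity.

```python
# Set-based IOU between a generated list answer and the ground truth
# (e.g., predicted vs. actual cast members). The 0.8 cut-off mirrors the
# 80% threshold above.
def iou(predicted: set[str], ground_truth: set[str]) -> float:
    """Intersection-over-union of two answer sets."""
    if not predicted and not ground_truth:
        return 1.0
    return len(predicted & ground_truth) / len(predicted | ground_truth)

predicted = {"Tom Hanks", "Robin Wright", "Gary Sinise"}
ground_truth = {"Tom Hanks", "Robin Wright", "Sally Field", "Gary Sinise"}

score = iou(predicted, ground_truth)
flagged = score < 0.8
print(f"IOU = {score:.2f}, flagged as hallucination = {flagged}")
```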

Comparison with Baselines:

  1. SBERT-cosine: Using a pre-trained SBERT model for embeddings (a similarity-scoring sketch follows this list).
  2. ADA-cosine: Utilizing OpenAI's text-embedding-ada-002 model.
  3. SelfCheckGPT: A method that generates multiple samples and compares them.
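As a concrete reference point, the SBERT-cosine baseline above can be approximated as below: embed the query and the answer with a pre-trained SBERT model and use their cosine similarity as the detection score. The model name and the query-vs-answer pairing are our assumptions, not necessarily the exact setup of [1].

```python
# Minimal sketch of an SBERT-cosine style baseline: cosine similarity between
# query and answer embeddings serves as the hallucination-detection score.
# Model choice and pairing are assumptions, not the paper's exact settings.
from sentence_transformers import SentenceTransformer, util

sbert = SentenceTransformer("all-mpnet-base-v2")

def sbert_cosine_score(question: str, answer: str) -> float:
    q_emb = sbert.encode(question, convert_to_tensor=True)
    a_emb = sbert.encode(answer, convert_to_tensor=True)
    return util.cos_sim(q_emb, a_emb).item()

score = sbert_cosine_score(
    "Who directed the movie Inception?",
    "Inception was directed by Christopher Nolan.",
)
print(f"cosine similarity = {score:.3f}")
```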

Dataset

Movies Dataset - https://www.kaggle.com/datasets/rounakbanik/the-movies-dataset
Books Dataset - https://www.kaggle.com/datasets/saurabhbagchi/books-dataset

Resources

[1] https://arxiv.org/abs/2403.02889