Call for Contributions #43
Closed · Solobrad closed 1 month ago

Modern LLMs like Llama seem to outperform traditional RAG methods on long-context tasks, demonstrating improved context handling and understanding, which may lead to reconsidering the need for RAG in many scenarios.
However, I recently came across something called SRF-RAG, which offers several key benefits:

- Retrieval: retrieves relevant context from external sources.
- Generation: produces coherent responses based on the retrieved context.
- Instruction tuning: improves understanding of complex queries.
- Hallucination reduction: minimizes incorrect or misleading information.
- Multi-hop reasoning: handles complex questions by synthesising information from multiple sources.

I think we could use LangChain to set up a pipeline that figures out whether an input needs a RAG-based approach or can be handled directly by the LLM; a rough sketch of that routing idea is below.
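A sketch of what that routing could look like (hypothetical: the self-classification prompt, model name, and retriever interface are placeholders, not a tested design):

```python
# Hypothetical router: ask the model whether a query needs external context,
# then either retrieve-and-generate or generate directly.
from transformers import pipeline

llm = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

def needs_rag(query: str) -> bool:
    prompt = (
        "Answer YES or NO only. Does the following question require "
        f"looking up external documents to answer reliably?\n{query}\nAnswer:"
    )
    out = llm(prompt, max_new_tokens=3)[0]["generated_text"]
    return "YES" in out[len(prompt):].upper()

def answer(query: str, retriever=None) -> str:
    if retriever is not None and needs_rag(query):
        # `retriever` is assumed to be a LangChain-style retriever whose
        # .invoke() returns documents with a .page_content attribute.
        context = "\n".join(d.page_content for d in retriever.invoke(query))
        prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    else:
        prompt = f"Question: {query}\nAnswer:"
    return llm(prompt, max_new_tokens=256)[0]["generated_text"][len(prompt):]
```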
Hey @Solobrad
Thanks for taking this bit up!
For the time being, would you be interested in building a quick RAG pipeline with the Llama family of models? Once that is done, we could look into SRF-RAG as an enhancement.
The suggestion is based on the fact that this repository (huggingface-llama-recipes) is built to help anyone get started quickly.
Please let me know how you feel about my suggestion, and feel free to ask any questions you may have.
Hi @ariG23498
I'm in and happy to collaborate with others on this repo. Thanks for the opportunity!
That would be great!
But would you be open to implementing a very simple (here, simple is the keyword) RAG pipeline first?
If you are fine with that, I can redirect other contributors to this issue so that you can collaborate with them on this.
Yup count me in
Hi @ariG23498, thanks for redirecting me to this issue. @Solobrad, I'm looking forward to collaborating with you.
Same here, nice meeting you @Purity-E !
@Purity-E, we're asked to start simple, so I've added a very basic pipeline. If the PR is accepted, we can work on enhancements later. The rough shape is sketched below.
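A minimal sketch of the idea, not the exact code in the PR (the embedding model, Llama checkpoint, and toy corpus here are placeholders):

```python
# Minimal RAG sketch: embed a tiny corpus, retrieve by cosine similarity,
# and feed the top hit to a Llama model as context.
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

docs = [
    "Hugging Face hosts models, datasets, and demos for machine learning.",
    "RAG combines retrieval with generation to ground LLM answers.",
]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(docs, convert_to_tensor=True)

generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B-Instruct")

def rag_answer(query: str) -> str:
    # Pick the most similar document and prepend it as context.
    q_emb = embedder.encode(query, convert_to_tensor=True)
    best = util.cos_sim(q_emb, doc_embeddings).argmax().item()
    prompt = f"Context: {docs[best]}\n\nQuestion: {query}\nAnswer:"
    return generator(prompt, max_new_tokens=128)[0]["generated_text"]

print(rag_answer("What is Hugging Face?"))
```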
Cool. Thanks for the update.
@ariG23498 I see a few issues similar to mine; should we bring them over here to brainstorm enhancements?
Feel free to. Having said that, we are not really looking for a very complicated project with RAG. It should be enough to get anyone started with RAG using Llama.
Hey @ariG23498, thanks for redirecting me here. Hi @Solobrad, looking forward to working with you!
Hi @atharv-jiwane, welcome to the team! I like your idea; image retrieval can be more effective, especially with PDFs.
Hey! This is my first time contributing to an open-source project, so I'm really excited! I saw the PR created for this issue and wanted to discuss how we're going to build on the initial commit. I also saw @ariG23498's comments on the PR and wanted to take those up. Let me know how you want to divide the work.
Sure man, which part would you like to work on? I'm thinking of adding you both to my forked repo as collaborators so we can discuss how to split the work. Let me know if that works for you. @Purity-E @atharv-jiwane
@Solobrad sure that's okay
@Solobrad Yup sounds good! I could take up the embedding part
Cool, we'll be working on the "llama-rag" branch then.
I'll check on the dataset.
I've added a transcript dataset, @atharv-jiwane; it's clean and pretty straightforward. You can try embedding it. Thanks!
I have tried embedding the dataset, but I'm not sure I committed the changes properly. @Solobrad, could you please guide me?
Hey @atharv-jiwane, I saw an error about Llama access; try filling in the access form at https://huggingface.co/meta-llama/Llama-3.1-8B. Even though Llama is an openly released LLM, we normally have to submit an access request before we can use it from Hugging Face or Kaggle. Does this answer your question?
I also changed the code a little, because the name you used for the SentenceTransformer was overwriting the previous Llama model variable. Go ahead and check it out; roughly what I mean is below. Hope this helps.
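A sketch of the two points (the model names are examples; the point is just to authenticate and keep distinct variable names):

```python
# Authenticate once the access request is approved
# (token from your Hugging Face account settings).
from huggingface_hub import login
login()  # or login(token="hf_...")

from sentence_transformers import SentenceTransformer
from transformers import pipeline

# Distinct names so the embedder doesn't overwrite the Llama pipeline.
llama_generator = pipeline("text-generation", model="meta-llama/Llama-3.1-8B")
embedder = SentenceTransformer("all-MiniLM-L6-v2")
```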
Hey @Solobrad, I've reviewed your changes and filled in the verification form for using Llama. Thank you for the information; I think it takes some time for the request to be reviewed.
Meanwhile, the only changes I made are in the embeddings section right after the dataset is imported, so could you please commit an error-free version of the code to the "llama-rag" branch?
Hey @Solobrad, I have added an LLM pipeline in the latest commit and fixed the earlier auth issues with the Llama models. I tried to run a query, but it took too long to generate a response. Could you please guide me as to where I am going wrong?
Also, the separate embedding step I wrote earlier could instead just happen when we create the vector store, right?
Yup @atharv-jiwane, just create the vector store and let it handle embedding the documents; you don't need to encode them separately, if that's what you were asking. See the sketch below.
I'll look into the prolonged response time.
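Something along these lines (a sketch using LangChain's FAISS wrapper; the embedding model is an example, and import paths vary by LangChain version):

```python
# Let the vector store embed documents at construction time;
# no separate encode step is needed.
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings

embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = FAISS.from_texts(
    ["Hugging Face hosts models and datasets.", "RAG grounds LLM answers."],
    embedding=embeddings,
)

retriever = vector_store.as_retriever(search_kwargs={"k": 2})
docs = retriever.invoke("What is Hugging Face?")
```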
Cool, so @Solobrad, let's do away with the separate encodings? Also, can we add GPU support? I'm running this locally on a MacBook Air M2, so GPU support would be nice.
Also, regarding the response time: when I first passed the query ("What is Hugging Face?") to the LLM, an error was generated saying that max_new_tokens (20) had been exceeded. This might also be causing an issue.
Hi, I solved the max-token problem, and I attribute the "long" response time to running the model locally. I've tried using the API directly and also a smaller model; a sketch of the relevant settings is below.
@Purity-E @atharv-jiwane, I've pushed the latest runnable code; go ahead and give it a try.
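For reference, the two settings involved look roughly like this (a sketch; the smaller checkpoint shown is an example, not necessarily what's in the branch):

```python
# Raise the generation budget (the default of 20 tokens caused the error
# above) and target the local accelerator; "mps" covers Apple-silicon Macs.
import torch
from transformers import pipeline

device = "cuda" if torch.cuda.is_available() else (
    "mps" if torch.backends.mps.is_available() else "cpu"
)

generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",  # smaller model, example only
    device=device,
)

out = generator("What is Hugging Face?", max_new_tokens=200)
print(out[0]["generated_text"])
```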
@Solobrad Thanks for the update! I'll give it a go soon; I'm slightly busy at the moment.
@sinatayebati will be joining us.
Hey everyone, I suggest adding some compelling markdown cells so users can easily follow what's going on (as mentioned before), making it read like a simple demo or tutorial. What's your take on this? @sinatayebati @atharv-jiwane @Purity-E
PS: I added Llama back.
@Solobrad Hey Nicholas, thanks for the latest commits. In my opinion, this latest notebook should be very close to what the HF team has in mind. I also just pushed two minor updates.
Awesome, thanks!
Hey @Solobrad! The latest commit looks good. I think we should consult the maintainers and ask for their opinion on this.
I've updated the code according to the latest requirements, @Purity-E @atharv-jiwane. Feel free to add any markdown and the like. You should use Google Colab if you want to run the code.
Closing this issue as the PR has been merged! Thanks for the great contribution.