huggingface / huggingface-llama-recipes


LLMs and RAG Pipeline #47

Closed Solobrad closed 1 month ago

Solobrad commented 1 month ago

Modern LLMs like Llama seem to outperform traditional RAG methods on long-context tasks, demonstrating improved context handling and understanding, which may lead to reconsidering the need for RAG in many scenarios.

However, I recently came across something called SRF-RAG, which offers several key benefits.

- Retrieval: retrieves relevant context from external sources.
- Generation: produces coherent responses based on the retrieved context.
- Instruction Tuning: improves understanding of complex queries.
- Hallucination Reduction: minimizes incorrect or misleading information.
- Multi-Hop Reasoning: handles complex questions by synthesising information from multiple sources.

I think we could use LangChain to set up a pipeline that decides whether an input needs a RAG-based approach or can be handled directly by the LLM.
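A minimal sketch of that routing idea. In practice the router would likely be an LLM classifier or a LangChain router chain; the keyword heuristic and all names below are hypothetical placeholders, not code from any PR:

```python
# Cues suggesting the query references external/source material and
# therefore benefits from retrieval. Purely illustrative.
RETRIEVAL_CUES = (
    "according to", "in the document", "in the docs",
    "source", "cite", "what does the paper",
)

def needs_retrieval(query: str) -> bool:
    """Route to RAG when the query appears to reference source material."""
    q = query.lower()
    return any(cue in q for cue in RETRIEVAL_CUES)

def answer(query: str) -> str:
    if needs_retrieval(query):
        return "route: RAG"         # retrieve context, then generate
    return "route: direct LLM"      # let the long-context LLM answer alone
```

A real pipeline would replace the two return strings with calls into the retrieval chain and the bare model, respectively.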

Call for Contributions #43

ariG23498 commented 1 month ago

Hey @Solobrad

Thanks for taking this bit up!

For the time being would you be interested in building a quick RAG pipeline with the Llama family of models? Once that is done, we could look into SRF-RAG as an enhancement.

The suggestion is based on the fact that this repository (huggingface-llama-recipes) is built with the idea of helping anyone to get started quickly.

Please let me know how you feel about my suggestion. Also, feel free to ask any questions if you have any.

Solobrad commented 1 month ago

Hi @ariG23498

I'm happy to collaborate with others on this repo. Thanks for the opportunity!

ariG23498 commented 1 month ago

That would be great!

But would you be open to implementing a very simple (here simple is the keyword) RAG pipeline in the first place?

If you are fine with that, I can redirect other contributors to this issue so that you can collaborate with them on this.

Solobrad commented 1 month ago

Yup count me in

Purity-E commented 1 month ago

Hi @ariG23498 , Thanks for redirecting me to this issue. @Solobrad I'm looking forward to collaborating with you.

Solobrad commented 1 month ago

Same here, nice meeting you @Purity-E !

Solobrad commented 1 month ago

@Purity-E, we are required to start simple, so I've added a very basic pipeline. If the PR is accepted, we can work on enhancements later.

Solobrad commented 1 month ago

@ariG23498 I see a few issues similar to mine; should we bring them over here to brainstorm enhancements?

Purity-E commented 1 month ago

> @Purity-E, we are required to start simple so I've added a very basic pipeline code. If the PR is accepted, we can work on enhancements later

Cool. Thanks for the update.

ariG23498 commented 1 month ago

> @ariG23498 I see a few issues same as mine, should we bring them over here? Brainstorming on the enhancements.

Feel free to. Having said that, we are not really looking for a very complicated project with RAG. It should be enough to get anyone started with RAG using Llama.

atharv-jiwane commented 1 month ago

Hey @ariG23498, thanks for redirecting me here. Hi @Solobrad, looking forward to working with you!

Solobrad commented 1 month ago

Hi @atharv-jiwane, welcome to the team! I like your idea, by the way; image retrieval can be more effective, especially with PDFs.

atharv-jiwane commented 1 month ago

Hey! This is my first time contributing to an open-source project, so I am really excited! I saw the PR created for this issue and wanted to discuss how we are going to build on the initial commit. I also saw @ariG23498's comments on the PR and wanted to take those up. Let me know how you want to divide the work.

Solobrad commented 1 month ago

Sure, which part would you like to work on? I'm thinking of adding you both to my forked repo as collaborators so we can discuss work delegation and get going. Let me know if that works for you. @Purity-E @atharv-jiwane

Purity-E commented 1 month ago

@Solobrad sure that's okay

atharv-jiwane commented 1 month ago

@Solobrad Yup sounds good! I could take up the embedding part

Solobrad commented 1 month ago

Cool, we'll be working on the "llama-rag" branch then.

Solobrad commented 1 month ago

I'll check on the dataset.

Solobrad commented 1 month ago

I've added a transcript dataset @atharv-jiwane, it's clean and pretty straightforward. You can try embedding it. Thanks
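One hedged way the transcript dataset could be prepared for embedding: split each transcript into overlapping word-window chunks so no chunk exceeds the embedding model's input size. The function name and sizes below are illustrative assumptions, not code from the branch:

```python
def chunk_transcript(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split a transcript into overlapping chunks of `chunk_size` words."""
    words = text.split()
    chunks, step = [], chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already covers the tail
    return chunks

# The chunks would then be embedded with a sentence-transformers model, e.g.
# model = SentenceTransformer("all-MiniLM-L6-v2"); vectors = model.encode(chunks)
```

Overlap keeps sentences that straddle a chunk boundary retrievable from both neighbouring chunks.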

atharv-jiwane commented 1 month ago

I have tried embedding the dataset, but I'm not sure I committed the changes properly. @Solobrad, could you please guide me?

Solobrad commented 1 month ago

Hey @atharv-jiwane, I saw an error about Llama access; try filling in the access form at https://huggingface.co/meta-llama/Llama-3.1-8B. Even though Llama is an open-weight LLM, we normally have to submit an access request before we can use it from Hugging Face or Kaggle. Does this answer your question?

I changed the code a little because the variable name you used for the SentenceTransformer was overwriting the Llama model. Go ahead and check it out. Hope this helps.

atharv-jiwane commented 1 month ago

Hey @Solobrad, I've reviewed your changes and filled in the access form for Llama. Thank you for the information on that; I think it takes some time for the request to be reviewed.

Meanwhile, the only changes I have made are in the embeddings section right after the dataset import, so could you please commit an error-free version of the code to the "llama-rag" branch?

atharv-jiwane commented 1 month ago

Hey @Solobrad, I have added an LLM pipeline in the latest commit and fixed the earlier auth issues with the Llama models. I tried running a query, but it took too long to generate a response. Could you please guide me on where I am going wrong?

Also, couldn't the embedding step I wrote earlier just be done when we create the vector store?

Solobrad commented 1 month ago

Yup @atharv-jiwane, just create the vector store and let it handle embedding the documents; you don't need to encode them separately, if that's what you were asking.

I'll try checking on the prolonged response time.
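A toy illustration (not the LangChain API) of why the separate encoding pass is unnecessary: a vector store embeds documents once when they are added and embeds only the query at search time, which is what the wrappers around FAISS/Chroma do internally. Everything below is a sketch under that assumption:

```python
import math

class TinyVectorStore:
    """Minimal in-memory store mimicking what real vector stores do."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn       # any text -> vector function
        self.docs, self.vecs = [], []

    def add(self, docs):
        for doc in docs:
            self.docs.append(doc)
            self.vecs.append(self.embed_fn(doc))  # embedded here, once

    @staticmethod
    def _cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    def search(self, query, k=1):
        qv = self.embed_fn(query)      # only the query is embedded now
        ranked = sorted(zip(self.docs, self.vecs),
                        key=lambda dv: self._cosine(qv, dv[1]), reverse=True)
        return [doc for doc, _ in ranked[:k]]
```

With LangChain the same pattern is a single constructor call that takes the documents and the embedding object together, so no standalone `encode` step is needed.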

atharv-jiwane commented 1 month ago

Cool, so @Solobrad, let's do away with the separate encodings? Also, can we add GPU support? I am running this locally on a MacBook Air M2, so I think GPU support would be nice.

Also, regarding the response time: when I first passed the query ("What is Hugging Face?") to the LLM, an error was raised saying generation exceeded the max_new_tokens limit of 20. This might also be causing an issue.
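On the GPU question, one hedged way the notebook could pick a device, covering CUDA, the Apple-silicon MPS backend (available in PyTorch 1.12+, which an M2 MacBook Air can use), and a CPU fallback. The helper name is an assumption, not code from the branch:

```python
def pick_device() -> str:
    """Return "cuda", "mps" (Apple silicon), or "cpu", in that order of preference."""
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed; run on CPU
    if torch.cuda.is_available():
        return "cuda"
    mps = getattr(torch.backends, "mps", None)  # present in PyTorch >= 1.12
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"

# e.g. pass the result as the `device` argument of a transformers pipeline
```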

Solobrad commented 1 month ago

Hi, I solved the max-token problem, and I attribute the 'long' response time to running the model locally. I've also tried using APIs directly and a smaller model.

@Purity-E @atharv-jiwane, I've pushed the latest runnable code, go on and have a try.
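For reference, the 20-token ceiling hit earlier matches the legacy default generation length in transformers (`max_length=20`), and the usual fix is to pass an explicit `max_new_tokens`. A sketch of such settings; the values are illustrative assumptions, and `generator` is a hypothetical text-generation pipeline, not the PR's code:

```python
# Illustrative generation settings, not the actual values in the branch.
generation_kwargs = {
    "max_new_tokens": 512,  # lift the legacy 20-token default cap
    "do_sample": False,     # deterministic output for a demo notebook
}
# generator("What is Hugging Face?", **generation_kwargs)
```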

atharv-jiwane commented 1 month ago

@Solobrad Thanks for the update! I'll have a go soon; I'm slightly busy at the moment.

Solobrad commented 1 month ago

@sinatayebati will be joining us.

Solobrad commented 1 month ago

Hey everyone, I suggest adding some explanatory markdown so users can easily follow what's going on (as mentioned before), so it reads like a simple demo or tutorial. What's your take on this? @sinatayebati @atharv-jiwane @Purity-E

PS: I added Llama back.

sinatayebati commented 1 month ago

@Solobrad Hey Nicholas. Thanks for the latest commits. In my opinion, this latest notebook should be very close to what the HF team has in mind. I also just pushed two minor updates:

Solobrad commented 1 month ago

Awesome, thanks!

atharv-jiwane commented 1 month ago

Hey @Solobrad! I think the latest commit looks good. We should consult the maintainers and ask for their opinion on this.

Solobrad commented 1 month ago

I've updated the code according to the latest requirements, @Purity-E @atharv-jiwane. Feel free to add any markdown. You should use Google Colab if you want to run the code.

ariG23498 commented 1 month ago

Closing this issue as the PR has been merged! Thanks for the great contribution.