meta-llama / llama-stack-apps

Agentic components of the Llama Stack APIs
MIT License
3.83k stars 554 forks source link

Various Questions about LLAMA3.1 and llama-stack-apps #65

Open JoseGuilherme1904 opened 1 month ago

JoseGuilherme1904 commented 1 month ago

I apologize if this is not the appropriate place for questions, concerns, or suggestions regarding the project.

One of the major challenges with AI is how quickly things progress, and understanding whether the content we're reading is still up-to-date or has already become obsolete due to the rapid pace of development.

What excited me about LLAMA3.1 is the 128K context length. I thought it would be easy to find examples where LLAMA3.1 could take a long text, interpret it, and answer questions based on it.

However, the examples provided simply don’t work or seem to be designed for an AI that doesn’t support a 128K token context.

I think the first tool we need is one that tells us how many tokens a given document contains.

Another question I have is: how exactly do these 128K context tokens work? Can I include a question along with a document containing 128K tokens? If LLAMA3.1 supports 128K tokens, why use RAG or memory_client.create_memory_bank with a chunk size of 512 and token overlap? If it allows 128K tokens of context, why is RAG with dragon-roberta-query-2 used in the provided example?

Is there any documentation for these functions? Where can I find documentation about execute_turns?

How can we configure the maximum history LLAMA can store in a question-and-answer chat? In a simple test with short questions using ollama run llama3.1, after five or six questions, it couldn’t correctly recall the first question.

When using ollama, how can we understand the maximum token window size for the context?

I believe we should create simpler examples. In a simple test with the rag_as_attachments.py script, I pointed it to a text containing a few paragraphs about the definition of God according to Allan Kardec in The Spirits' Book. Using the script, only changing the attached text and asking simple questions, it seems LLAMA3.1 simply ignores the provided text, mixes concepts, etc.

I would like to know if there is any real utility here. This is the second weekend I’ve spent trying to make it work, and the examples just don’t function as expected. There are about five or six scripts that seem to come from the LLAMA2 world. Take the inflation.py script, for instance—when exactly is the TickerDataTool called? If we use TickerDataTool, how do we set the ticker_symbol?

Another issue: what exactly are the purposes of BuiltinTool.brave_search and BuiltinTool.wolfram_alpha? What exactly do these APIs do?

As great as this all sounds, there’s a lot of explanation for installing the tool, but the six examples provided either don’t make sense, don’t work properly, or raise more questions than they answer.

If anyone can help me with real, functional examples of asking questions using more than 4096 tokens, I’d greatly appreciate it. What I really want to do is ask the AI questions about reports generated in my system. These are private data, so I can’t use ChatGPT or Gemini because the PDF data isn’t public. To this day, I’ve never seen RAG techniques genuinely help.

Thank you very much! Guilherme

JoseGuilherme1904 commented 1 month ago

Hi, this links help me: https://github.com/ollama/ollama/blob/main/docs/modelfile.md Thanks and sorry!

ashwinb commented 1 month ago

No need to apologize. Your comments and questions are very welcome!

Raf-sns commented 2 weeks ago

Lots of questions here that I also have and would like to get answers to.

JoseGuilherme1904 commented 1 week ago

Hi @Raf-sns , recently, they released a series of video lessons that clarify and teach item by item what was discussed in my post. It's definitely worth watching the lessons: https://learn.deeplearning.ai/courses/introducing-multimodal-llama-3-2/lesson/1/introduction

Raf-sns commented 1 week ago

Thank you @JoseGuilherme1904, i will go to see that. (but I'm afraid this doesn't address my problem which is to run a refined LLM on a website based on its content)