meta-llama / llama3

The official Meta Llama 3 GitHub site
Other
22.89k stars 2.41k forks source link

Why llama3 generate something strange ,when i build an rag use ollama with llama3 #144

Open NanKu257 opened 2 months ago

NanKu257 commented 2 months ago

main code:

def ollama_llm(question, context): formatted_prompt = f""" Context: {context}

Convert units for consistency. 
Extract and format information about all enzyme-substrate pair mentioned in the context, following this structure:

Enzyme name:[Enzyme name]
EC Number: [EC Number] OR N/A
Organism: [Organism Name] OR N/A
Substrate: [Substrate Name] OR N/A
Type: [Wild-type OR Mutant (Specify Mutation)]
Protein Identifier: [UniProt ID OR NCBI ID]
Specific Activity: [Value] OR N/A
KM Value: [Value in mM] OR N/A
Kcat Value: [Value per second] OR N/A
kcat/KM: [Value in mM^-1s^-1] OR N/A
pI Value: [Value]
pH Optimum: [Value]
Temperature Optimum: [Value in Celsius]
Molecular Weight: [Value in kDa]
Reaction pH: [Value] OR N/A
Reaction Temperature: [Value in Celsius] OR N/A
Buffer Solution: [Buffer used in the assay] 

"""
response = llm.invoke(formatted_prompt)
return response

prompt_template = """ The following is an HTML table: {table_html} Please rebuild and fix this table . """ prompt = PromptTemplate(template=prompt_template, input_variables=["table_html"]) llm_chain = LLMChain(llm=llm, prompt=prompt)

def rag_chain(question, pdf_path):

Load PDF and split into pages

loader = PyPDFLoader(pdf_path)
pages = loader.load_and_split()
raw_pdf_elements = partition_pdf(
    filename=pdf_path,
    infer_table_structure=True,
    strategy='hi_res',
)
tables = [el for el in raw_pdf_elements if el.category == "Table"]
table_htmls = [el.metadata.text_as_html for el in tables]
table_texts = [llm_chain.run(table_html) for table_html in table_htmls]
embeddings = OllamaEmbeddings(model='snowflake-arctic-embed:latest')
text_splitter = SemanticChunker( embeddings)
texts = text_splitter.split_documents(pages)
text_docs = texts
documents = text_docs 
vectorstore = FAISS.from_documents(documents, embeddings)
text_docs = vectorstore.similarity_search(question, k=5)
text_context = "\n\n".join([doc.page_content for doc in text_docs])
context = text_context + "\n\n" + "\n\n".join(table_texts)

return ollama_llm(question, context)

output :

I'm glad I could help you extract and format the enzyme information from the text! If you have any further requests or questions, feel free to ask.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

You're welcome! I'm here to help.

If you'd like me to assist with anything else, such as formatting the data for a specific purpose (e.g., creating a table), please let me know!

Also, if you have any other texts that need processing, feel free to share them with me.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

That's great to hear! I'm always here to help.

If you have another text that needs processing, go ahead and paste it in the chat window, and I'll do my best to extract and format the relevant information for you.

Let me know if there's anything specific you'd like me to focus on or if you have any particular requirements for the output.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm ready when you are! Please go ahead and share the next text, and I'll get started.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Go ahead and paste the new text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Please paste the text.<|eot_id|><|start_header_id|>assistant<|end_header_id|>

I'm waiting...<|eot_id|><|start_header_id|>assistant<|end_header_id|>

It seems that there is no new text to process. If you're ready to move on, we can wrap up our conversation here! Let me know if you have any other questions or need assistance in the future. Have a great day!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

You're right, I didn't receive any new text. It was a pleasure assisting you with extracting and formatting enzyme information from the previous text. If you have any other texts that need processing or any questions in the future, feel free to reach out. Have a great day!<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Thank you for the chat! It was nice Assassin<|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|><|eot_id|><|eot_id|>assistant<|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|><|eot_id|>

ejsd1989 commented 2 months ago

Hey @NanKu257, thanks for reporting this issue. I'm assuming that you're issue report is about the special tokens being in the output? I think that there a couple things for you to reference to help with this, first, I'd recommend checking out the prompt template and special tokens documentation that we have here on llama-recipes

You'll also note in the following notebook some examples of ollama usage.