langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License

DOC: <Issue related to /v0.2/docs/tutorials/rag/> #22971

Closed: ZephryLiang closed this issue 5 months ago

ZephryLiang commented 5 months ago

URL

https://python.langchain.com/v0.2/docs/tutorials/rag/


Issue with current documentation:

My code:

from langchain_core.prompts import PromptTemplate

template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer. Use three sentences maximum and keep the answer as concise as possible. Always say "感谢提问!" at the end of the answer,总是用中文回答问题,可以使用英语描述专业词汇.

{context}

Question: {question}

Helpful Answer:"""

custom_rag_prompt = PromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | cache_mistral_model
    | StrOutputParser()
)

while True:
    user_input = input("请输入问题或命令(输入 q 退出): ")
    if user_input.lower() == "q":
        break
    for chunk in rag_chain.stream(user_input):
        print(chunk, end="", flush=True)

The error says:

Traceback (most recent call last):
  File "/home/desir/PycharmProjects/pdf_parse/rag/cohere.py", line 141, in <module>
    for chunk in rag_chain.stream(user_input):
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2873, in stream
    yield from self.transform(iter([input]), config, **kwargs)
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2860, in transform
    yield from self._transform_stream_with_config(
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1865, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: ignore
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 2822, in _transform
    for output in final_pipeline:
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/output_parsers/transform.py", line 50, in transform
    yield from self._transform_stream_with_config(
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1829, in _transform_stream_with_config
    final_input: Optional[Input] = next(input_for_tracing, None)
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 4057, in transform
    for output in self._transform_stream_with_config(
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 1865, in _transform_stream_with_config
    chunk: Output = context.run(next, iterator)  # type: ignore
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/base.py", line 4025, in _transform
    output = call_func_with_variable_args(
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/langchain_core/runnables/config.py", line 380, in call_func_with_variable_args
    return func(input, **kwargs)  # type: ignore[call-arg]
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 1139, in forward
    outputs = self.model(
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/transformers/models/mistral/modeling_mistral.py", line 937, in forward
    batch_size, seq_length = input_ids.shape
AttributeError: 'StringPromptValue' object has no attribute 'shape'

What happened? Please help.

Idea or request for content:

I don't know.

ashishpatel26 commented 5 months ago

It appears you're encountering an AttributeError related to the shape attribute in your code when using LangChain for question answering.

Issue

The error message indicates that an object of type StringPromptValue is being treated as if it has a shape attribute, which it does not. Specifically, the error occurs at the following line:

batch_size, seq_length = input_ids.shape

This suggests that input_ids should be a tensor (or an array-like object) that has a shape attribute. Instead, it is a StringPromptValue.
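To see the mismatch concretely: in LCEL, a PromptTemplate is a Runnable whose output is a StringPromptValue wrapping the formatted string, not a tensor, so a raw transformers model that expects input_ids cannot consume it directly. A minimal illustrative sketch using only langchain_core:

from langchain_core.prompts import PromptTemplate

# Invoking a PromptTemplate returns a StringPromptValue, which wraps the
# formatted prompt string and has no tensor-like attributes.
prompt = PromptTemplate.from_template("Question: {question}\nHelpful Answer:")
value = prompt.invoke({"question": "什么是RAG?"})

print(type(value).__name__)     # StringPromptValue
print(value.to_string())        # the plain formatted prompt string
print(hasattr(value, "shape"))  # False -> hence the AttributeError in the model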

Likely Cause

The error likely originates from your custom RAG (Retrieval-Augmented Generation) chain, where the data being passed to the model isn't in the expected format.

Solution

  1. Check Data Types in the Pipeline: Ensure that all parts of the pipeline correctly transform inputs and outputs, particularly before feeding data to the model (see the sketch after this list).
  2. Modify the Custom RAG Prompt: Ensure proper transformation of the input_ids to a format compatible with your model. Add an extra step to transform the StringPromptValue into an appropriate tensor or array.
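
One way to carry out step 1 is to drop a pass-through debugging step between chain stages. This is only a sketch: RunnableLambda comes from langchain_core, while retriever, format_docs, custom_rag_prompt, and cache_mistral_model are the names from your code above.

from langchain_core.runnables import RunnableLambda

def show_type(x):
    # Print the runtime type flowing between chain stages,
    # then pass the value through unchanged.
    print(f"[debug] {type(x).__name__}")
    return x

rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | RunnableLambda(show_type)  # prints StringPromptValue at this point
    | cache_mistral_model
    | StrOutputParser()
)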

Implementation

Here's a potential fix:

from langchain_core.prompts import PromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser
from transformers.models.mistral.modeling_mistral import MistralModel
import torch

template = """
Use the following pieces of context to answer the question at the end.
If you don't know the answer, just say that you don't know, don't try to make up an answer.
Use three sentences maximum and keep the answer as concise as possible.
Always say "感谢提问!" at the end of the answer,总是用中文回答问题,可以使用英语描述专业词汇.
{context}
Question: {question}
Helpful Answer:"""

custom_rag_prompt = PromptTemplate.from_template(template)
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | cache_mistral_model
    | StrOutputParser()
)

def transform_to_tensor(input_text):
    # Dummy tokenizer and tensor conversion; replace with your model's actual
    # tokenizer (e.g. AutoTokenizer) and a valid Hugging Face model identifier.
    tokenizer = MistralModel.from_pretrained("eleutherai/mistral")
    tokens = tokenizer.encode(input_text, return_tensors="pt")
    return tokens

while True:
    user_input = input("请输入问题或命令(输入 q 退出): ")
    if user_input.lower() == "q":
        break
    # Transform user input to correct type for model processing
    transformed_input = transform_to_tensor(user_input)
    for chunk in rag_chain.stream(transformed_input):
        print(chunk, end="", flush=True)

Explanation

  • transform_to_tensor Function: Transforms user_input into a tensor that the model can work with.
  • Tokenizer: Demonstrates dummy tokenizer usage. Replace it with your model's actual tokenizer.

Notes

  • Ensure all dependencies are correctly installed.
  • Adapt the transformation function to your specific model/tokenizer if needed.
  • This is a basic example to get you started; you might need further adjustments based on your specific pipeline configuration.

Additional Resources

  • LangChain Documentation: Refer to the official documentation for detailed guidance.

Analysis

Carefully check and adjust the data types and transformations in the pipeline to resolve the AttributeError. This example provides a framework for addressing the issue by transforming inputs to the correct format expected by the model.

ZephryLiang commented 5 months ago


I tried this, and the error says:

  File "/home/desir/PycharmProjects/pdf_parse/rag/cohere.py", line 138, in transform_to_tensor
    tokenizer = MistralModel.from_pretrained("eleutherai/mistral")
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/transformers/modeling_utils.py", line 3051, in from_pretrained
    resolved_config_file = cached_file(
  File "/home/desir/soft/anaconda3/envs/pdf_parse/lib/python3.11/site-packages/transformers/utils/hub.py", line 422, in cached_file
    raise EnvironmentError(
OSError: eleutherai/mistral is not a local folder and is not a valid model identifier listed on 'https://huggingface.co/models'. If this is a private repository, make sure to pass a token having permission to this repo either by logging in with huggingface-cli login or by passing token=<your_token>

ZephryLiang commented 5 months ago


I tried:

cache_dir = os.path.expanduser("~/.mistral")
cache_mistral_tokenizer = AutoTokenizer.from_pretrained(pretrained_model_name_or_path=cache_dir)

def transform_to_tensor(input_text):
    # Dummy tokenizer and tensor conversion, use actual tokenizer accordingly
    # tokenizer = MistralModel.from_pretrained("eleutherai/mistral")
    tokenizer = cache_mistral_tokenizer
    tokens = tokenizer.encode(input_text, return_tensors="pt")
    return tokens

The error says:

请输入问题或命令(输入 q 退出): 文章主要讲了什么
'Tensor' object has no attribute 'replace'

It seems like transforming the text into a Tensor is not a good choice.
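
That conclusion matches the first traceback: LCEL stages pass prompt values and plain strings between steps, so hand-tokenizing and feeding a Tensor into the chain breaks whichever string-based step receives it. A common pattern is to wrap the local model in a LangChain LLM so tokenization and decoding happen inside the model step. Below is only a hedged sketch, not a confirmed fix: it assumes the langchain-huggingface package is installed, that ~/.mistral holds both the tokenizer and the model weights, and it reuses retriever, format_docs, and custom_rag_prompt as defined earlier in this thread.

import os

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline
from langchain_core.runnables import RunnablePassthrough
from langchain_core.output_parsers import StrOutputParser

# Load the local Mistral checkpoint (path taken from this thread; assumes
# model weights live in the same directory as the tokenizer).
cache_dir = os.path.expanduser("~/.mistral")
tokenizer = AutoTokenizer.from_pretrained(cache_dir)
model = AutoModelForCausalLM.from_pretrained(cache_dir)

# The text-generation pipeline tokenizes the prompt string and decodes the
# output itself, so the chain only ever passes strings around.
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)
llm = HuggingFacePipeline(pipeline=pipe)

# Same chain as before, with the wrapped model in place of cache_mistral_model
# and no manual transform_to_tensor step.
rag_chain = (
    {"context": retriever | format_docs, "question": RunnablePassthrough()}
    | custom_rag_prompt
    | llm
    | StrOutputParser()
)

# Stream plain strings again, as in the original loop.
for chunk in rag_chain.stream("文章主要讲了什么"):
    print(chunk, end="", flush=True)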