langchain-ai / langchain

🦜🔗 Build context-aware reasoning applications
https://python.langchain.com
MIT License
91.08k stars 14.48k forks source link

Issue: How to use other natural languages besides English? #13250

Closed f3rn4nd0-c354r closed 6 months ago

f3rn4nd0-c354r commented 9 months ago

Issue you'd like to raise.

Hello, Thanks for this framework. It's making everyone's work simpler!

I'm using an LLM model to infer data from Portuguese websites, and expecting answers in Portuguese. But some of Langchain features, namely the "Get format instructions" for Output Parses come written in English. I'm not sure what the best approach would be here.

Should I do all my prompting in English, and just add "Answer in Portuguese" at the end?

Should this be a feature in Langchain, and if so, how would it work?

I'm not sure asking the community to translate it would be the right approach, because this is not just about translation, but making sure the prompts work correctly. I would be fine with some api call to replace the English text with my translation, but that doesn't seem to be part of the public API at the moment.

Thanks,

Suggestion:

No response

dosubot[bot] commented 9 months ago

🤖

Hello @f3rn4nd0-c354r,

I'm an AI bot here to assist you while we wait for a human maintainer to join us. I'm here to help you resolve bugs, answer questions, and guide you on becoming a contributor. I'm currently looking into the issue you've raised and will provide you with a detailed answer shortly.

dosubot[bot] commented 9 months ago

🤖

Thank you for your question and for using LangChain.

Based on the similar issues I found in the LangChain repository, there are a couple of approaches you could take to use LangChain for inferring data from Portuguese websites and getting responses in Portuguese.

One approach is to tweak the tool and agent prompt to ensure the input and output languages are consistent. Here is a code snippet from a similar issue that was solved:

# For Tool have a pre-defined format
class KnowledgeTool(BaseTool):
    request_format = '{{"USER": "<input_question>"}}'
    name = 'Knowledge Tool'
    description = f"""
    Tool to answer something knowledgeable. Input should be JSON in the following format: {request_format}
        """
    return_direct = False

    # define your tool methods etc.....

# Initialize agent 
conversational_agent = initialize_agent(
        agent='conversational-react-description',
        tools=[KnowledgeTool()])

# You can identify the language with any modules like lingua or fasttext       
prompt_prefix = f"""<Your description similar to system message> Use only {language} language to reply"""

# Now the magic part 
conversational_agent.agent.llm_chain.prompt = prompt_prefix

Another approach is to create a custom prompt and specify the desired language, as suggested in this issue.

As for the "Get format instructions" for Output Parses, currently, the method get_format_instructions in the StructuredOutputParser class generates the instructions in English. You could potentially translate these instructions into Portuguese using a separate translation process or service. Here is the relevant code:

def get_format_instructions(self, only_json: bool = False) -> str:
    schema_str = "\n".join(
        [_get_sub_string(schema) for schema in self.response_schemas]
    )
    if only_json:
        return STRUCTURED_FORMAT_SIMPLE_INSTRUCTIONS.format(format=schema_str)
    else:
        return STRUCTURED_FORMAT_INSTRUCTIONS.format(format=schema_str)

I hope this helps! If you have any further questions or need more clarification, please let me know.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

f3rn4nd0-c354r commented 9 months ago

Thanks Dosubot.

I was looking for some guidance. Is anyone working with other natural languages besides English? What do you recomend? Prompting in English and adding a request to have an answer in other language?

Is LangChain planning to be international friendly, by providing easy methods to provide translated prompts? I can help with this, if that's the way.

dosubot[bot] commented 9 months ago

🤖

Based on the information I found in the LangChain repository, it seems that there is a way to handle prompts in languages other than English. A similar issue titled Multilingual prompt templates suggests that it may be possible to create a custom prompt and specify the desired language. However, the specifics of how to implement this are not detailed in the issue.

Regarding the "Get format instructions" feature for Output Parses, it appears that the instructions are currently generated in English. The get_format_instructions method in the DatetimeOutputParser class, for example, returns a string that provides instructions for the datetime format in English. Here is the relevant code:

def get_format_instructions(self) -> str:
    examples = comma_list(_generate_random_datetime_strings(self.format))
    return f"""Write a datetime string that matches the 
        following pattern: "{self.format}". Examples: {examples}"""

This suggests that you might need to translate these instructions into Portuguese manually if you want to use them in your application.

As for your question about whether this feature is being considered for future development, I couldn't find any information about this in the repository. You might want to consider opening a new issue to suggest this feature and express your interest in contributing to it.

Finally, regarding your question about whether anyone else is working with natural languages other than English, I found an issue titled How to use other natural languages besides English? which suggests that other users are indeed interested in this. However, it doesn't provide any specific recommendations on how to approach this.

I hope this information is helpful. If you have any further questions or need more clarification, please let me know.

Sources


This response is meant to be useful and save you time. It is not meant to be a precise solution, but rather a starting point for your own research.

Help me be more useful! Please leave a 👍 if this is helpful and 👎 if it is irrelevant.

If you want to continue the conversation, start your reply with @dosu-bot.

dosubot[bot] commented 6 months ago

Hi, @f3rn4nd0-c354r

I'm helping the LangChain team manage their backlog and am marking this issue as stale. From what I understand, you raised this issue regarding the use of natural languages other than English with the Langchain framework, specifically concerned about handling prompts and answers in Portuguese. Dosubot provided guidance on potential approaches, including creating custom prompts and translating instructions into Portuguese. You expressed interest in contributing to making Langchain international-friendly, and Dosubot mentioned that other users have shown interest in working with languages other than English.

Could you please confirm if this issue is still relevant to the latest version of the LangChain repository? If it is, please let the LangChain team know by commenting on the issue. Otherwise, feel free to close the issue yourself, or the issue will be automatically closed in 7 days.

Thank you for your understanding and contribution to LangChain! If you have any further questions or updates, feel free to reach out.

f3rn4nd0-c354r commented 6 months ago

Every day more relevant.

davivc commented 1 month ago

Hello, @f3rn4nd0-c354r. I don't know if you already solve your issue but I've been using some langchain tools and until now I didn't find any problem mixing Portuguese with English instructions. I think this case can be solved with prompt engineering. I gave explicit instructions in the system message to answer only in Portuguese and that the assistant don't know other languages. Also for the human response I reinforce that the assistant must answer in Portuguese. As of today, I've tested with OpenAI 4, Gemini 1.5 Flash and Mistral 7x13B and it worked well with these 3. I'm using tool calling with create_json_chat_agent and AgentExecutor.

I'm still improving our company's implementation of this agent, so if I find myself struggling with this problem I'll come back here to let you know.